This month’s Language and Speech Colloquium at Radboud University Nijmegen focused on “Internet Data for L&S research”.
Carsten Eickhoff – An Introduction to Crowdsourcing
Crowdsourcing is a topic of growing interest for many fields of research and industry. It facilitates outsourcing short and simple tasks which require human cognitive abilities to a large pool of potential workers. Traditional examples of such human intelligence tasks (HITs) are image tagging, text annotation or the creation of quality judgements.
A commonly observed problem in crowdsourcing environments is the presence of workers who strive to maximize their financial efficiency by completing as many HITs as possible in a given period of time. This approach leads to very low overall result quality and often renders the HIT useless for the requester. In most cases it is easy to identify workers who repeatedly produce quick, low-quality results to increase their financial gain. However, this identification step requires additional resources such as time or money. In our work we demonstrate how innovative and non-repetitive HIT design can a priori discourage sloppy, gain-driven workers from taking up our HITs, without having to invest further resources in identification and rejection.
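The abstract says such workers are easy to identify but does not say how. As a rough illustration only, the sketch below flags workers whose submissions are both unusually fast and frequently wrong on questions with known answers. The record fields (`worker`, `seconds`, `correct`), the function name and both thresholds are assumptions made for this example, not part of the talk.

```python
from statistics import median

def flag_suspect_workers(submissions, min_median_seconds=5.0, min_gold_accuracy=0.6):
    """Return the set of workers who are both fast and inaccurate.

    Each submission is a dict with hypothetical keys:
      worker  - worker identifier
      seconds - time spent on the HIT
      correct - True/False on a gold (known-answer) HIT, None otherwise
    """
    by_worker = {}
    for sub in submissions:
        by_worker.setdefault(sub["worker"], []).append(sub)

    flagged = set()
    for worker, rows in by_worker.items():
        median_time = median(row["seconds"] for row in rows)
        gold = [row["correct"] for row in rows if row["correct"] is not None]
        if not gold:
            continue  # no gold answers, so quality cannot be judged here
        accuracy = sum(gold) / len(gold)
        if median_time < min_median_seconds and accuracy < min_gold_accuracy:
            flagged.add(worker)
    return flagged

# Invented example records, for illustration only.
example = [
    {"worker": "w1", "seconds": 2.0, "correct": False},
    {"worker": "w1", "seconds": 3.0, "correct": False},
    {"worker": "w1", "seconds": 2.5, "correct": True},
    {"worker": "w2", "seconds": 40.0, "correct": True},
    {"worker": "w2", "seconds": 35.0, "correct": None},
]
suspects = flag_suspect_workers(example)
```

A check like this is exactly the kind of post-hoc filtering that costs the requester extra gold questions and review time, which is the overhead the talk proposes to avoid through HIT design.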
Claudia Peersman (University of Antwerp) – Age and gender prediction on Netlog posts
In recent years social networking sites and chat applications have redefined how people communicate. Millions of people use social networking sites to support their personal and professional communications by creating digital communities. A common characteristic of communication on these online social networks and in chat rooms is that it happens via very short entries using non-standard language variations, which makes this type of text a challenging text genre for natural language processing.
In this presentation we will discuss some of the main characteristics of our corpus of chat texts, which we collected from the Belgian Netlog social networking site. We will also present the results of our exploratory study, in which we apply an automatic text categorization approach to predict age and gender in this Netlog corpus. In spite of the challenging characteristics of this type of data, our results show that it is feasible to reach useful performance for both age and gender prediction.
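The abstract does not specify which features or classifier the study uses. Purely to illustrate what a text categorization approach on short, non-standard posts can look like, here is a minimal sketch of a multinomial naive Bayes classifier over character trigrams. Character n-grams are chosen here because they tolerate unconventional spelling. The class name, the labels `class_a` and `class_b` (stand-ins for age or gender categories) and the training snippets are all invented for this example and are not taken from the Netlog corpus.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=3):
    """Split lowercased, space-padded text into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NgramNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.gram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()              # label -> number of texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(self.gram_counts[label].values())
            for gram in char_ngrams(text, self.n):
                count = self.gram_counts[label][gram]
                score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy snippets; the labels are placeholders, not real categories.
train_texts = [
    "omg lol gr8 c u l8r",
    "lol wtf omg so kewl",
    "I will see you at the meeting tomorrow",
    "Please find the report attached",
]
train_labels = ["class_a", "class_a", "class_b", "class_b"]
model = NgramNaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("lol omg c u")` scores the new post against each class. A real study would need far more data, held-out evaluation and careful feature selection.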