GamifIR 2015 Keynote –
Human Intelligence in Search and Retrieval

Crowdsourcing has developed into something of a magic bullet for the data and annotation needs of modern-day IR researchers. The number of academic studies as well as industrial applications that employ the crowd for creating, curating, annotating or aggregating documents is growing steadily. Aside from the multitude of scientific papers relying on crowd labour for system evaluation, there has been a strong interdisciplinary line of work dedicated to finding effective and efficient forms of using this emerging labour market. Central research questions include (1) estimating and optimizing the reliability and accuracy of often untrained workers in comparison with highly trained professionals; (2) identifying or preventing noise and spam in the submissions; and (3) distributing tasks and remuneration across workers in the most cost-efficient way. The vast majority of studies understand crowdsourcing as the act of making micro-payments to individuals in return for compartmentalized units of creative or intelligent labour.

Gamification proposes an alternative incentive model in which entertainment replaces money as the motivating force drawing the workers. Under this alternative paradigm, tasks are embedded in game environments in order to increase the attractiveness and immersion of the work interface. While gamification rightfully points out that paid crowdsourcing is not the only viable option for harnessing crowd labour, it is still merely another concrete instantiation of the community’s actual need: a formal worker incentive model for crowdsourcing. Only by understanding individual motivations can we deliver truly adequate reward schemes that ensure faithful contributions and long-term worker engagement. It is unreasonable to assume that the binary money vs. entertainment decision reflects the full complexity of the worker motivation spectrum. What about education, socializing, vanity, or charity? All of these are valid examples of factors that compel people to lend us their work force. This is not to say that we necessarily have to promote edufication and all its possible siblings as new paradigms; they should merely start to take their well-deserved place on our mental map of crowdsourcing incentives.

In this talk, we will cover a range of interesting scenarios in which different incentive models may fundamentally change the way in which we can tap the considerable potential of crowd labour. We will discuss cases in which standard crowdsourcing and gamification schemes reach the limits of their capabilities, forcing us to rely on alternative strategies. Finally, we will investigate whether crowdsourcing even has to be an active occupation or whether it can happen as a by-product of more organic human behaviour.

If you are interested in the full talk, please join us for the GamifIR Workshop at ECIR 2015 in Vienna, Austria.

CIKM 2014, Shanghai, China

These are my personal (biased by interest and jet lag) highlights of the oral paper presentations:

  • Tetsuya Sakai  Designing Test Collections for Comparing Many Systems. The author presents a statistical analysis of how many topics (e.g., queries) an experimental collection should comprise in order to give a reliable performance estimate for a fixed number of participating systems/methods and a required target confidence level. The insights of this research are made available in the form of conveniently applicable spreadsheets.
  • Anne Schuth et al.  Multileaved Comparisons for Fast Online Evaluation. The authors introduce a modification to the established team-draft and optimized interleaving methods for online evaluation. While previous evaluation schemes allowed for direct comparison of only two systems at a time, the presented modification efficiently yields direct comparisons for arbitrary numbers of systems in a performance-preserving manner.
  • Nikita Spirin et al.  Large Scale Analysis of People Search in an Online Social Network. The authors present a broad analysis of how the community makes use of a social networking platform’s graph-based people search feature. The study is based on the proprietary log files of Facebook.
  • Julia Kiseleva et al.  Modelling and Detecting Changes in User Satisfaction. The authors discuss the phenomenon of “concept drift” in dynamic Web search engines. SERPs for popular queries sometimes experience drastic changes in quality when late-breaking news or previously unseen facets of the overall topic are not included due to negative reinforcement by the user community (the filter bubble). The authors show how this effect can be detected and how appropriate diversification measures can help. This study is based on the proprietary log files of Microsoft Bing.
  • Philip McParlane et al.  “Picture the scene…”: Visually Summarizing Social Media Events. The authors present a topicality and diversity aware summarization scheme for creating comprehensive visual digests of social media collections. An especially interesting aspect of this paper is the detailed discussion of the various data cleaning efforts necessary to embark on this task.
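Sakai’s topic-set-size question has a familiar power-analysis core. As a minimal sketch of that flavour of calculation (a textbook normal approximation for a two-sided, two-sample comparison, not the paper’s exact method, and `topics_needed` is my own name), using only the Python standard library:

```python
import math
from statistics import NormalDist

def topics_needed(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size for a two-sided, two-sample comparison.

    effect_size is Cohen's d: the smallest between-system score difference,
    in standard-deviation units, that the collection should reliably detect.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = nd.inv_cdf(power)           # quantile delivering the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Detecting a medium effect (d = 0.5) at 95% confidence with 80% power:
print(topics_needed(0.5))  # → 63 topics
```

Halving the detectable effect size roughly quadruples the required number of topics, which is why collection designers care so much about this trade-off.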

Contextual Multidimensional Relevance Models

Last week, I defended my PhD thesis entitled “Contextual Multidimensional Relevance Models” at TU Delft in The Netherlands. This is, in brief, what awaits the reader:

Information retrieval systems centrally build upon the concept of relevance in order to rank documents in response to a user’s query. Assessing relevance is a non-trivial operation that can be influenced by a multitude of factors that go beyond mere topical overlap with the query. This thesis argues that relevance depends on personal (Chapter 2) and situational (Chapter 3) context. In many use cases, there is no single interpretation of the concept that would optimally satisfy all users in all possible situations.

We postulate that relevance should be explicitly modelled as a composite notion made up of individual relevance dimensions. To this end, we show how automatic inference schemes based on document content and user activity can be used in order to estimate such constituents of relevance (Chapter 4). Alternatively, we can employ human expertise, harnessed, for example, via commercial crowdsourcing or serious games, to judge the degree to which a document satisfies a given set of relevance dimensions (Chapter 5).

Finally, we need a model that allows us to estimate the joint distribution of relevance across all previously obtained dimensions. In this thesis, we propose using copulas, a model family originating from the field of quantitative finance that decouples the marginal observations from the dependency structure and which can account for complex non-linear dependencies among relevance dimensions (Chapter 6).
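To make that decoupling concrete, here is a toy Gaussian-copula sketch (an illustration of the general technique, not the thesis’ estimation procedure; all function names are my own): rank-transform each relevance dimension to uniform pseudo-observations, measure dependence in normal-score space, and query the fitted copula by Monte Carlo.

```python
import math
import random
from statistics import NormalDist

ND = NormalDist()

def pseudo_obs(scores):
    """Rank-transform raw scores to (0, 1): the empirical marginal CDF."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    u = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (len(scores) + 1)
    return u

def fit_gaussian_copula(u1, u2):
    """Dependence parameter: Pearson correlation in normal-score space."""
    z1 = [ND.inv_cdf(u) for u in u1]
    z2 = [ND.inv_cdf(u) for u in u2]
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1, z2))
    var1 = sum((a - m1) ** 2 for a in z1)
    var2 = sum((b - m2) ** 2 for b in z2)
    return cov / math.sqrt(var1 * var2)

def joint_exceedance(rho, q, n=100_000, seed=7):
    """Monte Carlo estimate of P(both dimensions exceed quantile q)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if ND.cdf(z1) > q and ND.cdf(z2) > q:
            hits += 1
    return hits / n
```

Under independence, the probability that two dimensions both exceed their 0.9 quantile is exactly 0.01; with a fitted dependence of, say, rho = 0.8, the copula assigns that event several times as much mass. This kind of joint tail behaviour is precisely what a weighted linear combination of per-dimension scores cannot express.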

Get your copy here.

Modelling Complex Relevance Spaces with Copulas

Modern relevance models consider a wide range of criteria in order to identify those documents that are expected to satisfy the user’s information need. With growing dimensionality of the underlying relevance spaces, the need for sophisticated score combination and estimation schemes arises. In this paper, we investigate the use of copulas, a model family from the domain of robust statistics, for the formal estimation of the probability of relevance in high-dimensional spaces. Our experiments are based on the MSLR-WEB10K and WEB30K datasets, two annotated, publicly available samples of hundreds of thousands of real Web search impressions, and suggest that copulas can significantly outperform linear combination models for high-dimensional problems. Our models achieve a performance on par with that of state-of-the-art machine learning approaches.


This paper was accepted for presentation at the 23rd ACM Conference on Information and Knowledge Management (CIKM) in Shanghai, China.

Interactive Summarization of Social Media Streams

Data visualization and exploration tools are crucial for data scientists, especially during pilot studies. In this paper, we present an extensible open-source workbench for aggregating, summarizing and filtering social network profiles derived from tweets. We demonstrate its range of basic features for two use cases: geo-spatial profile summarization based on check-in histories and social-media-based complaint discovery in water management.


This paper was accepted as a demonstrator at the 6th Information & Interaction in Context Conference in Regensburg, Germany.

ECIR 2014 in Amsterdam, The Netherlands

Some highlights from the oral paper presentations:

Crowd-Powered Experts

Crowdsourcing is often applied for the task of replacing the scarce or expensive labour of experts with that of untrained workers. In this paper, we argue that this objective might not always be desirable, but that we should instead aim at leveraging the considerable work force of the crowd in order to support the highly trained expert. Here, we demonstrate this different paradigm on the example of detecting malignant breast cancer in medical images. We compare the effectiveness and efficiency of experts to that of crowd workers, finding that experts achieve significantly better performance, albeit at greater cost. In a second series of experiments, we show how the comparably cheap results produced by crowdsourcing workers can serve to make experts both more efficient and more effective at the same time.


The full version of this article has been accepted for presentation at the ECIR 2014 Workshop on Gamification for Information Retrieval (GamifIR) in Amsterdam, The Netherlands.

WSDM 2014, New York City

Some highlights from the oral paper presentations:

  • Guillem Francès et al.  Improving the Efficiency of Multi-site Web Search Engines. The authors investigate query forwarding and document replication strategies for multi-site search engines in order to increase system efficiency.
  • Ahmed Hassan et al.  Struggling or Exploring? Disambiguating Long Search Sessions. Long search sessions are typically explained by either high degrees of user involvement or frustration. The authors contrast several key characteristics of exploring vs. struggling searcher behaviour.
  • Xin Li et al.  Search Engine Click Spam Detection Based on Bipartite Graph Propagation. Based on a bootstrapping mechanism, the authors expand a seed set of click-spam examples to create a reliable graph-based spam detection scheme.
  • Dmitry Lagun et al.  Discovering Common Motifs in Cursor Movement Data for Improving Web Search. (Best student paper) Mouse cursor movement has previously been shown to be correlated with eye gaze traces, effectively forming a proxy for users’ attention and focus. In this paper, the authors use signals mined from cursor movements for search result personalization.
  • Youngho Kim et al.  Modeling Dwell Time to Predict Click-level Satisfaction. Search engine providers traditionally interpret long dwell times above a fixed threshold duration (often 30 seconds) as a sign of user satisfaction and engagement. In this paper, the authors revisit this practice and report an interesting array of topic- and user-dependent influences on the concrete settings of such thresholds.
  • Hongning Wang et al.  User Modeling in Search Logs via a Nonparametric Bayesian Approach. The authors model query intents in the form of latent topics in a joint distribution of queries and behaviour (clicks). Every concrete user session is represented as a mixture of such topics.
  • Andrei Broder et al.  Scalable K-Means by Ranked Retrieval. Many information retrieval applications rely on clustering methods such as the popular k-means algorithm. In its most straightforward implementation, this method requires expensive index reconstructions as cluster centroids shift. In this paper, the authors demonstrate speed-ups of up to two orders of magnitude by issuing spatial queries, removing the need for index reconstruction.
  • Yuzhe Jin et al.  Entity Linking at the Tail: Sparse Signals, Unknown Entities, and Phrase Models. Entity linking, the task of connecting spans of text to concrete entities in an ontology, often suffers performance issues related to previously unobserved entities. The authors regard the set of all observed entities in the ontology as being sampled from the global pool of all entities. This method allows for elegant ways of dealing with previously unseen entities.

Lessons from the Journey: A Query Log Analysis of Within-Session Learning

The Internet is the largest source of information in the world. Search engines help people navigate the huge space of available data in order to acquire new skills and knowledge. In this paper, we present an in-depth analysis of sessions in which people explicitly search for new knowledge on the Web based on the log files of a popular search engine. We investigate within-session and cross-session developments of expertise, focusing on how the language and search behavior of a user on a topic evolves over time. In this way, we identify those sessions and page visits that appear to significantly boost the learning process. Our experiments demonstrate a strong connection between clicks and several metrics related to expertise. Based on models of the user and their specific context, we present a method capable of automatically predicting, with good accuracy, which clicks will lead to enhanced learning. Our findings provide insight into how search engines might better help users learn as they search.

This work has been accepted for publication at the 7th ACM Conference on Web Search and Data Mining (WSDM) in New York City.

SIGIR 2013, Dublin, Ireland

Some highlights from the oral paper presentations: