Crowd-Powered Experts

Crowdsourcing is often applied to replace the scarce or expensive labour of experts with that of untrained workers. In this paper, we argue that this objective might not always be desirable, and that we should instead aim to leverage the considerable work force of the crowd in order to support the highly trained expert. We demonstrate this alternative paradigm on the example of detecting malignant breast cancer in medical images. We compare the effectiveness and efficiency of experts to those of crowd workers, finding that experts deliver significantly better performance, but at greater cost. In a second series of experiments, we show how the comparably cheap results produced by crowd workers can serve to make experts both more efficient and more effective at the same time.


The full version of this article has been accepted for presentation at the ECIR 2014 Workshop on Gamification for Information Retrieval (GamifIR) in Amsterdam, The Netherlands.

WSDM 2014, New York City

Some highlights from the oral paper presentations:

  • Guillem Francès et al.: Improving the Efficiency of Multi-site Web Search Engines. The authors investigate query forwarding and document replication strategies for multi-site search engines in order to increase system efficiency.
  • Ahmed Hassan et al.: Struggling or Exploring? Disambiguating Long Search Sessions. Long search sessions are typically explained by either a high degree of user involvement or frustration. The authors contrast several key characteristics of exploring vs. struggling searcher behaviour.
  • Xin Li et al.: Search Engine Click Spam Detection Based on Bipartite Graph Propagation. Starting from a bootstrapping mechanism, the authors expand a seed set of click-spam examples into a reliable graph-based spam detection scheme.
  • Dmitry Lagun et al.: Discovering Common Motifs in Cursor Movement Data for Improving Web Search. (Best student paper) Mouse cursor movement has previously been shown to correlate with eye gaze traces, effectively forming a proxy for users’ attention and focus. In this paper, the authors use signals mined from cursor movements for search result personalization.
  • Youngho Kim et al.: Modeling Dwell Time to Predict Click-level Satisfaction. Search engine providers traditionally interpret dwell times above a fixed threshold duration (often 30 seconds) as a sign of user satisfaction and engagement. In this paper, the authors revisit this practice and report an interesting array of topic- and user-dependent influences on the concrete settings of such thresholds.
  • Hongning Wang et al.: User Modeling in Search Logs via a Nonparametric Bayesian Approach. The authors model query intents as latent topics in a joint distribution of queries and behaviour (clicks). Each concrete user session is represented as a mixture of such topics.
  • Andrei Broder et al.: Scalable K-Means by Ranked Retrieval. Many information retrieval applications rely on clustering methods such as the popular k-means algorithm. In its most straightforward implementation, this method requires expensive index reconstructions as cluster centroids shift. In this paper, the authors demonstrate speed-ups of up to two orders of magnitude by issuing spatial queries that remove the need for index reconstruction.
  • Yuzhe Jin et al.: Entity Linking at the Tail: Sparse Signals, Unknown Entities, and Phrase Models. Entity linking, the task of connecting spans of text to concrete entities in an ontology, often suffers performance issues with previously unobserved entities. The authors regard the set of all observed entities in the ontology as a sample from the global pool of all entities, which allows for elegant ways of dealing with previously unseen entities.
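To make the bipartite-propagation idea behind the click-spam paper more concrete, here is a minimal sketch of label propagation over a query–URL click graph. This is an illustration of the general technique only, not the authors' exact method; the function name, example queries, URLs, and click counts are all invented for the sketch. Known spam nodes are seeded with score 1.0 (and, optionally, trusted nodes with 0.0), and scores spread along click-weighted edges:

```python
from collections import defaultdict

def propagate_spam_scores(edges, seeds, iterations=10):
    """Spread spam scores from a seed set over a bipartite click graph.

    edges: list of (query, url, click_count) tuples.
    seeds: dict mapping labelled nodes to a fixed score
           (1.0 = known click spam, 0.0 = known legitimate).
    """
    # Build a weighted adjacency list; queries and URLs are both nodes.
    adj = defaultdict(list)
    for query, url, weight in edges:
        adj[query].append((url, weight))
        adj[url].append((query, weight))

    scores = {node: seeds.get(node, 0.0) for node in adj}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adj.items():
            if node in seeds:
                updated[node] = seeds[node]  # seed labels stay fixed
                continue
            # Each unlabelled node takes the click-weighted average
            # of its neighbours' current scores.
            total = sum(w for _, w in neighbours)
            updated[node] = sum(scores[m] * w for m, w in neighbours) / total
        scores = updated
    return scores
```

A query whose clicks mostly land on seeded spam URLs inherits a high score, while queries connected only to trusted URLs stay near zero.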