
Probabilistic Local Expert Retrieval

This paper proposes a range of probabilistic models of local expertise based on geo-tagged social network streams. We assume that frequent visits result in greater familiarity with the location in question. To capture this notion, we rely on spatio-temporal information from users’ online check-in profiles. We evaluate the proposed models on a large-scale sample of geo-tagged and manually annotated Twitter streams. Our experiments show that the proposed methods outperform both intuitive baselines and established models such as the Iterative Inference scheme.
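
As a toy illustration of the underlying intuition — not one of the models evaluated in the paper — local expertise could be scored as a recency-weighted check-in count, where frequent and recent visits to a place count for more than sporadic, old ones (users, timestamps and the half-life parameter below are all made up):

```python
from collections import defaultdict
from math import exp, log

def expertise_scores(checkins, half_life_days=90.0, now=0.0):
    """Score (user, location) pairs by recency-weighted check-in
    frequency: frequent and recent visits yield higher scores."""
    decay = log(2) / half_life_days  # exponential decay with given half-life
    scores = defaultdict(float)
    for user, location, t_days in checkins:
        scores[(user, location)] += exp(-decay * (now - t_days))
    return dict(scores)

# Hypothetical check-ins as (user, location, timestamp in days before now).
checkins = [
    ("alice", "melbourne", -10), ("alice", "melbourne", -5),
    ("bob", "melbourne", -400),
]
scores = expertise_scores(checkins)
# Alice's two recent visits outweigh Bob's single, much older one.
```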

This paper has been accepted for presentation at ECIR 2016.

CIKM 2015 – Melbourne, Australia

My personal highlights among the oral paper presentations:

  • Chia-Jung Lee et al. An Optimization Framework for Merging Multiple Result Lists. The authors present a neural network-based approach to learning optimal result list fusion parameters for federated search.
  • David Maxwell et al. Searching and Stopping: An Analysis of Stopping Rules and Strategies. The authors investigate different models of search session termination, aiming to determine the point at which the user stops scanning the result list. To this end, they rely on behavioral theories of frustration and disgust.
  • Alessandro Sordoni et al. A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion. This paper describes the use of hierarchical en-/decoders for query suggestion. Generating a word at a time, the method aims at suggesting contextualised query candidates while ensuring robustness to candidate frequency, making it an interesting option for tail information needs.
  • Tom Kenter et al. Ad Hoc Monitoring of Vocabulary Shifts over Time. The authors describe a distributional semantics approach to characterizing transient word meanings over time. Relying on semantics-preserving word embeddings, they are able to track changing term interpretations as well as changing terminology for the same concept as language and society evolve.
  • Daan Odijk et al. Struggling and Success in Web Search. (Best Student Paper) The paper describes a large-scale empirical study of search success as well as struggles in finding the desired content. The experiment leads to the development of a number of practical techniques for forecasting future user actions, ultimately making it possible to support users with systematic search strategy deficiencies.

SIGIR 2015, Santiago, Chile

My personal highlights among the oral paper presentations:

  • Bhaskar Mitra. Exploring Session Context using Distributed Representations of Queries and Reformulations. The author relies on convolutional neural networks in order to learn semantically similar query reformulation patterns. Each observed reformulation from the log is mapped into the vector space in order to group and forecast reformulations and, subsequently, improve query auto-completion accuracy.
  • Christina Lioma, Jakob Grue Simonsen, Birger Larsen and Niels Dalum Hansen. Non-Compositional Term Dependence for Information Retrieval. The authors tackle the challenge of estimating term dependencies by means of Markov random fields based on the notion of term compositionality, following the intuition that non-compositional terms show maximal dependence. In this way, they present an alternative to the popular co-occurrence-based dependency estimation schemes.
  • Diane Kelly and Leif Azzopardi. How many Results per Page? A Study of SERP Size, Search Behavior and User Experience. This paper studies the relationships among the number of results shown on a SERP, search behavior and user experience. The authors instrument the SERP, showing three, six or the standard ten organic links per page, investigating user experience as well as cognitive and physical workload.
  • Artem Grotov, Shimon Whiteson and Maarten de Rijke. Bayesian Ranker Comparison based on Historical User Interactions. Instead of relying on live comparison of production and candidate rankers, e.g., in an interleaving fashion, the authors propose a Bayesian scheme for estimating performance metrics and confidence levels on the basis of historic interactions. In this way, risky in vivo experiments can be avoided.

Exploiting Document Content for Efficient Aggregation of Crowdsourcing Votes

The use of crowdsourcing for document relevance assessment has been found to be a viable alternative to corpus annotation by highly trained experts. The question of quality control is a recurring challenge that is often addressed by aggregating multiple individual assessments of the same topic-document pair from independent workers. In the past, such aggregation schemes have been weighted or filtered by estimates of worker reliability based on a multitude of behavioral features. We propose an alternative approach that relies on document information. Inspired by the clustering hypothesis, we assume textually similar documents to show similar degrees of relevance towards a given topic. Following up on this intuition, we propagate crowd-generated relevance judgments to similar documents, effectively smoothing the distribution of relevance labels across the similarity space.
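
A minimal sketch of this smoothing idea — simplified relative to the aggregation methods actually studied in the paper, with a made-up mixing weight and toy similarity matrix — mixes each document's own vote average with the similarity-weighted votes of its neighbours:

```python
import numpy as np

def smoothed_relevance(votes, sim, alpha=0.5):
    """Smooth per-document crowd vote averages across a document
    similarity space: each estimate blends the document's own vote
    average with the similarity-weighted average of its neighbours'
    votes (a minimal take on the clustering hypothesis).

    votes: (n_docs,) array of raw vote means in [0, 1]
    sim:   (n_docs, n_docs) symmetric similarity matrix, zero diagonal
    alpha: weight placed on the document's own votes
    """
    weights = sim / np.maximum(sim.sum(axis=1, keepdims=True), 1e-12)
    neighbour_avg = weights @ votes
    return alpha * votes + (1 - alpha) * neighbour_avg

votes = np.array([1.0, 0.0, 1.0])          # raw vote means per document
sim = np.array([[0.0, 0.1, 0.9],
                [0.1, 0.0, 0.1],
                [0.9, 0.1, 0.0]])          # pairwise document similarity
smoothed = smoothed_relevance(votes, sim)
```

A document with few or noisy votes effectively borrows labels from textually similar documents, which is where the cost savings over plain majority voting come from.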

Our experiments are based on TREC Crowdsourcing Track data and show that even simple aggregation methods utilizing document similarity information significantly improve over majority voting in terms of accuracy as well as cost efficiency. Combining methods for both aggregation and active learning based on document information improves the results even further.

This paper has been accepted for presentation at the 24th ACM Conference on Information and Knowledge Management (CIKM) in Melbourne, Australia.

An Eye-Tracking Study of Query Reformulation

Information about a user’s domain knowledge and interest can be important signals for many information retrieval tasks such as query suggestion or result ranking. State-of-the-art user models rely on coarse-grained representations of the user’s previous knowledge about a topic or domain. We study query refinement using eye-tracking in order to gain precise and detailed insight into which terms the user was exposed to in a search session and which ones they showed a particular interest in. We measure fixations on the term level, allowing for a detailed model of user attention. To allow for a widespread exploitation of our findings, we generalize from the restrictive eye-gaze tracking to using more accessible signals: mouse cursor traces. Based on the public API of a popular search engine, we demonstrate how query suggestion candidates can be ranked according to traces of user attention and interest, resulting in significantly better performance than achieved by an attention-oblivious industry solution. Our experiments suggest that modelling term-level user attention can be achieved with great reliability and holds significant potential for supporting a range of traditional IR tasks.
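
As a rough illustration of the ranking step — a hypothetical scoring function, far simpler than the model in the paper, with invented attention weights — each suggestion candidate can be scored by the total attention its terms received:

```python
def rank_suggestions(candidates, attention):
    """Rank query-suggestion candidates by the summed attention weight
    of the terms they contain. `attention` maps term -> a fixation- or
    cursor-derived weight, e.g. normalised dwell time on that term."""
    def score(candidate):
        return sum(attention.get(term, 0.0) for term in candidate.split())
    return sorted(candidates, key=score, reverse=True)

# Invented attention weights for illustration only.
attention = {"jaguar": 0.7, "speed": 0.5, "car": 0.1}
ranked = rank_suggestions(["jaguar car", "jaguar speed", "jaguar animal"],
                          attention)
# Candidates containing highly attended terms rise to the top.
```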

The full version of this work has been accepted for presentation at the 38th Annual ACM SIGIR Conference in Santiago, Chile.


Modelling Term Dependence with Copulas

Many generative language and relevance models assume conditional independence between the likelihoods of observing individual terms. This assumption is obviously naive, but also hard to replace or relax. Only very few term pairs actually show significant conditional dependencies, while the vast majority of co-located terms have no implications for the document’s topical nature or relevance towards a given topic. It is exactly this situation that we capture in a formal framework: a limited number of meaningful dependencies in a system of largely independent observations. Making use of the formal copula framework, we describe the strength of causal dependency in terms of a number of established term co-occurrence metrics. Our experiments based on the well-known ClueWeb12 corpus and TREC 2013 topics indicate significant gains in retrieval performance when we formally account for the dependency structure underlying pieces of natural language text.
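
To give a flavour of the machinery — the Gumbel copula below is one standard Archimedean copula, not necessarily the construction used in the paper, and the term probabilities and theta values are made up — a copula couples two marginal term likelihoods into a joint probability, with a single parameter controlling dependence strength:

```python
from math import exp, log

def gumbel_copula(u, v, theta=1.0):
    """Gumbel copula C(u, v) for marginals u, v in (0, 1].
    theta = 1 recovers independence (C = u * v); larger theta
    models stronger positive dependence between the two events."""
    return exp(-(((-log(u)) ** theta + (-log(v)) ** theta) ** (1.0 / theta)))

# Joint likelihood of observing two query terms in a document:
# an unrelated pair keeps theta = 1, while a strongly dependent pair
# like ("hong", "kong") would be assigned a larger theta, estimated
# from co-occurrence statistics.
p_hong, p_kong = 0.02, 0.02
independent = gumbel_copula(p_hong, p_kong, theta=1.0)  # equals p_hong * p_kong
dependent = gumbel_copula(p_hong, p_kong, theta=5.0)    # boosted joint likelihood
```

With most term pairs left at (or near) independence and only the few genuinely dependent pairs given a large theta, the model stays close to the standard independence assumption except where it matters.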

The full version of this work has been accepted for presentation at the 38th Annual ACM SIGIR Conference in Santiago, Chile.

ECIR 2015 in Vienna, Austria

These are my personal highlights of the oral paper presentations:

  • Morgan Harvey and Fabio Crestani. Long Time, No Tweets! Time-aware Personalised Hashtag Suggestion. The authors recommend hashtag candidates for tweets in order to increase retrievability and organization of content in a microblogging environment. In particular, their method is based on temporal distribution patterns of tags observed in the training data.
  • Matthias Hagen et al. A Corpus of Realistic Known-Item Topics with Associated Web Pages in the ClueWeb09. The authors present a collection of textual documents relating to the task of known-item retrieval. Their selection was created by sampling questions from Yahoo Answers that were satisfied by resources in the ClueWeb09 Web page corpus. As an aside, the authors annotate cases of false memories in which users’ original requests were misleading and required substantial reformulation aid from the Q&A community.
  • Grace Yang et al. Designing States, Actions, and Rewards for Using POMDP in Session Search. The authors present a model of user behaviour in search sessions based on reinforcement learning. In particular, they rely on Partially Observable Markov Decision Processes to capture the relevant components of the search process.
  • Horatiu Bota et al. Exploring Composite Retrieval from the Users’ Perspective. (Best Paper) The authors study the emerging task of composite retrieval, in which semantically related results from different content verticals are presented in so-called bundles. Based on an empirical study, they investigate bundle relevance, coherence and diversity.

GamifIR 2015 Keynote – Human Intelligence in Search and Retrieval

Crowdsourcing has developed into a magic bullet for the data and annotation needs of modern-day IR researchers. The number of academic studies as well as industrial applications that employ the crowd for creating, curating, annotating or aggregating documents is growing steadily. Aside from the multitude of scientific papers relying on crowd labour for system evaluation, there has been a strong interdisciplinary line of work dedicated to finding effective and efficient forms of using this emerging labour market. Central research questions include (1) how to estimate and optimize the reliability and accuracy of often untrained workers in comparison with highly trained professionals; (2) how to identify or prevent noise and spam in the submissions; and (3) how to most cost-efficiently distribute tasks and remunerations across workers. The vast majority of studies understand crowdsourcing as the act of making micro-payments to individuals in return for compartmentalized units of creative or intelligent labour.

Gamification proposes an alternative incentive model in which entertainment replaces money as the motivating force drawing the workers. Under this alternative paradigm, tasks are embedded in game environments in order to increase the attractiveness and immersion of the work interface. While gamification rightfully points out that paid crowdsourcing is not the only viable option for harnessing crowd labour, it is still merely another concrete instantiation of the community’s actual need: a formal worker incentive model for crowdsourcing. Only by understanding individual motivations can we deliver truly adequate reward schemes that ensure faithful contributions and long-term worker engagement. It is unreasonable to assume that the binary money vs. entertainment decision reflects the full complexity of the worker motivation spectrum. What about education, socializing, vanity, or charity? All of these are valid examples of factors that compel people to lend us their work force. This is not to say that we necessarily have to promote edufication and all its possible siblings as new paradigms; they should merely start to take their well-deserved space on our mental map of crowdsourcing incentives.

In this talk, we will cover a range of interesting scenarios in which different incentive models may fundamentally change the way in which we can tap the considerable potential of crowd labour. We will discuss cases in which standard crowdsourcing and gamification schemes reach the limits of their capabilities, forcing us to rely on alternative strategies. Finally, we will investigate whether crowdsourcing indeed even has to be an active occupation or whether it can happen as a by-product of more organic human behaviour.

If you are interested in the full talk, please join us for the GamifIR Workshop at ECIR 2015 in Vienna, Austria.

CIKM 2014, Shanghai, China

These are my personal (biased by interest and jet lag) highlights of the oral paper presentations:

  • Tetsuya Sakai. Designing Test Collections for Comparing many Systems. The author presents a statistical analysis of the question of how many topics (e.g., queries) an experimental collection should comprise in order to give a reliable performance estimate for a fixed number of participating systems/methods at a required target confidence level. The insights of this research are made available in the form of conveniently applicable spreadsheets.
  • Anne Schuth et al. Multileaved Comparisons for Fast Online Evaluation. The authors introduce a modification to the established team-draft and optimized interleaving methods for online evaluation. While previous evaluation schemes allowed only for direct evaluation of two systems at a time, the presented modification efficiently yields direct comparisons for arbitrary numbers of systems in a performance-preserving manner.
  • Nikita Spirin et al. Large Scale Analysis of People Search in an Online Social Network. The authors present a broad analysis of how the community makes use of a social networking platform’s graph-based people search feature. The study is based on the proprietary log files of Facebook.
  • Julia Kiseleva et al. Modelling and Detecting Changes in User Satisfaction. The authors discuss the phenomenon of “concept drift” in dynamic Web search engines. SERPs for popular queries sometimes experience drastic changes in quality when late-breaking news or previously unseen facets of the overall topic are not included due to negative reinforcement by the user community (the filter bubble). The authors show how this effect can be detected and how appropriate diversification measures can help. This study is based on the proprietary log files of Microsoft Bing.
  • Philip McParlane et al. Picture the Scene…: Visually Summarizing Social Media Events. The authors present a topicality- and diversity-aware summarization scheme for creating comprehensive visual digests of social media collections. An especially interesting aspect of this paper is the detailed discussion of the various data cleaning efforts necessary to embark on this task.