Probabilistic Bag-of-Hyperlinks Models for Entity Linking

The goal of entity linking is to map spans of text to canonical entity representations such as Freebase entries or Wikipedia articles. It provides a foundation for various natural language processing tasks, including text understanding, summarization, and machine translation. Name ambiguity, word polysemy, context dependencies, and a heavy-tailed distribution of entities all contribute to the complexity of this problem. We propose a probabilistic approach that uses an effective graphical model for collective entity linking, resolving entity links jointly across an entire document. Our model captures local information from linkable token spans (i.e., mentions) and their surrounding context, and combines it with a document-level prior of entity co-occurrences. The model is acquired automatically from entity-linked text repositories, with a lightweight computational step for parameter adaptation. Loopy belief propagation is then used as an efficient approximate inference algorithm. Our method does not require extensive feature engineering but relies on simple sufficient statistics extracted from data, which makes it fast enough for real-time usage. We evaluate it on a wide range of well-known entity linking benchmark datasets and show that our approach matches, and in many cases outperforms, existing state-of-the-art methods.
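
To make the inference step more concrete, here is a minimal sketch of sum-product loopy belief propagation over a fully connected pairwise model of mentions. It is an illustration under stated assumptions, not the paper's implementation: the function name `loopy_bp`, the toy candidate sets, the local scores, and the co-occurrence weights are all hypothetical, and the paper's actual parameterization is not reproduced here.

```python
def loopy_bp(unary, pairwise, n_iters=20):
    """Approximate per-mention marginals with sum-product loopy BP.

    unary:    list with one dict per mention, candidate entity -> positive local score
    pairwise: function (entity_a, entity_b) -> positive compatibility score
    Returns a list of dicts, candidate entity -> approximate marginal probability.
    """
    n = len(unary)
    # msgs[(i, j)][e_j] is the message from mention i to mention j about candidate e_j.
    msgs = {(i, j): {e: 1.0 for e in unary[j]}
            for i in range(n) for j in range(n) if i != j}

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = {}
            for e_j in unary[j]:
                total = 0.0
                for e_i in unary[i]:
                    # Product of messages flowing into i from every mention except j.
                    incoming = 1.0
                    for k in range(n):
                        if k != i and k != j:
                            incoming *= msgs[(k, i)][e_i]
                    total += unary[i][e_i] * pairwise(e_i, e_j) * incoming
                out[e_j] = total
            z = sum(out.values())
            new_msgs[(i, j)] = {e: v / z for e, v in out.items()}
        msgs = new_msgs

    beliefs = []
    for i in range(n):
        belief = {}
        for e in unary[i]:
            value = unary[i][e]
            for k in range(n):
                if k != i:
                    value *= msgs[(k, i)][e]
            belief[e] = value
        z = sum(belief.values())
        beliefs.append({e: v / z for e, v in belief.items()})
    return beliefs


# Hypothetical toy document with two mentions: "Washington" and "Seattle".
unary = [
    {"George_Washington": 0.5, "Washington_(state)": 0.3, "Washington,_D.C.": 0.2},
    {"Seattle": 1.0},
]
# Made-up co-occurrence weights; unseen pairs default to a neutral score of 1.0.
cooccurrence = {frozenset(("Washington_(state)", "Seattle")): 5.0}


def pairwise(a, b):
    return cooccurrence.get(frozenset((a, b)), 1.0)


beliefs = loopy_bp(unary, pairwise)
best = max(beliefs[0], key=beliefs[0].get)
print(best, round(beliefs[0][best], 3))
```

In this toy setting the local score alone would pick George_Washington for the first mention, while the document-level co-occurrence with Seattle shifts the belief toward Washington_(state). With only two mentions the graph has no cycles, so the result is exact; on documents with many mentions the same updates give an approximation.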

This paper has been accepted for presentation at the World Wide Web Conference (WWW) 2016 in Montreal, Canada.

Probabilistic Local Expert Retrieval

This paper proposes a range of probabilistic models of local expertise based on geo-tagged social network streams. We assume that frequent visits result in greater familiarity with the location in question. To capture this notion, we rely on spatio-temporal information from users’ online check-in profiles. We evaluate the proposed models on a large-scale sample of geo-tagged and manually annotated Twitter streams. Our experiments show that the proposed methods outperform both intuitive baselines and established models such as the Iterative Inference scheme.
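
The underlying assumption, that more frequent visits imply greater familiarity, can be illustrated with a simple frequency-based ranking. This is only a sketch of that intuition and not one of the paper's models: the function names, the fixed radius, the check-in tuples, and the user names are hypothetical, and temporal information is ignored.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rank_local_experts(checkins, lat, lon, radius_km=1.0):
    """Rank users by their share of check-ins within radius_km of (lat, lon).

    checkins: iterable of (user, latitude, longitude) tuples.
    Returns a list of (user, share) pairs, highest share first.
    """
    counts = Counter(
        user for user, c_lat, c_lon in checkins
        if haversine_km(lat, lon, c_lat, c_lon) <= radius_km
    )
    total = sum(counts.values())
    if total == 0:
        return []
    return [(user, count / total) for user, count in counts.most_common()]


# Hypothetical check-ins: three nearby visits by alice, one by bob, one far away by carol.
checkins = [
    ("alice", 40.7581, -73.9856),
    ("alice", 40.7579, -73.9853),
    ("alice", 40.7583, -73.9850),
    ("bob", 40.7585, -73.9860),
    ("carol", 34.0522, -118.2437),
]
ranking = rank_local_experts(checkins, 40.7580, -73.9855)
print(ranking)
```

Normalizing the counts yields an estimate of how likely a nearby check-in is to come from each user, which is the simplest way to turn visit frequency into a ranking score.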

This paper has been accepted for presentation at ECIR 2016.