Probabilistic Bag-of-Hyperlinks Models for Entity Linking

The goal of entity linking is to map spans of text to canonical entity representations such as Freebase entries or Wikipedia articles. It provides a foundation for various natural language processing tasks, including text understanding, summarization, and machine translation. Name ambiguity, word polysemy, context dependencies, and a heavy-tailed distribution of entities all contribute to the complexity of this problem. We propose a probabilistic approach that uses a graphical model for collective entity linking, resolving entity links jointly across an entire document. Our model captures local information from linkable token spans (i.e., mentions) and their surrounding context, and combines it with a document-level prior of entity co-occurrences. The model is acquired automatically from entity-linked text repositories, with a lightweight computational step for parameter adaptation. Loopy belief propagation then serves as an efficient approximate inference algorithm. Our method does not require extensive feature engineering; it relies on simple sufficient statistics extracted from data, which makes it fast enough for real-time use. We evaluate it on a wide range of well-known entity linking benchmark datasets and show that our approach matches, and in many cases outperforms, existing state-of-the-art methods.
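
To make the inference step concrete, the following is a minimal sketch of collective linking with loopy belief propagation on a toy document. It is not the model described in the paper: the sum-product message form, the synchronous update schedule, the function name loopy_bp_link, and all scores below are assumptions chosen purely for illustration. Each mention has a local log-score over candidate entities, and every pair of mentions is connected by a co-occurrence log-score.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def loopy_bp_link(unary, pairwise, iters=10):
    """Approximate joint linking (illustrative sketch, hypothetical API).

    unary:    list of dicts, one per mention: {entity: local log-score}
    pairwise: function (e1, e2) -> co-occurrence log-score (symmetric)
    Returns the highest-belief entity for each mention.
    """
    n = len(unary)
    # msgs[(i, j)][e] is the log message from mention i to mention j
    # about candidate e of mention j; start uniform.
    msgs = {(i, j): {e: 0.0 for e in unary[j]}
            for i in range(n) for j in range(n) if i != j}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = {}
            for ej in unary[j]:
                terms = []
                for ei in unary[i]:
                    # Messages into i from every neighbour except j.
                    incoming = sum(msgs[(k, i)][ei]
                                   for k in range(n) if k not in (i, j))
                    terms.append(unary[i][ei] + pairwise(ei, ej) + incoming)
                out[ej] = logsumexp(terms)
            norm = logsumexp(list(out.values()))  # normalise for stability
            new[(i, j)] = {e: v - norm for e, v in out.items()}
        msgs = new
    links = []
    for i in range(n):
        belief = {e: unary[i][e] + sum(msgs[(k, i)][e]
                                       for k in range(n) if k != i)
                  for e in unary[i]}
        links.append(max(belief, key=belief.get))
    return links

# Toy document with two mentions, "Jordan" and "Bulls" (made-up scores).
unary = [
    {"Michael_Jordan": math.log(0.4), "Jordan_(country)": math.log(0.6)},
    {"Chicago_Bulls": math.log(0.9), "Bull": math.log(0.1)},
]
cooc = {frozenset({"Michael_Jordan", "Chicago_Bulls"}): math.log(5.0)}

def pairwise(e1, e2):
    # Unlisted pairs get a neutral log-score of 0.
    return cooc.get(frozenset({e1, e2}), 0.0)

links = loopy_bp_link(unary, pairwise)
local_only = loopy_bp_link(unary, lambda e1, e2: 0.0)
print(links)
print(local_only)
```

In this toy setting the local scores alone favour the country reading of "Jordan", while the co-occurrence term with "Chicago_Bulls" shifts the joint decision to the basketball player, which is the effect that collective linking is meant to capture.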

This paper has been accepted for presentation at the World Wide Web Conference (WWW) 2016 in Montreal, Canada.