In this work, we investigate ways to improve the accuracy of medical information retrieval by drawing implicit negative feedback from negated information in noisy natural language search queries. We begin by studying the extent to which negations occur in clinical texts and quantifying their detrimental effect on retrieval performance. We then present query reformulation and ranking approaches that remedy these shortcomings by resolving natural language negations. Our experimental results are based on data collected in the course of the TREC Clinical Decision Support Track and show consistent improvements over state-of-the-art methods. Using one of our novel algorithms, we are able to alleviate the negative impact of negations on early precision.
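As an illustration of negation-aware query reformulation, the following minimal Python sketch splits a free-text query into positive and negated terms using a simple NegEx-style heuristic: a trigger word opens a negation scope that ends at punctuation, at a terminator word, or after a fixed number of tokens. It then re-ranks documents by penalising matches on the negated terms, treating them as implicit negative feedback. This is not the algorithm evaluated in the paper. The trigger, terminator and stopword lists, the `MAX_SCOPE` window, the `penalty` weight and the function names `split_negated_terms` and `rerank` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical NegEx-style lexicons; not the lexicon used in the paper.
NEGATION_TRIGGERS = {"no", "not", "denies", "without", "negative"}
SCOPE_TERMINATORS = {"but", "however", "although"}
STOPWORDS = {"and", "or", "with", "the", "a", "of", "for"}
MAX_SCOPE = 5  # assumed maximum number of content tokens a negation covers


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9\-]+", text.lower())


def split_negated_terms(query: str) -> tuple[list[str], list[str]]:
    """Split a free-text query into positive and negated content terms."""
    positive, negated = [], []
    # Split into clauses so that a negation scope never crosses punctuation.
    for clause in re.split(r"[.,;:]", query):
        scope_left = 0
        for token in tokenize(clause):
            if token in NEGATION_TRIGGERS:
                scope_left = MAX_SCOPE  # open a new negation scope
            elif token in SCOPE_TERMINATORS:
                scope_left = 0  # e.g. "no fever but cough": cough is positive
            elif token in STOPWORDS:
                continue  # stopwords neither count nor consume scope
            elif scope_left > 0:
                negated.append(token)
                scope_left -= 1
            else:
                positive.append(token)
    return positive, negated


def rerank(docs: dict[str, str], query: str, penalty: float = 0.5) -> list[str]:
    """Order document ids by positive-term overlap minus a negated-term penalty."""
    positive, negated = split_negated_terms(query)

    def score(text: str) -> float:
        tokens = set(tokenize(text))
        hits = sum(t in tokens for t in positive)
        neg_hits = sum(t in tokens for t in negated)
        return hits - penalty * neg_hits

    return sorted(docs, key=lambda doc_id: score(docs[doc_id]), reverse=True)
```

For the query "Patient with chest pain, denies fever and cough.", the sketch treats fever and cough as negated, so among documents that match the same positive terms, those that also mention fever or cough are ranked lower. A practical system would rely on a dedicated clinical negation detector and a proper retrieval model rather than raw term overlap.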
This paper has been accepted for presentation at the ACM SIGIR Medical Information Retrieval Workshop (MedIR) in Pisa, Italy.
The increased popularity and ubiquitous availability of online social networks and globalised Internet access have affected the way in which people share content. The information that users willingly share on these platforms can be used for various purposes, from building consumer models for advertising to inferring personal, potentially invasive, information.
In this work, we use Twitter, Instagram and Foursquare data to show that the content shared by users, especially when aggregated across platforms, can disclose more information than was originally intended.
We conduct two case studies. First, we de-anonymize users by simulating the scenario of identifying a user who makes anonymous posts within a group of users. Empirical evaluation on a sample of real-world social network profiles suggests that cross-platform aggregation yields significant performance gains in user identification.
In the second case study, we show that it is possible to infer a user's visits to physical locations on the basis of shared Twitter and Instagram content. We present an informativeness scoring function that estimates the relevance and novelty of a shared piece of information with respect to an inference task. We validate this measure using an active learning framework that selects the most informative content at each point in time. On a large-scale data sample, we show that this selection improves inference performance, in some cases even beyond the performance obtained using the user's full timeline.
This paper has been accepted for presentation at the ACM SIGIR Workshop on Privacy-Preserving Information Retrieval (PIR) in Pisa, Italy.
According to constructivist models of contextual learning, knowledge acquisition goes beyond the mere absorption of isolated facts and is instead enabled, stimulated and supported by related existing knowledge and experiences. We discuss a range of query expansion and result list re-ranking techniques that aim to preserve contextual dependencies among retrieved documents and thereby enhance the performance of learning-centric search engines. Our empirical evaluation is based on a snapshot of Wikipedia, and an interactive user study suggests significantly increased usability.
This paper has been accepted for presentation at the ACM SIGIR Search as Learning Workshop (SAL) in Pisa, Italy.