Ranking and Feedback-based Stopping for Recall-centric Document Retrieval

Medical systematic reviews require researchers to identify the entire body of relevant literature. Algorithms that filter the candidate list before manual screening while retaining near-perfect recall can significantly decrease the workload. This paper presents a novel stopping criterion, S-D Minimal Sampling, which estimates the score distribution of relevant articles from relevance feedback on randomly sampled articles. Using 20 training and 30 test topics, we achieve a mean recall of 93.3% while filtering out 59.1% of the articles. The approach achieves higher F2-scores at a significantly reduced manual reviewing workload. The method is especially suited to scenarios in which sufficiently many relevant articles (> 5) can be sampled and used for relevance feedback.
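The idea of a score-distribution stopping criterion can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian fit to the relevant scores, the `target_recall` parameter, and the `min_relevant=6` threshold (a reading of the "> 5" remark above) are all assumptions made for illustration. The sketch samples articles at random, collects relevance feedback on them, fits a distribution to the scores of the relevant ones, and returns the score below which articles would be filtered out.

```python
import random
import statistics
from statistics import NormalDist


def estimate_cutoff(scores, is_relevant, sample_size,
                    target_recall=0.95, min_relevant=6, seed=0):
    """Estimate a score cutoff from relevance feedback on a random sample.

    scores:      dict mapping article id -> ranking score (hypothetical input)
    is_relevant: callable returning the manual relevance judgement for an id
    Returns the cutoff score, or None if too few relevant articles were sampled.
    """
    rng = random.Random(seed)
    sample = rng.sample(sorted(scores), min(sample_size, len(scores)))
    rel_scores = [scores[doc] for doc in sample if is_relevant(doc)]
    if len(rel_scores) < min_relevant:
        return None  # not enough feedback to estimate a distribution
    mu = statistics.mean(rel_scores)
    sigma = statistics.stdev(rel_scores)
    if sigma == 0:
        return mu  # all sampled relevant articles share one score
    # Assumption: relevant scores are roughly normal. The cutoff is the
    # score that a fraction (1 - target_recall) of relevant articles fall below.
    return NormalDist(mu, sigma).inv_cdf(1.0 - target_recall)


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom
```

Articles scoring at or above the returned cutoff would go to manual review and the rest would be filtered out. When the function returns `None`, the sketch makes no filtering decision, which mirrors the caveat that the method needs enough sampled relevant articles.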

To appear in Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages (CLEF), Dublin, Ireland, 2017