Modern relevance models consider a wide range of criteria to identify the documents that are expected to satisfy the user’s information need. As the dimensionality of the underlying relevance spaces grows, so does the need for sophisticated score combination and estimation schemes. In this paper, we investigate the use of copulas, a model family from the domain of robust statistics, for the formal estimation of the probability of relevance in high-dimensional spaces. Our experiments are based on the MSLR-WEB10K and MSLR-WEB30K datasets, two annotated, publicly available samples of hundreds of thousands of real Web search impressions. The results suggest that copulas can significantly outperform linear combination models on high-dimensional problems and achieve performance on par with that of state-of-the-art machine learning approaches.
This paper was accepted for presentation at the 23rd ACM Conference on Information and Knowledge Management (CIKM) in Shanghai, China.