UC Santa Barbara
LiRa: A New Likelihood-Based Similarity Score for Collaborative Filtering
Author(s):
- Strnadova-Neeley, Veronika
- Buluc, Aydin
- Gilbert, John R
- Oliker, Leonid
- Ouyang, Weimin
- et al.
Recommender system data presents unique challenges to the data mining, machine learning, and algorithms communities. The high missing-data rate, combined with the large scale and high dimensionality typical of recommender system data, requires new tools and methods for efficient data analysis. Here, we address the challenge of evaluating the similarity between two users in a recommender system when only a small set of ratings is available for each user. We present a new similarity score, which we call LiRa, based on a statistical model of user similarity for large-scale, discrete-valued data with many missing values. We show that this score, based on a ratio of likelihoods, is more effective at identifying similar users than traditional similarity scores used in user-based collaborative filtering, such as the Pearson correlation coefficient. We argue that our approach has significant potential to improve both accuracy and scalability in collaborative filtering.
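For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the Pearson correlation coefficient as it is commonly applied in user-based collaborative filtering. It is not the LiRa score, whose definition is not given in the abstract. The function name `pearson_similarity`, the dictionary-based rating format, and the sample users are illustrative assumptions, as is the choice to compute each user's mean over co-rated items only (other variants use each user's overall mean).

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    ratings_a, ratings_b: dicts mapping item id -> numeric rating
    (a hypothetical input format chosen for this sketch).
    Returns None when the correlation is undefined: fewer than two
    co-rated items, or zero variance for either user on those items.
    """
    # Only items rated by both users contribute; everything else is missing data.
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None

    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]

    # Assumption: means are taken over the co-rated items only.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None

    return cov / sqrt(var_x * var_y)


# Two hypothetical users who share only two rated items ("m2" and "m3").
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m2": 1, "m3": 2, "m4": 5}
score = pearson_similarity(alice, bob)  # 1.0
```

With exactly two co-rated items, the Pearson coefficient can only be +1, -1, or undefined, so the perfect score here reflects how little data the two users share rather than strong evidence of similar taste. This illustrates the kind of sparse-overlap setting the abstract describes.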