Personalized recommendation is an important yet challenging task that benefits both service providers and consumers by enabling more efficient and informed choices.
However, despite the increased number of input streams, user-generated data are sparse, since time constraints limit the number of items a person can interact with. Additionally, people select items in a variety of contexts, which can significantly affect their consumption behavior. This thesis explores such data and introduces models that address these issues.
For the dissertation’s first contribution, we focus on data that include consumption of both new and old items. There has been significant prior work on developing predictive modeling techniques for recommending \emph{new} items to individuals; however, there are many situations where predictions for both previously consumed and new items are of interest. We develop a mixture model framework that addresses sparsity constraints through empirical Bayesian priors and balances individual preferences in terms of exploration and exploitation. We evaluate our model using several real-world datasets, including location, social media, and music listening data.
Next, we investigate the problem of incorporating contextual information when making predictions. We focus on the problem of language modeling, where context (such as the identity of the speaker) can alter language use. We propose a general approach to language modeling that can efficiently and dynamically incorporate multiple types of external contextual information, without increasing the complexity of the model. Experiments on Reddit and Yelp corpora demonstrate that the proposed approach not only results in increased accuracy over competing methods, but also provides useful insights into how different contexts affect language use.
In the final chapter, we investigate a more practical matter and present a method that generates more detailed affinity data between users and attributes. A large amount of data is available in which users have written a review of an entity and given it a score. However, this score is general and does not represent how the reviewer feels about each attribute of the entity.
In this chapter, we take on the problem of learning, from group-level labels, classifiers that can make predictions at the instance level.
We evaluate our approach using three large review datasets from IMDB, Amazon, and Yelp, and demonstrate that it is both accurate and scalable compared to various alternatives.