The information explosion and the increasing use of the Internet have fueled the growing popularity of personalization systems. Such systems learn user interests and customize the information served to each user, thereby addressing information overload. At the same time, personalization systems enable service and content providers to serve targeted advertisements and recommendations to users. To operate at the massive scale set by the number of users and the amount of data available, personalization systems must be automated and efficient. In this dissertation, we identify key challenges faced by personalization systems and provide solutions that address them so that these requirements can be met.
We first note that content classification is an important component of personalization systems. Approaches to training classification models typically depend on manual collection and labeling of training data, which prevents personalization from scaling. To address this, we develop a completely automated framework that provides labeled training data for an arbitrary set of categories. Experiments using online videos demonstrate the feasibility and effectiveness of our approach.
The second key challenge we address is the sparsity of annotations on popular online content such as Flickr images. Sparse or missing tags hamper the ability of personalization systems to recommend content or to infer the interests of the users who access it. To this end, we show how ontological tag trees can be constructed from corpus-based statistics and semantic relationships between tags to alleviate tag sparsity in a space-efficient manner. Through evaluations, we demonstrate the effectiveness and efficiency of ontological tag trees compared to existing methods.
Lastly, we focus on alleviating information overload in enterprise repositories. We design a recommendation system based on file metadata that captures per-user access patterns and user collaboration to recommend new files. To address the scalability concerns of per-user modeling, we propose optimizations that significantly reduce the time needed to serve recommendations to users. Experiments on actual enterprise data show that the proposed methods yield a speedup of more than two orders of magnitude.