Retrieving Structured Items via Utility Estimation
Searching for items by their attribute values or metadata is a commonplace task in e-commerce and science today, for instance when searching for a product by its technical specifications. Finding a desirable item in such a catalog requires the user to specify desirable properties, specifically desirable attribute values. Current search tools support a database-like retrieval style that requires users to place hard constraints on acceptable attribute values to limit the result set, as in Boolean or faceted search. Boolean retrieval often returns either no results or too many. Faceted search usually avoids empty result sets, but the facets are often precomputed and may not match the user's intent well.
In contrast, modern information retrieval systems have largely abandoned constraint-based retrieval models in favor of models that estimate relevance to the user's latent information need. By ranking results by estimated relevance, such systems avoid the problems of constraint-based search, such as empty result sets. They also shift the user's mental model from how to retrieve the desired results to simply which results are desired. These information retrieval techniques have been applied successfully to a wide range of retrieval problems, but not to item retrieval, particularly when items are described by numeric attribute values.
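To make the contrast between hard constraints and ranking concrete, here is a minimal sketch assuming a hypothetical three-item laptop catalog and a hypothetical query (price at most 1000, at least 16 GB of RAM, weight at most 1.5 kg). The Boolean version returns nothing because every item violates at least one constraint; the ranked version scores each item by its normalized shortfall on each stated value and orders the whole catalog. The penalty function is an illustrative stand-in, not the relevance model developed in this dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Laptop:
    name: str
    price: float      # USD
    ram_gb: int
    weight_kg: float


# Hypothetical catalog used only for illustration.
CATALOG = [
    Laptop("A", 1299.0, 16, 1.9),
    Laptop("B", 899.0, 8, 1.3),
    Laptop("C", 1049.0, 16, 1.6),
]


def boolean_search(items, max_price, min_ram, max_weight):
    """Hard constraints: an item is returned only if it satisfies all of them."""
    return [
        it for it in items
        if it.price <= max_price and it.ram_gb >= min_ram and it.weight_kg <= max_weight
    ]


def ranked_search(items, max_price, min_ram, max_weight):
    """Soft alternative (hypothetical scoring): rank every item by how far it
    misses each stated value, normalized by that value, instead of filtering."""
    def penalty(it):
        return (
            max(0.0, it.price - max_price) / max_price
            + max(0, min_ram - it.ram_gb) / min_ram
            + max(0.0, it.weight_kg - max_weight) / max_weight
        )
    return sorted(items, key=penalty)


print(boolean_search(CATALOG, 1000.0, 16, 1.5))                        # []
print([it.name for it in ranked_search(CATALOG, 1000.0, 16, 1.5)])     # ['C', 'B', 'A']
```

Item C misses the price and weight limits only slightly, so it ranks first even though a strict Boolean filter discards it.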
This dissertation develops a model of relevance for item retrieval based in part on concepts from multi-attribute decision-making theory. We cast the problem as one of utility estimation; in contrast to the Boolean and faceted approaches, our approach does not use constraints. First, we develop a core model based on multi-attribute utility theory that trades off among conflicting criteria on the user's behalf, and in this way comes closer to the underlying query intent. Second, we develop a flexible model of subutility for numeric attributes, using a Bayesian graphical model to learn the specific subutility functions. Finally, we extend our subutility model to handle other types of attributes and to interpret vague natural-language queries. We evaluate our model on several item recommendation and retrieval datasets, as well as in two user studies, and compare its performance to the de facto standard of Boolean retrieval and to several models proposed in the literature.
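The idea of trading off conflicting criteria can be sketched with the textbook additive form of multi-attribute utility, U(x) = Σᵢ wᵢ·uᵢ(xᵢ), where each uᵢ is a subutility function mapping one attribute value to [0, 1] and the weights wᵢ express relative importance. This is a minimal sketch assuming a hypothetical catalog, hand-picked subutility shapes (a logistic curve for price, clipped linear curves for RAM and weight), and hand-set weights; the dissertation instead learns subutility functions with a Bayesian graphical model, which is not reproduced here.

```python
import math

# Hypothetical catalog: attribute values per item.
CATALOG = {
    "A": {"price": 1299.0, "ram_gb": 16, "weight_kg": 1.9},
    "B": {"price": 899.0, "ram_gb": 8, "weight_kg": 1.3},
    "C": {"price": 1049.0, "ram_gb": 16, "weight_kg": 1.6},
}


def price_subutility(price, budget=1000.0, softness=100.0):
    # Decreasing logistic: near 1 well under budget, 0.5 at the budget, near 0 far above.
    return 1.0 / (1.0 + math.exp((price - budget) / softness))


def linear_subutility(value, worst, best):
    # Linear between the worst and best value, clipped to [0, 1];
    # handles increasing (best > worst) and decreasing (best < worst) preferences.
    return min(1.0, max(0.0, (value - worst) / (best - worst)))


# Hypothetical subutility functions u_i, one per attribute.
SUBUTILITIES = {
    "price": price_subutility,
    "ram_gb": lambda v: linear_subutility(v, worst=4, best=32),
    "weight_kg": lambda v: linear_subutility(v, worst=2.5, best=1.0),
}

# Hypothetical importance weights w_i; they sum to 1 so U(x) stays in [0, 1].
WEIGHTS = {"price": 0.5, "ram_gb": 0.3, "weight_kg": 0.2}


def utility(item):
    # Additive multi-attribute utility: U(x) = sum_i w_i * u_i(x_i).
    return sum(WEIGHTS[a] * SUBUTILITIES[a](item[a]) for a in WEIGHTS)


def rank(catalog):
    # Order item names by estimated utility, highest first; no item is filtered out.
    return sorted(catalog, key=lambda name: utility(catalog[name]), reverse=True)


print(rank(CATALOG))  # ['B', 'C', 'A']
```

With price weighted most heavily, the cheap, light item B outranks C despite having less RAM; shifting weight toward RAM would change that trade-off without any constraint ever excluding an item.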