Bitton, Ephrat

Geometric Models for Collaborative Search and Filtering

2011

Abstract

This dissertation explores the use of geometric and graphical models for a variety of information search and filtering applications. These models provide an intuitive understanding of the problem domains and lend computational efficiency to our solution approaches.

We begin by considering a search and rescue scenario where both human and automated agents share control over a fleet of unmanned aerial vehicles (UAVs) with the goal of locating a missing subject as quickly as possible. We describe a new interface and search framework, Hydra, which merges the intuition, reasoning, and vision capabilities of humans with the computational power of machines to reduce the expected time to locate the subject. The interface allows participating human agents to collaboratively decide where to send the UAVs via spatial dynamic voting, a geometric method for aggregating regional selections (votes) on a map. Via extensive simulation and theoretical analysis, we show that our method can be an effective component of search and rescue operations.
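To make the idea of aggregating regional votes concrete, the following is a minimal sketch rather than the Hydra implementation. It assumes that each vote is an axis-aligned rectangle, that the map is a grid of unit cells, and that the aggregate target is the cell covered by the most votes. The names `aggregate_votes` and `best_cell` are hypothetical.

```python
from typing import List, Tuple

# Assumed vote shape: an axis-aligned rectangle (x_min, y_min, x_max, y_max).
Rect = Tuple[float, float, float, float]


def aggregate_votes(votes: List[Rect], width: int, height: int) -> List[List[int]]:
    """Count, for each unit grid cell, how many vote rectangles cover its center."""
    counts = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in votes:
        for row in range(height):
            for col in range(width):
                cx, cy = col + 0.5, row + 0.5  # center of the cell
                if x0 <= cx <= x1 and y0 <= cy <= y1:
                    counts[row][col] += 1
    return counts


def best_cell(counts: List[List[int]]) -> Tuple[int, int]:
    """Return (row, col) of the most-voted cell; the first one found wins ties."""
    best = (0, 0)
    for r, row in enumerate(counts):
        for c, value in enumerate(row):
            if value > counts[best[0]][best[1]]:
                best = (r, c)
    return best


# Three overlapping regional selections on a 6 x 6 map (illustrative data only).
votes = [(0, 0, 3, 3), (2, 2, 5, 5), (2, 1, 4, 4)]
counts = aggregate_votes(votes, width=6, height=6)
target = best_cell(counts)
```

In this toy example the only cell covered by all three selections is the one at row 2, column 2, so that cell becomes the aggregate target.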

In the next chapter, we present a new graph-theoretical model for filtering a large set of genes to identify those that exhibit the most significant change in expression values between a series of control and test experiments; this is known as the Gene Selection Problem. Although not a geometric model in the traditional sense, graph theory allows us to organize data in abstract geometric spaces, where similarity metrics are used to define relative distances between nodes of data as opposed to working with an absolute coordinate system. Our algorithm first pre-processes the data using statistical hypothesis testing to filter out statistically irrelevant genes, and then we analyze the expression levels recorded for each gene by modeling them on a graph and evaluating the capacity of the cut between the test and control experiments. The capacity of a cut on a graph is a measure of the separation between two disjoint sets of nodes, and we use this value to rank the genes. We evaluated our model on a rich data set assessing the success of embryo implantation in mice in the presence or absence of uterine dendritic cells. A thorough biological analysis of our results enabled the discovery of significant factors that were not identified by more traditional, statistical methods.
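As a rough illustration of ranking genes by cut capacity, the sketch below treats each experiment as a node and sums the edge weights that cross the control/test partition. The similarity weight `1 / (1 + |a - b|)` and the ascending ranking (lowest capacity first, meaning greatest separation) are assumptions made for this example and are not taken from the dissertation. The statistical pre-filtering step is omitted.

```python
from typing import Dict, List, Tuple


def cut_capacity(control: List[float], test: List[float]) -> float:
    """Sum the similarity weights of all edges crossing the control/test cut.

    The edge weight 1 / (1 + |a - b|) is an assumed similarity measure:
    it equals 1 for identical expression values and decays with their difference.
    """
    return sum(1.0 / (1.0 + abs(c - t)) for c in control for t in test)


def rank_genes(data: Dict[str, Tuple[List[float], List[float]]]) -> List[str]:
    """Order genes by ascending cut capacity (assumed: low capacity = strong separation)."""
    return sorted(data, key=lambda gene: cut_capacity(*data[gene]))


# Illustrative expression values: (control experiments, test experiments).
expression = {
    "geneA": ([1.0, 1.1, 0.9], [1.0, 1.2, 0.8]),  # little change
    "geneB": ([1.0, 1.1, 0.9], [5.0, 5.2, 4.8]),  # large change
}
ranking = rank_genes(expression)
```

Here `geneB` ranks ahead of `geneA` because its control and test values are far apart, which makes every crossing edge light and the cut capacity small.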

In the remaining chapters of this dissertation, we transition to a series of algorithms and models for filtering information in a collaborative, social context. We begin by presenting a new, constant-time recommender system for jokes that adapts in real-time to changes in user preferences or mood. We also present an extension of this system that makes personalized recommendations on where participants might wish to donate their money.

Chapters 5 and 6 consider the domain of collaborative opinion and idea sharing in an online setting. We present a new tool, Opinion Space, that we are developing for visualizing and crowdsourcing a diversity of insights collected via textual responses to a discussion question. Opinion Space projects participants onto a two-dimensional plane using Principal Component Analysis based on their levels of agreement with a series of statements. The projection is specifically designed so that participants with similar opinions will be near each other in the space; this allows participants to easily navigate the diversity of opinions shared by others.
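The projection step can be sketched with a small, dependency-free version of Principal Component Analysis. This is an illustrative sketch, not the Opinion Space code: the rating scale, the number of statements, and all function names are assumptions, and the eigenvectors are found by power iteration with deflation rather than by a library routine.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract the per-statement mean from every participant's ratings."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenvector(m: Matrix, iterations: int = 500) -> List[float]:
    """Power iteration from a fixed, non-uniform starting vector."""
    v = [1.0 + 0.1 * i for i in range(len(m))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # matrix has no remaining variance
            break
        v = [x / norm for x in w]
    return v


def deflate(m: Matrix, v: List[float]) -> Matrix:
    """Remove the component along unit eigenvector v from symmetric matrix m."""
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    d = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]


def project_2d(rows: Matrix) -> List[Tuple[float, float]]:
    """Project each participant onto the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1 = dominant_eigenvector(cov)
    pc2 = dominant_eigenvector(deflate(cov, pc1))
    return [(sum(a * b for a, b in zip(row, pc1)),
             sum(a * b for a, b in zip(row, pc2))) for row in centered]


# Illustrative agreement levels in [0, 1] for four participants and five statements.
ratings = [
    [0.9, 0.8, 0.1, 0.2, 0.5],
    [0.8, 0.9, 0.2, 0.1, 0.4],
    [0.1, 0.2, 0.9, 0.8, 0.5],
    [0.2, 0.1, 0.8, 0.9, 0.6],
]
points = project_2d(ratings)
```

In this example the first two participants hold similar opinions and land close together in the plane, while the third and fourth land on the opposite side.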

Over the last two years, we have released multiple versions of Opinion Space and collected several rich data sets for analysis. In Chapter 5 we describe the interface and the design decisions made when building the site. We also present results from a controlled user study comparing user engagement with Opinion Space versus more traditional models of online opinion sharing (specifically, linear comment lists). We found not only that participants were significantly more engaged with Opinion Space, but also that they had significantly higher levels of agreement with and respect for the responses that they read.

In Chapter 6 we present several models, both geometric and statistical, for ranking the contributions of our participants based on how insightful they are. Our primary model considers the spatial relationships between users in addition to the ratings they give each other; the intuition behind the model can be described as follows. By giving users the opportunity to rate the responses they read, we allow for the very likely possibility that users will only promote their own interests and rate opposing opinions poorly, even when those opinions are expressed in well-written and pointed responses. We claim that this behavior is of little value towards our objective of identifying insightful ideas, because users are simply reinforcing their own opinions. Visually, one can imagine that the space of users is partitioned into subgroups or smaller spheres of agreement, and we are interested in emphasizing the comments where these spheres intersect. In this scenario, we have identified users with different viewpoints who have potentially found a legitimate middle ground.
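One way to picture this intuition in code is to weight each rating by the distance between the rater and the author in the two-dimensional space, so that praise from a distant viewpoint counts for more than praise from a neighbor. The sketch below is a hypothetical scoring rule written for illustration and is not the model developed in Chapter 6. The function name `insight_score`, the rating range of 0 to 1, and the use of Euclidean distance are all assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def insight_score(author_pos: Point, ratings: List[Tuple[Point, float]]) -> float:
    """Sum of ratings, each weighted by the rater's distance from the author.

    Assumes ratings lie in [0, 1]. A high rating from a nearby rater adds
    little, and a low rating from a distant rater also adds little, so only
    agreement across viewpoints raises the score substantially.
    """
    return sum(rating * math.dist(author_pos, rater_pos)
               for rater_pos, rating in ratings)


# Response X: two enthusiastic ratings from close neighbors of the author.
score_x = insight_score((0.0, 0.0), [((0.1, 0.0), 1.0), ((0.0, 0.1), 1.0)])

# Response Y: one fairly positive rating from a participant far across the space.
score_y = insight_score((0.0, 0.0), [((3.0, 4.0), 0.8)])
```

Under this rule response Y outscores response X even though it received fewer and slightly lower ratings, because its support comes from a participant with a different viewpoint.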

Chapter 7 provides concluding remarks on our work with Opinion Space from a New Media and social responsibility perspective, and we present preliminary results on future work in the area.

UC Berkeley
