This dissertation addresses important problems in decision theory and
data mining. In particular, we focus on problems of the form: Each of
several information sources provides evaluations or measurements of
the objects in a universal set, and the objective is to aggregate
these, possibly conflicting, evaluations into a consensus
evaluation of each object in the universal set. In addition, we
concentrate on the scenario where each source provides evaluations of
only a strict subset of the objects; that is, each source provides an
incomplete evaluation.
In order to define the consensus evaluation from a given set of
incomplete evaluations, two distances are developed: the first is a
distance between incomplete rankings (ordinal evaluations) and the
second is a distance between incomplete ratings (cardinal
evaluations). These two distances generalize Kemeny and Snell’s distance between complete rankings and Cook and Kress’ distance between complete ratings, respectively. Specifically, we introduce a set of natural axioms that must be satisfied by a distance between two incomplete rankings (ratings) and prove the uniqueness and existence of a distance satisfying such axioms. Given a set of incomplete rankings (ratings), the consensus ranking (rating) is
defined as the complete ranking (rating) that minimizes the sum of
distances to each of the given rankings (ratings). We provide several
examples that show that the consensus ranking (rating) obtained by
this approach is more intuitive than that obtained by other approaches.
Finding the consensus ranking is NP-hard, thus we develop two
optimization methodologies to find the consensus ranking: one
efficient approximation algorithm based on the separation-deviation
model and one exact algorithm based on the implicit hitting set
approach. In addition, we show that the optimization problem that
needs to be solved in order to find the consensus rating is a special
case of the separation-deviation model (hereafter SD model), which is
solvable in polynomial time. In this sense, the herein developed
theory (described in the previous paragraph) can be thought of an
axiomatization of the SD model.
Three applications of the SD model are presented: rating the
credit-risk of countries; customer segmentation; and ranking the
participants in a student paper competition. In the credit-risk
rating study, it is shown that the SD model leads to an improved
aggregate rating with respect to several criteria. We compare the SD
model with other aggregation methods and show the following: Although
the SD model is a method to aggregate cardinal evaluations, the
aggregate credit-risk ratings obtained by the SD model are also good
with respect to “ordinal criteria”. Several properties of the SD model are proven, including the property that the aggregate rating
obtained by the SD model agrees with the majority of agencies or
reviewers, regardless of the scale used.
The customer segmentation study shows how to use the SD model to
process data on customer purchasing timing. The outcome of the SD
model provides insights on the rate of new product adoption by the
company’s consumers. In particular, the SD model is used as follows: given the purchase dates for each customer of several products, this information is aggregated in order to rate the customers with regard
to their promptness to adopt new technology. We show that this
approach outperforms unidimensional scaling—a widely used data mining methodology. We analyze the results with respect to various
dimensions of the customer base and report on the generated insights.
The last presented application illustrates our aggregation methods in
the context of the 2007 MSOM’s student paper competition. The
aggregation problem in this competition poses two challenges. First,
each paper was reviewed only by a very small fraction of the judges;
thus the aggregate evaluation is highly sensitive to the subjective
scales chosen by the judges. Second, the judges provided both
cardinal and ordinal evaluations (ratings and rankings) of the papers
they reviewed. This chapter develops the first known methodology to
simultaneously aggregate ordinal and cardinal evaluations into a
consensus evaluation.
Although the content of this dissertation is framed in terms of
decision theory, Hochbaum showed that data mining problems can be
viewed as special cases of decision theory problems. In particular,
the customer segmentation study is a classic data mining problem.