In a number of domains in computer vision, machine learning, and psychology, it is common to model and analyze data in terms of pairwise interactions between data elements, that is, distances between pairs of points. A large variety of supervised and unsupervised algorithms exist for learning the structure of such data. However, interactions between entities cannot always be described in terms of pairs; often, higher order relations of size three or more offer a more natural formulation. Consider the k-lines clustering problem, where the objective is to group data points in a d-dimensional vector space into k clusters such that the elements of each cluster are well approximated by a line. Since every pair of data points trivially defines a line, no useful measure of similarity between pairs of points exists for this problem. However, it is possible to define measures of similarity over triplets of points that indicate how close they are to being collinear. Another example is the task of constructing an embedding from psychometric experiments involving paired comparisons. In a common paired comparison experiment, subjects are shown three images -- X, Y and Z -- and asked to report which of X or Y is more similar to Z. Here each observation is a triadic relation on the set of images. We refer to the problem of learning from such datasets as higher order learning.

In this dissertation we present solutions to two problems in higher order learning. The first is a new clustering technique called Clique Averaging, in which we use a novel generative model for hypergraphs to construct a clustering algorithm with better theoretical and empirical performance than existing algorithms. We show applications to illumination invariant clustering and k-subspace learning.
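To make the triplet idea concrete, here is a minimal sketch of one possible collinearity-based dissimilarity (the function name and normalization are illustrative choices, not the dissertation's definition): the area of the triangle spanned by three points, normalized by its longest side, which vanishes exactly when the points are collinear.

```python
import numpy as np

def collinearity_dissimilarity(x, y, z):
    """Dissimilarity of a triplet: twice the area of the triangle
    (x, y, z), normalized by the length of its longest side.
    Zero if and only if the three points are collinear.
    Works in any dimension d."""
    x, y, z = map(np.asarray, (x, y, z))
    u, v = y - x, z - x
    # Twice the triangle area via the Gram determinant:
    # |u|^2 |v|^2 - (u.v)^2 = |u x v|^2 in any dimension.
    area2 = np.sqrt(max(u @ u * (v @ v) - (u @ v) ** 2, 0.0))
    longest = max(np.linalg.norm(u), np.linalg.norm(v),
                  np.linalg.norm(z - y))
    return area2 / longest if longest > 0 else 0.0
```

A hypergraph clustering algorithm can then use such a triplet measure as the weight of a size-3 hyperedge, where pairwise weights carry no information.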
The second is a new non-metric multi-dimensional scaling (MDS) algorithm for ordinal data, that is, data sets in which only the relative or rank order of distances is known, rather than their magnitudes. We show how this algorithm can be used to construct a perceptual space for reflectance.
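As a toy illustration of learning an embedding from ordinal data, the following sketch places items by gradient descent on a hinge loss over squared distances, in the spirit of generalized non-metric MDS; it is an illustrative baseline under assumed triplet semantics, not the algorithm developed in the dissertation.

```python
import numpy as np

def ordinal_embed(triplets, n, dim=2, lr=0.05, margin=1.0,
                  epochs=200, seed=0):
    """Embed n items in R^dim from triplets (i, j, k) meaning
    'item i is closer to item j than to item k'. Only the rank
    order of distances is used, never their magnitudes."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim)) * 0.1
    for _ in range(epochs):
        for i, j, k in triplets:
            d_ij = X[i] - X[j]
            d_ik = X[i] - X[k]
            # Hinge violated: ||x_i - x_j||^2 + margin > ||x_i - x_k||^2
            if d_ij @ d_ij + margin > d_ik @ d_ik:
                # Gradient of the hinge w.r.t. the three points.
                X[i] -= lr * 2 * (d_ij - d_ik)
                X[j] -= lr * 2 * (-d_ij)
                X[k] -= lr * 2 * d_ik
    return X
```

Any embedding whose inter-point distances satisfy the observed rank order is an equally valid solution, which is exactly what distinguishes the non-metric setting from classical MDS.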

Modern data analytics applications typically process massive amounts of data on clusters of tens, hundreds, or thousands of machines to support near-real-time decisions. The quantity of data and the limitations of disk and memory bandwidth often make it infeasible to deliver answers at human-interactive speeds. However, it has been widely observed that many applications can tolerate some degree of inaccuracy. This is especially true for exploratory queries on data, where users are satisfied with "close-enough" answers as long as they arrive quickly. A popular technique for speeding up queries at the cost of accuracy is to execute each query on a sample of the data rather than on the whole dataset. In this thesis, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses three key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from the original data over time, (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response time requirements, and (3) an error estimation and diagnostics module that produces approximate answers and reliable error bars. We evaluate BlinkDB extensively against well-known database benchmarks and a number of real-world analytic workloads, showing that it is possible to implement an end-to-end query approximation pipeline that produces approximate answers with reliable error bars at interactive speeds.
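The stratified-sampling principle behind idea (1) can be sketched in miniature: cap the number of rows kept per group so that rare groups survive in the sample, and attach a scale-up weight to each sampled row so aggregates remain unbiased. The function below is a hypothetical simplification for illustration; BlinkDB's actual sample construction is an optimization over storage budget and workload.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, cap=100, seed=0):
    """Cap-per-stratum sampling: keep at most `cap` rows from each
    group defined by `key`, pairing every kept row with the weight
    needed to scale estimates back up to the full data."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    sample = []
    for members in groups.values():
        take = members if len(members) <= cap else rng.sample(members, cap)
        # Scale-up weight: each kept row stands for w original rows.
        w = len(members) / len(take)
        sample.extend((r, w) for r in take)
    return sample
```

A weighted COUNT, for instance, is then `sum(w for _, w in sample)`, which recovers the true row count per stratum exactly, while rare groups that a uniform sample would likely miss are guaranteed representation.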

We present practical algorithms for stratified autocalibration with theoretical guarantees of global optimality. Given a projective reconstruction, we first upgrade it to an affine one by estimating the position of the plane at infinity. The plane at infinity is computed by globally minimizing a least-squares formulation of the modulus constraints. In the second stage, this affine reconstruction is upgraded to a metric one by globally minimizing the infinite homography relation to compute the dual image of the absolute conic (DIAC). The positive semidefiniteness of the DIAC is explicitly enforced as part of the optimization process, rather than as a post-processing step.
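In the standard stratified autocalibration notation (cameras $P_i = [A_i \mid a_i]$ with $P_1 = [I \mid 0]$ and plane at infinity $\pi_\infty = (\mathbf{p}^\top, 1)^\top$), the relations underlying the two stages are the textbook ones; this is a summary of the standard formulation, not a verbatim statement of the algorithm:

```latex
% Infinite homography induced by the plane at infinity for view i:
H_\infty^i = A_i - a_i \mathbf{p}^\top
% Modulus constraint (affine stage): H_\infty^i is conjugate to a
% scaled rotation, so its eigenvalues must have equal moduli:
\lvert \lambda_1(H_\infty^i) \rvert
  = \lvert \lambda_2(H_\infty^i) \rvert
  = \lvert \lambda_3(H_\infty^i) \rvert
% Infinite homography relation (metric stage), with the DIAC
% \omega_i^* = K_i K_i^\top constrained to be positive semidefinite:
\lambda_i \, \omega_i^* = H_\infty^i \, \omega_1^* \, (H_\infty^i)^\top,
  \qquad \omega_i^* \succeq 0
```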
For each stage, we construct and minimize tight convex relaxations of the highly non-convex objective functions within a branch-and-bound optimization framework. We exploit the inherent problem structure to restrict the search space for the DIAC and the plane at infinity to a small, fixed number of branching dimensions, independent of the number of views. Chirality constraints are incorporated into our convex relaxations to automatically select an initial region that is guaranteed to contain the global minimum.
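The branch-and-bound pattern itself, independent of the autocalibration objectives, can be sketched generically: maintain a priority queue of boxes keyed by a lower bound from a convex relaxation, prune any box whose bound exceeds the incumbent, and split the rest. A minimal one-dimensional sketch with hypothetical names:

```python
import heapq

def branch_and_bound(f, lower_bound, lo, hi, tol=1e-6):
    """Minimize f on [lo, hi]. `lower_bound(a, b)` must return an
    underestimate of min f on [a, b] (e.g. from a convex relaxation);
    the tighter the relaxation, the more aggressive the pruning."""
    best_x = (lo + hi) / 2
    best = f(best_x)
    heap = [(lower_bound(lo, hi), lo, hi)]
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best - tol:
            continue  # prune: this box cannot hold a better minimum
        m = (a + b) / 2
        if f(m) < best:
            best, best_x = f(m), m  # improve the incumbent
        if b - a > tol:
            heapq.heappush(heap, (lower_bound(a, m), a, m))
            heapq.heappush(heap, (lower_bound(m, b), m, b))
    return best_x, best
```

In the autocalibration setting the boxes live in the small, fixed-dimensional space of branching variables described above, which is what keeps the method scalable in the number of views.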
Experimental evidence of the accuracy, speed and scalability of our algorithm is presented on synthetic and real data.