De, Anindya

Efficient learning algorithms with limited information

2013

Abstract

The thesis explores efficient learning algorithms in settings which are more restrictive than the PAC model of learning (Valiant) in one of the following two senses: (i) The learning algorithm has a very weak access to the unknown function, as in, it does not get labeled samples for the unknown function (ii) The error guarantee required from the hypothesis is more stringent than the PAC model.

Ever since its introduction, the PAC model of learning is considered as the standard model of learning. However, there are many situations encountered in practice in which the PAC model does not adequately model the learning problem in hand. To meet this limitation, several other models have been introduced for e.g. the agnostic learning model, the restricted focus of attention model, the positive examples only model amongst others. The last two models are the ones which are most relevant to the work in this thesis.

Beyond the motivation of modeling learning problems, another reason to consider these alternate models is because of the rich interplay between the questions arising here and other areas like computational complexity theory and social choice theory. Further, the study of these models lead to interesting questions in discrete Fourier analysis and probability theory. In fact, the connections forged between these learning questions and Fourier analysis and probability theory are key to the results in this thesis.

Part~1 of this thesis is motivated by an easy theorem of C.K. Chow who showed that over the uniform distribution on the hypercube, a halfspace is uniquely identified (in the space of all Boolean functions) by its expectation and correlation with the input bits. Rephrased in the language of Fourier analysis, if f is a halfspace and g is any other Boolean function, such that the degree 0 and 1 Fourier coefficients of f and g are the same, then f and g are identical over the hypercube. The degree 0 and 1 Fourier coefficients of f are sometimes called the Chow parameters of f. Chow's result immediately gives rise to two questions: The first is whether this characterization is robust i.e.~ if the Chow parameters of f and g are close to each other and f is halfspace, then are f and g close to each other in Hamming distance? The second question is whether Chow's characterization is algorithmic. In other words, is there an efficient algorithm which given the Chow parameters of a halfspace f, can reconstruct the halfspace f (exactly or approximately)? Chapter 2 answers both these questions in the affirmative and in fact shows a strong connection between the answers to the two questions.

In Chapter 3, we consider a different generalization of the theorem of Chow. It is easy to extend the theorem of Chow that show over any distribution D on the hypercube (which has full support), a halfspace f is completely characterized within the space of all Boolean functions by its expectation and correlation with the input bits. Similar to Chapter 2, one can ask if this characterization can be made algorithmic and robust. While the question is interesting in its own right, for specific choices of distribution D, this problem has been investigated intensively in the social choice literature where it is known as Inverse power index problem. Informally speaking, the function f is viewed as a voting function and the correlation between f and an input bit is viewed as the influence of that input bit (The distribution D is dependent on the definition of influence that is used). Arguably the most popular choice for power indices in social choice literature is the Shapley-Shubik indices. Despite much interest in the Inverse power index problem for Shapley-Shubik indices, prior to our work, there was no rigorous algorithm for this problem. In Chapter 3, we give the first polynomial time approximation scheme for the Inverse power index problem for Shapley-Shubik indices.

Part 2 of this thesis is motivated by the problem of learning distributions over the hypercube. While the problem of learning distributions has a rich history in statistics and computer science, there has not been much work in learning distributions over the hypercube which are interesting from the point of view of Boolean functions. While most natural learning problems that can be framed in this context can easily shown to be hard (under cryptographic assumption), a sweet spot comes from the following kind of learning problems: Let C be an interesting class of Boolean functions (say halfspaces). Given an unknown f in C, the distribution learning problem we consider is to learn the uniform distribution over f^{-1}(1) provided we have access to random samples from this set. This question has interesting connections to questions in complexity theory like sampling NP solutions among others. In Chapter 3, we consider this problem for many different instantiations of the class C. While we give efficient algorithms for classes like halfspaces and DNFs, we show that classes like CNFs and degree-d polynomial threshold functions which are learnable in the standard PAC model, are cryptographically hard to learn in this context.

Main Content

For improved accessibility of PDF content, download the file to your device.

UC Berkeley

Efficient learning algorithms with limited information