This thesis documents three contributions to statistical learning theory, developed with an emphasis on the demands that modern, large-scale datasets place on statistical analysis. The contributions concern information theory, dimension reduction and density estimation: three foundational topics in statistical theory with numerous applications, both to practical problems and to the development of other statistical methodology.
In Chapter \ref{chapter:fdiv}, I describe the development of a unifying treatment of inequalities between $f$-divergences, a general class of divergences between probability measures that includes as special cases many divergences commonly used in probability, mathematical statistics and information theory, such as the Kullback-Leibler divergence, the chi-squared divergence, the squared Hellinger distance and the total variation distance. In contrast with previous research in this area, we study the problem of obtaining sharp inequalities between $f$-divergences in full generality. In particular, our main results allow the number $m$ of divergences involved to be an arbitrary positive integer and the divergences $D_f, D_{f_1}, \dots, D_{f_m}$ themselves to be arbitrary $f$-divergences. We show that the underlying optimization problems can be reduced to low-dimensional optimization problems, and we outline methods for solving them. We also show that many existing inequalities between $f$-divergences can be recovered as special cases of our results, and we improve on some existing non-sharp inequalities.
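For context, the standard definition of an $f$-divergence and the generators of the examples mentioned above are sketched here; the exact conventions (for instance, the normalization of the total variation distance) may differ slightly from those adopted in Chapter \ref{chapter:fdiv}. For a convex function $f : (0, \infty) \to \mathbb{R}$ with $f(1) = 0$ and probability measures $P \ll Q$,
\begin{equation*}
D_f(P \| Q) := \int f\!\left(\frac{dP}{dQ}\right) dQ,
\end{equation*}
and common choices of the generator are $f(x) = x \log x$ (Kullback-Leibler), $f(x) = (x - 1)^2$ (chi-squared), $f(x) = (\sqrt{x} - 1)^2$ (squared Hellinger) and $f(x) = \tfrac{1}{2}\lvert x - 1 \rvert$ (total variation).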
In Chapter \ref{chapter:srp}, I describe the development of a new dimension reduction technique suited for interpretable inference in supervised learning problems involving high-dimensional data. This technique, Supervised Random Projections (SRP), is introduced with the goal of ensuring that, in comparison with ordinary dimension reduction, the compressed data is more relevant to the response variable of the supervised learning problem at hand. By incorporating variable importances, we ensure that the compressed data still accurately explains the response variable, thereby lending more interpretability to the dimension reduction step. Further, variable importances ensure that, even in the presence of numerous nuisance variables, the projected data retains at least a moderate amount of information from the important variables, giving those variables a fair chance of being selected by downstream formal hypothesis tests.
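As a purely illustrative sketch of the general idea, and not the precise construction of Chapter \ref{chapter:srp}, variable importances can be incorporated into a random projection by reweighting the columns of the data matrix before projecting: given data $X \in \mathbb{R}^{n \times p}$, nonnegative importance weights $w_1, \dots, w_p$ computed from the response, and a random matrix $R \in \mathbb{R}^{p \times k}$ with independent $N(0, 1/k)$ entries, one may form the compressed data
\begin{equation*}
\tilde{X} := X \, \mathrm{diag}(w_1, \dots, w_p) \, R \in \mathbb{R}^{n \times k},
\end{equation*}
so that variables deemed important for the response contribute more heavily to the projected coordinates than nuisance variables do.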
In Chapter \ref{chapter:npmle}, I establish several adaptivity properties of the Non-Parametric Maximum Likelihood Estimator (NPMLE) in the problem of estimating an unknown Gaussian location mixture density from independent and identically distributed observations. Further, I explore the role of the NPMLE in the widely studied problem of denoising normal means, that is, the problem of estimating a vector of unknown means from noisy Gaussian observations. In this problem, I prove that the Generalized Maximum Likelihood Empirical Bayes estimator (GMLEB) approximates the Oracle Bayes estimator in expected squared $\ell_2$ norm at adaptive parametric rates, up to additional logarithmic factors.
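To fix ideas, the basic objects involved can be written in a standard formulation; the precise assumptions and scalings are those of Chapter \ref{chapter:npmle}. A Gaussian location mixture density with mixing measure $G$ is
\begin{equation*}
f_G(x) := \int \phi(x - \theta) \, dG(\theta), \qquad \phi(z) := \frac{1}{\sqrt{2\pi}} e^{-z^2/2},
\end{equation*}
and, given i.i.d. observations $X_1, \dots, X_n$ from $f_{G^*}$ for an unknown $G^*$, the NPMLE maximizes the likelihood over all mixing measures:
\begin{equation*}
\hat{G}_n \in \operatorname*{arg\,max}_{G} \; \sum_{i=1}^{n} \log f_G(X_i),
\end{equation*}
where the maximum is taken over all probability measures $G$ on $\mathbb{R}$. In the normal means problem, one observes $X_i = \theta_i + Z_i$ with $Z_i \sim N(0, 1)$ and wishes to estimate $(\theta_1, \dots, \theta_n)$; by Tweedie's formula, the Bayes rule corresponding to a prior $G$ takes the form $x \mapsto x + f_G'(x)/f_G(x)$ for unit noise variance, and the GMLEB estimator plugs $\hat{G}_n$ into this rule in place of the empirical distribution of the means used by the oracle.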