Replicable Learning Algorithms

UC San Diego Electronic Theses and Dissertations

Abstract

Reproducibility and replicability are central to the process of learning. Machine learning algorithms can solve a variety of tasks, in turn allowing researchers to ask new questions. However, reproducibility issues can limit the scope and utility of these algorithms. In this dissertation, we investigate replicability from a theoretical perspective. We formalize a new mathematical definition of replicability for randomized algorithms that learn from independent and identically distributed (i.i.d.) samples. We propose replicable algorithms for fundamental learning tasks such as computing statistical queries, boosting weak learners, and learning halfspaces. We discuss techniques for designing replicable algorithms, resolving tensions among accuracy, replicability, and efficiency. Furthermore, we construct black-box algorithmic reductions between replicability and other notions of algorithmic stability, such as differential privacy. We also note possible directions for future replicability research.
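To make the statistical-query task above concrete, here is a minimal Python sketch of one standard replicability technique: compute an empirical mean, then round it to a grid whose offset is drawn from randomness shared across runs. This is an illustration under our own naming, not the dissertation's exact algorithm; alpha (the grid width) and shared_seed (the shared coins) are illustrative parameters.

```python
import numpy as np

def replicable_statistical_query(samples, query, alpha, shared_seed):
    # Empirical mean of the query over the i.i.d. sample.
    empirical_mean = np.mean([query(x) for x in samples])

    # Shared randomness: two runs given the same shared_seed draw
    # the same random offset for the rounding grid.
    rng = np.random.default_rng(shared_seed)
    offset = rng.uniform(0.0, alpha)

    # Round to the nearest point of the grid {offset + k * alpha}.
    # With enough samples, two independent empirical means both land
    # much closer than alpha to the true mean, so with high probability
    # they fall in the same grid cell and return the identical answer.
    return offset + alpha * round((empirical_mean - offset) / alpha)
```

Two runs on fresh samples from the same distribution, sharing shared_seed, then agree exactly with high probability while remaining accurate to within roughly alpha.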

• The introduction is a general-audience explanation of the contents of this dissertation.
• In Chapter 1, we formally introduce replicability, the central concept of this dissertation. We mathematically define replicability and the associated model, noting key features and limitations (an operational sketch of this definition follows the list). This chapter summarizes the theorems and algorithms in the remaining chapters. We mention promising future directions and argue why researchers should pursue them.
• In Chapter 2, we list central definitions and briefly note how they are used. The remaining three chapters are reprints of three published conference papers related to replicable learning algorithms, with readability edits.
• In Chapter 3, we mathematically define replicability and motivate our definition. We design new replicable algorithms for fundamental learning tasks such as computing statistical queries and learning halfspaces. This chapter discusses basic properties of replicability, including alternate definitions.
• In Chapter 4, we reprint a paper that predates and greatly contributed to this dissertation's work on replicability.
• In Chapter 5, we prove strong connections between replicability and other well-studied stability notions such as differential privacy. We show black-box ways to transform certain stable algorithms into replicable algorithms and vice versa. These transformations may increase the required sample sizes and running times; we prove such increases are necessary. Applications of our connections include new replicable learning algorithms and the resolution of open questions in algorithmic stability.
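The replicability definition in Chapters 1 and 3 can be read operationally: run the algorithm twice on independent samples while sharing its internal coins, and measure how often the two outputs differ. The hypothetical Monte Carlo harness below sketches that check; the names algorithm, draw_sample, and meta_seed are ours, not the dissertation's.

```python
import numpy as np

def estimate_failure_rate(algorithm, draw_sample, n, trials, meta_seed=0):
    """Estimate Pr[A(S1; r) != A(S2; r)] over independent n-element
    i.i.d. samples S1, S2 and shared internal randomness r.
    A rho-replicable algorithm keeps this probability at most rho."""
    meta_rng = np.random.default_rng(meta_seed)
    failures = 0
    for _ in range(trials):
        shared_seed = int(meta_rng.integers(2**31))  # r: same coins for both runs
        s1, s2 = draw_sample(n), draw_sample(n)      # two independent samples
        if algorithm(s1, shared_seed) != algorithm(s2, shared_seed):
            failures += 1
    return failures / trials
```

For instance, the statistical-query sketch above, with query and alpha fixed, has exactly this (sample, shared_seed) interface.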
