eScholarship
Open Access Publications from the University of California

UC Berkeley Electronic Theses and Dissertations

Numerical Algorithm in Machine Learning and Data Analysis

No data is associated with this publication.
Abstract

In this thesis, we present novel numerical algorithms for machine learning and data analysis. The thesis consists of two self-contained chapters.

In Chapter 1, we present novel methods for performing k-fold cross validation for ridge regression. Ridge regression is a widely used method for reducing overfitting in linear regression and appears in many areas of data analysis. A central question for ridge regression is how to find the best regularization hyperparameter $\lambda$; however, this can be a time-consuming procedure. Here we present an efficient yet numerically stable algorithm for computing the relative error of ridge regression across different values of $\lambda$. In addition, we provide a novel algorithm for finding the best hyperparameter $\lambda$ without computing all the relative errors.
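The baseline procedure the chapter improves on can be sketched as follows: for each candidate $\lambda$, refit ridge regression on every training fold and accumulate the held-out error. The function name `ridge_kfold_cv` and the numpy-based implementation below are illustrative assumptions, not the thesis's fast algorithm; the point is that this naive approach re-solves a linear system for every ($\lambda$, fold) pair, which is exactly the cost the thesis seeks to avoid.

```python
import numpy as np

def ridge_kfold_cv(X, y, lambdas, k=5, seed=0):
    """Naive k-fold cross-validation error of ridge regression per lambda.

    For each lambda and each fold, solves the normal equations
    (X_tr^T X_tr + lambda I) w = X_tr^T y_tr on the training folds and
    measures squared error on the held-out fold.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    errors = []
    for lam in lambdas:
        sse = 0.0
        for f in range(k):
            test = folds[f]
            train = np.concatenate([folds[j] for j in range(k) if j != f])
            Xtr, ytr = X[train], y[train]
            # ridge solution on the training folds
            w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
            resid = X[test] @ w - y[test]
            sse += resid @ resid
        errors.append(sse / n)  # mean squared CV error for this lambda
    return np.array(errors)
```

A grid search would then pick `lambdas[np.argmin(errors)]`; the cost grows linearly in the number of candidate $\lambda$'s, which motivates computing errors across $\lambda$ more cheaply, or locating the minimizer without evaluating every candidate.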

In Chapter 2, we present algorithms for efficiently finding a low-rank approximation of a large sparse matrix using an improved Lanczos algorithm. Low-rank approximation is one of the most important techniques in data analysis; principal component analysis, in particular, is built on it. We tackle this problem by computing the truncated SVD of the matrix with our improved version of the Lanczos algorithm. Our method guarantees that the algorithm is restarted only when necessary. We also propose a novel scheme for performing reorthogonalization during the Lanczos iteration. In addition, we provide new stopping criteria for the Lanczos algorithm based directly on the quality of the low-rank approximation.
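For context, a plain Golub-Kahan Lanczos bidiagonalization with full reorthogonalization can be sketched as below; this is a textbook baseline, not the thesis's improved variant, and the function name `lanczos_bidiag` is an assumption. After k steps it produces orthonormal bases U, V and a small lower-bidiagonal matrix B with A V = U B, so the SVD of B yields approximate singular triplets of A, from which a truncated SVD and hence a low-rank approximation follow.

```python
import numpy as np

def lanczos_bidiag(A, k, seed=0):
    """Golub-Kahan Lanczos bidiagonalization with full reorthogonalization.

    Returns U (m x (k+1)), B ((k+1) x k lower bidiagonal), V (n x k)
    with orthonormal columns in U and V and A @ V = U @ B.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    u = rng.standard_normal(m)
    U[:, 0] = u / np.linalg.norm(u)
    for j in range(k):
        v = A.T @ U[:, j]
        if j > 0:
            v -= beta[j - 1] * V[:, j - 1]
        v -= V[:, :j] @ (V[:, :j].T @ v)        # full reorthogonalization
        alpha[j] = np.linalg.norm(v)
        V[:, j] = v / alpha[j]
        u = A @ V[:, j] - alpha[j] * U[:, j]
        u -= U[:, :j + 1] @ (U[:, :j + 1].T @ u)  # full reorthogonalization
        beta[j] = np.linalg.norm(u)
        U[:, j + 1] = u / beta[j]
    # assemble the (k+1) x k lower-bidiagonal B
    B = np.zeros((k + 1, k))
    for j in range(k):
        B[j, j] = alpha[j]
        B[j + 1, j] = beta[j]
    return U, B, V
```

Full reorthogonalization at every step is the expensive, safe choice, and the Frobenius error of the rank-r truncation equals the root sum of squares of the discarded singular values; schemes that reorthogonalize more selectively, restart only when needed, and stop based on the approximation quality itself are the kind of refinement the chapter develops.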


This item is under embargo until February 28, 2026.