eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Data Completion and Robust Principal Component Analysis under Low-rank Restrictions

Abstract

Many kinds of real-world high-dimensional data, such as images, documents, user-rating data, and health-related data, have an internal low-dimensional structure. Mathematicians formalize this notion of ‘low dimension’ as low matrix rank and have developed various dimensionality reduction methods under that assumption, such as principal component analysis (PCA) and non-negative matrix factorization (NMF). This thesis comprises four projects from my Ph.D. study. The target datasets of the first three projects are assumed to have low matrix rank or low tensor rank. The first project focuses on a matrix completion task: we propose a data completion method with convex regularizers to address the fragmented data issue, and then combine it with a temporally hierarchical attention network (THAN) to predict human stress levels from the recovered sensor data. In the second project, we propose a simple but efficient weighted higher-order singular value decomposition (HOSVD) algorithm for recovering tensor data from noisy observations. We also combine the weighted HOSVD with total variation minimization to efficiently fill in missing data in images and videos. In the third project, we propose a fast non-convex algorithm, Robust Tensor CUR (RTCUR), for large-scale tensor robust principal component analysis (TRPCA) problems. The main advantage of RTCUR over other TRPCA methods is its computational efficiency; we demonstrate the efficiency and effectiveness of RTCUR on both synthetic and real-world datasets. In all three of these projects, we explore the connection between the mathematical definition of rank and real-world data by developing algorithms under the low-rank assumption that solve real-world tasks. The last project studies a quantitative framework for inferring the political bias and source quality of media outlets from text. We collect the tweets that each media outlet posted during a specific time range and use a bidirectional long short-term memory (LSTM) neural network to infer the bias and quality values for each tweet.
