A Comparison of Imputative Capability on Algorithms for fitting the PARAFAC model to Biological Data
eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Abstract

Researchers often find it challenging to simplify complex data sets for downstream analysis. Combining multiple variables adds complexity, and organizing such data into a higher-dimensional structure can be more intuitive. For data sets with an inherently multi-modal structure, a variety of dimensionality reduction techniques have allowed researchers to explore and infer biological interactions more effectively. Higher-order dimensionality reduction techniques all serve the same purpose: to reduce the original data set and recover meaningful, interpretable patterns. The CANDECOMP/PARAFAC (CP) model, a frequent choice among researchers for its interpretability, still requires metrics for validating its performance and ensuring that an appropriate model complexity is selected. While the common benchmark for validating these methods is the total residual error, imputation error (prediction error) can serve as a more trustworthy alternative. We describe an algorithm for fitting the PARAFAC model, censored alternating least squares, that innately handles missing values, and compare it with alternating least squares and direct optimization on simulated and real data sets with varying degrees of missing values, using these performance metrics. While each method has its own benefits, censored alternating least squares appears best suited to handling missing values, which are commonly present in the data that researchers investigate.
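
The abstract does not spell out how censored alternating least squares works, so the following is only a minimal sketch of one common reading of the idea, not the thesis's implementation: each row of each factor matrix is updated by a least-squares solve restricted to the entries that were actually observed, and imputation error is then measured on entries that were hidden from the fit. The function names (`khatri_rao`, `unfold`, `censored_als`, `reconstruct`, `imputation_error`), the use of NaN to mark missing values, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product; the first matrix's index varies slowest."""
    rank = mats[0].shape[1]
    out = mats[0]
    for m in mats[1:]:
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, rank)
    return out

def unfold(tensor, mode):
    """Matricize along `mode`; column order matches khatri_rao of the other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def censored_als(tensor, rank, n_iter=300, seed=0):
    """Fit a CP model to a tensor whose missing entries are NaN (assumed encoding)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(tensor.ndim):
            others = [f for m, f in enumerate(factors) if m != mode]
            design = khatri_rao(others)
            unfolded = unfold(tensor, mode)
            for i in range(unfolded.shape[0]):
                seen = ~np.isnan(unfolded[i])
                if seen.any():
                    # Least squares over observed entries only; missing ones are skipped.
                    factors[mode][i] = np.linalg.lstsq(
                        design[seen], unfolded[i, seen], rcond=None
                    )[0]
    return factors

def reconstruct(factors):
    """Rebuild the full tensor from CP factor matrices."""
    shape = [f.shape[0] for f in factors]
    return khatri_rao(factors).sum(axis=1).reshape(shape)

def imputation_error(truth, estimate, held_out):
    """Relative squared error on the entries hidden from the fit."""
    diff = truth[held_out] - estimate[held_out]
    return float(np.sum(diff ** 2) / np.sum(truth[held_out] ** 2))
```

To use the sketch, one would set a random subset of known entries to NaN, call `censored_als` on the masked tensor, and pass the original tensor, `reconstruct(factors)`, and the boolean mask of hidden entries to `imputation_error`. The total residual error, by contrast, would be computed on the observed entries that the model was fitted to.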
