eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Studying chemical and biological systems using high-throughput sequencing: analytical challenges and solutions

Abstract

High-throughput sequencing (HTS) can identify unique DNA sequences and quantify their abundances in mixed DNA pools. HTS-based assays can therefore profile complex biological or chemical systems whose entities can be converted to unique DNA sequences. Computational models have also been developed to analyze HTS data at a larger scale. However, such data pose distinctive analytical challenges, including discrete counts, relative measurements, and small sample sizes. Careful assessment of these computational tools is required for robust interpretation of results.

In this dissertation, we investigated the computational challenges of two applications of HTS-based assays, and proposed and assessed solutions for them. In the first work, we proposed k-Seq, a kinetic assay to measure the activity of self-aminoacylation ribozymes (catalytic RNA). Characterizing the kinetics of different molecules in a heterogeneous pool is challenging because their abundances and activities can vary by several orders of magnitude. We explored different experimental designs and identified critical factors affecting the estimation of kinetic coefficients in the pseudo-first-order kinetic model for these ribozymes. Using bootstrapping, we quantified the estimation uncertainty for individual sequences and determined the minimum sequencing counts required for reliable estimation. Combining the improved experimental design with new analytical tools, we robustly quantified the kinetics of 10^5 different ribozymes.
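As a rough illustration of the bootstrapping idea, the sketch below fits a single rate constant to a reacted-fraction curve and resamples the data points to obtain a percentile interval. It is not the k-Seq implementation: the model form f = 1 - exp(-k c t) with a maximum amplitude fixed at 1, the linearized least-squares fit, the function names `fit_rate` and `bootstrap_rate`, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def fit_rate(concs, fractions, t=1.0):
    """Estimate k in the assumed model f = 1 - exp(-k * c * t).

    Linearizes to -ln(1 - f) = k * (c * t) and solves least squares
    through the origin. Every fraction must lie strictly below 1.
    """
    xs = [c * t for c in concs]
    ys = [-math.log(1.0 - f) for f in fractions]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def bootstrap_rate(concs, fractions, n_boot=1000, seed=0):
    """Return the point estimate of k and a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    pairs = list(zip(concs, fractions))
    estimates = []
    for _ in range(n_boot):
        # Resample (concentration, fraction) pairs with replacement.
        sample = [rng.choice(pairs) for _ in pairs]
        cs, fs = zip(*sample)
        estimates.append(fit_rate(cs, fs))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return fit_rate(concs, fractions), (lower, upper)


if __name__ == "__main__":
    # Synthetic data for one hypothetical sequence (true k = 0.5, 5% noise).
    rng = random.Random(1)
    concs = [0.1, 0.25, 0.5, 1.0, 2.0]
    clean = [1.0 - math.exp(-0.5 * c) for c in concs]
    noisy = [min(0.99, max(1e-6, f * (1.0 + rng.gauss(0.0, 0.05)))) for f in clean]
    k_hat, (lower, upper) = bootstrap_rate(concs, noisy)
    print(f"k = {k_hat:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

A wide interval for a given sequence signals an unreliable estimate, which is the kind of information that could guide a minimum-count threshold.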

In the second work, we constructed correlation networks among microorganisms from metagenomic data and studied the structure of the human skin microbiome in patients with chronic wounds. We designed a variation of Gaussian graphical models to capture the direct correlations between the abundances of bacteria and viruses while accounting for the structure and limitations of the data. To minimize the discovery of false correlations from the small, noisy dataset, we applied a two-step model selection to regularize the results. Lastly, we demonstrated the utility of the constructed correlation network in recovering strong correlations between microbes and in identifying potentially important microbes and microbial clusters.
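To make the notion of a "direct" correlation concrete, the sketch below contrasts an ordinary Pearson correlation with a partial correlation for three variables. It is only an illustration of the quantity that Gaussian graphical models encode, not the model variation or the two-step selection described above; the function names and the simulated abundances are assumptions made for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))


if __name__ == "__main__":
    # Simulated example: a hypothetical microbe z drives both x and y,
    # so x and y correlate without interacting directly.
    rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(500)]
    x = [v + rng.gauss(0.0, 0.5) for v in z]
    y = [v + rng.gauss(0.0, 0.5) for v in z]
    print(f"marginal correlation x-y: {pearson(x, y):.3f}")
    print(f"partial correlation x-y given z: {partial_corr(x, y, z):.3f}")
```

In this simulation the marginal correlation between x and y is large while the partial correlation is close to zero, so a network built on partial correlations would not draw an edge between them.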
