## Scholarly Works (104 results)

### Background

Tumor genomes are often highly heterogeneous, consisting of genomes from multiple subclonal types. Complete characterization of all subclonal types is a fundamental need in tumor genome analysis. With the advancement of next-generation sequencing, computational methods have recently been developed to infer tumor subclonal populations directly from cancer genome sequencing data. Most of these methods are based on sequence information from somatic point mutations. However, the accuracy of these algorithms depends crucially on the quality of the somatic mutations returned by variant calling algorithms, and they usually require deep coverage to achieve a reasonable level of accuracy.

### Results

We describe a novel probabilistic mixture model, MixClone, for inferring the cellular prevalences of subclonal populations directly from whole genome sequencing of paired normal-tumor samples. MixClone integrates sequence information of somatic copy number alterations and allele frequencies within a unified probabilistic framework. We demonstrate the utility of the method using both simulated and real cancer sequencing datasets, and show that it significantly outperforms existing methods for inferring tumor subclonal populations. The MixClone package is written in Python and is publicly available at https://github.com/uci-cbcl/MixClone.

### Conclusions

The probabilistic mixture model proposed here provides a new framework for subclonal analysis based on cancer genome sequencing data. By applying the method to both simulated and real cancer sequencing data, we show that integrating sequence information from both somatic copy number alterations and allele frequencies can significantly improve the accuracy of inferring tumor subclonal populations.

Machine learning methods, both unsupervised and supervised, have been successfully applied to computational biology and bioinformatics for decades. Recent advances in high-throughput genomic data profiling, such as high-throughput sequencing and large-scale gene expression profiling, have become powerful tools for both fundamental biological research and medicine. For example, high-throughput sequencing can now read billions of bases quickly and cheaply: Illumina's latest sequencer, the HiSeq X, can sequence 32 human genomes per week at a cost of less than \$1000 each. With millions or even billions of signals (e.g. sequencing reads) generated per experiment and thousands or even millions of experiments per study (e.g. large-scale gene expression profiling), there is a great need for more advanced machine learning models, both unsupervised and supervised, for analysing high-throughput genomic data. In this thesis, we address two main challenges in high-throughput genomic data analysis: 1) deconvolving sequencing data from more than one cell population, e.g. heterogeneous tumor tissues, using unsupervised probabilistic learning methods such as mixture models with latent variables; and 2) modelling the nonlinear and hierarchical patterns within high-throughput genomic data using supervised deep learning methods such as convolutional neural networks. We present five new models to address these two challenges, each applied to a specific problem.
The first three models focus on deconvolving tumor heterogeneity: Chapter 2 presents a probabilistic model to deconvolve tumor purity and ploidy; Chapter 3 further extends the model to infer tumor subclonal populations; Chapter 4 presents a probabilistic model to deconvolve tumor transcriptome expression. The last two models focus on applying deep learning methods in analysing large scale genomic data: Chapter 5 presents a deep learning method for gene expression inference; Chapter 6 presents a deep learning method to understand sequence conservation.
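
As a rough illustration of what deconvolving a normal-tumor mixture involves, the sketch below computes the B-allele fraction one would expect at a heterozygous site when a fraction of the sampled cells carries an altered copy number. This is a generic textbook-style mixing formula, not the model used in the thesis or in MixClone; the function name and its parameters are hypothetical, and normal cells are assumed to be diploid with one B copy.

```python
def expected_b_allele_fraction(prevalence, tumor_b_copies, tumor_total_copies):
    """Expected B-allele fraction for a mix of normal and altered cells.

    prevalence: fraction of cells carrying the alteration (0 to 1).
    tumor_b_copies / tumor_total_copies: B-allele and total copy numbers
    in the altered cells. Normal cells are assumed diploid and
    heterozygous (1 B copy out of 2), which is an illustrative assumption.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    normal_b, normal_total = 1, 2
    # Average copy numbers over the two cell populations.
    b_copies = (1.0 - prevalence) * normal_b + prevalence * tumor_b_copies
    total_copies = (1.0 - prevalence) * normal_total + prevalence * tumor_total_copies
    if total_copies == 0:
        raise ValueError("no copies of the locus remain in the mixture")
    return b_copies / total_copies


if __name__ == "__main__":
    # Half of the cells carry a gain to 3 copies, 2 of them the B allele.
    print(expected_b_allele_fraction(0.5, 2, 3))
```

Inference runs in the opposite direction: given observed read counts, a probabilistic model searches for the prevalence and copy-number states that best explain them.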

This dissertation summarizes my research as a Ph.D. candidate at UCLA on the Lifshitz higher spin Chern-Simons theory and its relation to the integrable KdV hierarchy. In Chapter 1, I briefly review higher spin gravity and introduce the Chern-Simons theory as a realization of the Vasiliev theory in three-dimensional spacetime. In Chapter 2, I review the KdV hierarchies. In Chapter 3, I discuss how to construct a solution to the Chern-Simons theory that yields a spacetime exhibiting Lifshitz scaling. I also calculate the boundary charge algebra and show that the asymptotic Lifshitz symmetry is realized in terms of it. In Chapter 4, I reveal the relation between the Lifshitz Chern-Simons theory and the KdV hierarchies (in the non-supersymmetric case), and give a proof of the general correspondence using the Drinfeld-Sokolov formalism. In Chapter 5, I work out the supersymmetric extension of this correspondence in a particular case, identifying the boundary charge algebra of the supersymmetric Chern-Simons theory with the second Hamiltonian structure of the super KdV hierarchy. In Chapter 6, I discuss the results of my study and possible directions for future research.

Zeolites containing Brønsted or Lewis acid sites are extensively used catalysts in industrial processes. In this study, force field parameters are derived for the quantum mechanics/molecular mechanics (QM/MM) simulations of reactions in zeolites by reducing the deviation between QM/MM calculations and experimental data over a range of adsorption energies. The accuracy of the thermal correction for adsorption enthalpies determined by the rigid rotor-harmonic oscillator approximation (RRHO) is examined and shown to be improved by treating low-lying vibrational modes as free translational and rotational modes via a quasi-RRHO model. With the quasi-RRHO scheme, the QM/MM simulations can accurately reproduce experimental adsorption energies for both nonpolar and polar molecules adsorbed in MFI, H-MFI, and H-BEA.
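
To see why low-lying modes are a problem for the plain RRHO treatment, it may help to evaluate the standard harmonic-oscillator vibrational entropy at a few wavenumbers. The sketch below uses only the textbook harmonic formula; it does not reproduce the quasi-RRHO scheme of the study, and the function name and the sample wavenumbers are illustrative assumptions.

```python
import math

R_GAS = 8.314462618   # gas constant, J mol^-1 K^-1
K_B_CM = 0.6950348    # Boltzmann constant, cm^-1 K^-1


def harmonic_vib_entropy(wavenumber_cm, temperature_k):
    """Harmonic-oscillator vibrational entropy of one mode, in J mol^-1 K^-1.

    Uses S/R = x / (exp(x) - 1) - ln(1 - exp(-x)) with x = wavenumber / (k_B T).
    """
    x = wavenumber_cm / (K_B_CM * temperature_k)
    return R_GAS * (x / math.expm1(x) - math.log1p(-math.exp(-x)))


if __name__ == "__main__":
    # The entropy grows without bound as the wavenumber approaches zero,
    # so small errors in soft modes translate into large entropy errors.
    for nu in (1000.0, 100.0, 10.0):
        print(nu, harmonic_vib_entropy(nu, 298.15))
```

Because the harmonic entropy diverges for vanishing frequencies, schemes that replace the softest modes with free translations or rotations keep their contribution bounded.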

The anharmonic effects of intramolecular nuclear motions, namely torsions and vibrations, are also examined in this study. Compared with the harmonic oscillator approximation, the accuracy of calculated molecular partition functions, heat capacities, entropies, and enthalpies can be improved with an uncoupled mode (UM) model, in which the full-dimensional potential energy surface for internal motions is modeled as a sum of independent one-dimensional potentials, one for each mode. However, the improvement is very limited if the one-dimensional potentials are determined from the energy as a function of displacement along each normal mode. Significant improvement can be achieved by constructing the potentials for internal rotations and vibrations separately, using the energy surfaces along the torsional coordinates and the remaining vibrational normal modes.
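
Because the UM potential is a sum of one-dimensional terms, the partition function for internal motions factorizes into a product of one-dimensional partition functions. The sketch below shows that bookkeeping under stated assumptions: the energy levels of each mode are taken as already known (here generated from a harmonic ladder purely for checking), and all function names are hypothetical rather than taken from the study.

```python
import math

K_B_CM = 0.6950348  # Boltzmann constant, cm^-1 K^-1


def q_from_levels(levels_cm, temperature_k):
    """Partition function of one mode from its energy levels (cm^-1),
    measured relative to the lowest level."""
    e0 = min(levels_cm)
    kt = K_B_CM * temperature_k
    return sum(math.exp(-(e - e0) / kt) for e in levels_cm)


def q_harmonic(wavenumber_cm, temperature_k):
    """Closed-form harmonic partition function, zero of energy at the ground level."""
    x = wavenumber_cm / (K_B_CM * temperature_k)
    return 1.0 / (1.0 - math.exp(-x))


def q_uncoupled(per_mode_levels_cm, temperature_k):
    """Uncoupled-mode partition function: product over independent modes."""
    q = 1.0
    for levels in per_mode_levels_cm:
        q *= q_from_levels(levels, temperature_k)
    return q


if __name__ == "__main__":
    t = 298.15
    # Illustrative harmonic ladders; anharmonic 1D levels would be used in practice.
    modes = [[n * nu for n in range(200)] for nu in (100.0, 500.0)]
    print(q_uncoupled(modes, t), q_harmonic(100.0, t) * q_harmonic(500.0, t))
```

In an actual UM calculation the level lists would come from solving a one-dimensional Schrödinger equation for each torsional or vibrational potential, which is where the anharmonic corrections enter.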

Three reactions catalyzed by BEA zeolite are investigated in this study using the QM/MM model: the formation of p-xylene from ethylene and 2,5-dimethylfuran (DMF) in H-BEA, the isomerization of glucose to fructose in Sn-BEA, and the synthesis of 4-(hydroxymethyl)benzoic acid (HMBA) from ethylene and 5-(hydroxymethyl)furoic acid (HMFA) in Sn-BEA. The pathways and energy barriers of these reactions derived from QM/MM simulations agree well with the available experimental results, which supports the validity of the QM/MM model. The influence of solvents and the effect of active site structures and heteroatoms on reaction barriers are investigated to derive criteria for future catalyst design.