eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Less is More: Mitigating batch-effects in large scale RNA-Seq experiments by balancing experimental factors using a genetic algorithm

Abstract

Randomization is widely considered the most important method to protect against bias and ensure the internal validity of clinical trials. Randomization promotes comparability with respect to known and unknown covariates, mitigates selection bias, and provides a basis for inference. However, randomization cannot guarantee that every covariate is balanced in large-scale clinical samples. While the advent of next-generation sequencing technologies (e.g., RNA-Seq) allows us to measure global gene expression in a large number of samples at low cost, combining samples with imbalanced covariates in one RNA-Seq experiment can lead to the ‘batch effect’ problem. Specifically, the biological variation becomes confounded with unwanted variation from the imbalanced covariates. This unwanted variation must be effectively removed to eliminate batch effects that could significantly bias the biological conclusions. Unfortunately, when the samples in an RNA-Seq experiment are chosen with unbalanced experimental factors at the design stage, the biological and unwanted variation become inseparable, and the unwanted variation can no longer be removed. Designing an RNA-Seq experiment with fully balanced experimental factors, so that batch effects are guaranteed to be removable, is therefore an important task in the era of high-throughput RNA-Seq studies. In this study, we propose a genetic algorithm (GA)-based tool called BalanceIT to balance experimental factors prior to sequencing. BalanceIT identifies an optimal set of samples with balanced experimental factors to be used in the design of an RNA-Seq experiment. Using a panel of ~1000 simulated samples, we demonstrate that our proposed GA-based tool is superior to the conventional randomization-based method in designing RNA-Seq experiments when the available samples have unbalanced experimental factors.
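To make the idea of GA-based sample selection concrete, the following is a minimal sketch and not BalanceIT's actual implementation, which the abstract does not describe. It assumes two groups (cases and controls), a single categorical experimental factor, and a hypothetical imbalance score: the summed absolute difference in factor-level proportions between the two selected groups. All function names, parameters, and the choice of crossover and mutation operators are illustrative assumptions.

```python
import random
from collections import Counter

def imbalance(individual, factor):
    """Hypothetical score: sum over factor levels of the absolute difference
    in level proportions between the selected cases and selected controls.
    0 means the factor is perfectly balanced; 2 is the maximum."""
    cases, controls = individual
    p = Counter(factor[i] for i in cases)
    q = Counter(factor[i] for i in controls)
    levels = set(p) | set(q)
    return sum(abs(p[l] / len(cases) - q[l] / len(controls)) for l in levels)

def random_individual(case_pool, control_pool, k, rng):
    # An individual is a pair: k selected case ids and k selected control ids.
    return (rng.sample(case_pool, k), rng.sample(control_pool, k))

def crossover(a, b, k, rng):
    # For each group, draw k samples from the union of both parents' picks.
    return tuple(rng.sample(sorted(set(x) | set(y)), k) for x, y in zip(a, b))

def mutate(individual, pools, rng):
    # Swap one selected sample for an unselected one from the same group.
    groups = [list(g) for g in individual]
    g = rng.randrange(2)
    unused = [i for i in pools[g] if i not in groups[g]]
    if unused:
        groups[g][rng.randrange(len(groups[g]))] = rng.choice(unused)
    return tuple(groups)

def select_balanced(case_pool, control_pool, factor, k,
                    pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pools = (case_pool, control_pool)
    pop = [random_individual(case_pool, control_pool, k, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: imbalance(ind, factor))
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, k, rng), pools, rng))
        pop = elite + children
    return min(pop, key=lambda ind: imbalance(ind, factor))
```

Called with a pool in which the factor is skewed between groups (for example, mostly one level among cases and mostly another among controls), `select_balanced` returns `k` cases and `k` controls whose factor-level proportions are as close as the search could make them. A real design problem would involve several factors at once and a correspondingly richer fitness function.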
