
Article
Peer Reviewed

An emerging technology for alleviating the rising energy demands of data centers is the economizer, which turns off the power-consuming chillers and brings in outside air for cooling. However, contaminants in outdoor air can lower the reliability of the electronics through corrosion, which can negate any energy savings. This experiment seeks to determine whether the indoor air quality of economizer systems is suitable for data center use. The mass concentrations of particulate matter were measured both inside and outside the data center using aerosol instruments. Particles were captured on collection filters to identify their chemical properties. Indoor particle concentrations are shown to rise when the economizer is operating, because outside air is brought in. However, the concentrations are well below the ASHRAE standard, which confirms that economizer use does not pose a risk to the servers in a data center.

Cover page: Evaluating Economizer Use In Data Centers

Article

THE "HCl EFFECT" IN ANION-RESIN EXCHANGE

LBL Publications (1959)

Thesis
Peer Reviewed

Scalable Algorithms for Genetic Association Studies, Genotype Imputation, and Ancestry Inference

UCLA Electronic Theses and Dissertations (2021)

This dissertation develops statistical and computational methods for human genetics. We consider problems in genome-wide association studies, imputation, phasing, and ancestry inference. The methods we develop are statistically robust, grounded in biological reality, and run extremely fast. Furthermore, we test these methods on the largest data available to us, such as the UK Biobank and Haplotype Reference Consortium. We implement our methods in individual, open-source Julia packages. They are freely available to the scientific community through the OpenMendel platform.


Article
Peer Reviewed

A fast data-driven method for genotype imputation, phasing and local ancestry inference: MendelImpute.jl

UCLA Previously Published Works (2021)

Motivation

Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models (HMMs). Existing programs all have essentially the same imputation accuracy, are computationally intensive and generally require prephasing the typed markers.

Results

We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for HMM calculations. This strategy, embodied in our Julia program MendelImpute.jl, avoids explicit assumptions about recombination and population structure while delivering similar prediction accuracy, better memory usage and an order of magnitude or better run-times compared to the fastest competing method. MendelImpute operates on both dosage data and unphased genotype data and simultaneously imputes missing genotypes and phase at both the typed and untyped SNPs (single nucleotide polymorphisms). Finally, MendelImpute naturally extends to global and local ancestry estimation and lends itself to new strategies for data compression and hence faster data transport and sharing.
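As a rough illustration of how linear algebra can stand in for HMM calculations, the sketch below searches a reference panel for the pair of haplotypes whose sum best matches a genotype vector in the least-squares sense, then copies that pair's alleles to the untyped markers. This is an assumption made for illustration only: the abstract does not specify the routine, and the function names (`best_haplotype_pair`, `impute`), the fully observed typed genotypes, and the single-window setup are all hypothetical rather than the MendelImpute.jl API.

```python
import numpy as np

def best_haplotype_pair(x, H):
    """Return (i, j) minimizing ||x - H[:, i] - H[:, j]||^2.

    x : genotype vector at the typed markers (length p)
    H : p x d matrix whose columns are reference haplotypes (0/1 alleles)
    """
    # ||x - h_i - h_j||^2 = ||x||^2 + ||h_i + h_j||^2 - 2 x.(h_i + h_j).
    # ||x||^2 does not depend on (i, j), so it is dropped from the score.
    G = H.T @ H                      # d x d Gram matrix, G[i, j] = h_i . h_j
    c = H.T @ x                      # c[i] = x . h_i
    diag = np.diag(G)
    score = (diag[:, None] + diag[None, :] + 2 * G
             - 2 * (c[:, None] + c[None, :]))
    i, j = np.unravel_index(np.argmin(score), score.shape)
    return int(i), int(j)

def impute(x_typed, H_typed, H_full):
    """Impute all markers as the sum of the best-matching haplotype pair."""
    i, j = best_haplotype_pair(x_typed, H_typed)
    return H_full[:, i] + H_full[:, j]
```

All pairwise scores come from two matrix products, which is the kind of operation optimized linear algebra libraries handle well. A practical method would also need to handle missing typed genotypes and stitch results across genomic windows, which this sketch omits.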

Availability and implementation

Software, documentation and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelImpute.jl.

Supplementary information

Supplementary data are available at Bioinformatics online.


Creative Commons 'BY-NC-ND' version 4.0 license

Article
Peer Reviewed

Multivariate genome-wide association analysis by iterative hard thresholding

UCLA Previously Published Works (2023)

Motivation

In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association study operate marker-by-marker and are computationally intensive.

Results

We present a sparsity constrained regression algorithm for multivariate genome-wide association study based on iterative hard thresholding and implement it in a convenient Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA's linear mixed models and mv-PLINK's canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits.

Availability and implementation

Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.


Article

Iterative Hard Thresholding in GWAS: Generalized Linear Models, Prior Weights, and Double Sparsity

UCLA Previously Published Works (2019)


Background

Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.

Results

We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models (GLMs), prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing, and exhibits a 2 to 3 orders of magnitude decrease in false positive rates compared to lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies.
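For readers unfamiliar with the technique, a minimal sketch of plain iterative hard thresholding for sparse least squares follows. It alternates a gradient step with a projection that keeps only the k largest-magnitude coefficients. The sketch assumes a Gaussian (linear) model, a fixed conservative step size, and a user-chosen sparsity level k; it does not include the GLM, prior-weight, or group-sparsity extensions described above, and the function names are illustrative rather than taken from the authors' software.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    k = min(k, v.size)
    if k > 0:
        keep = np.argpartition(np.abs(v), -k)[-k:]
        out[keep] = v[keep]
    return out

def iht(X, y, k, n_iter=500, tol=1e-10):
    """Sparse least squares by projected gradient descent (plain IHT)."""
    n, p = X.shape
    beta = np.zeros(p)
    # Fixed step size 1 / sigma_max(X)^2 (an assumption of this sketch;
    # adaptive step sizes are common in practice).
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)          # gradient of 0.5 * ||y - X beta||^2
        new_beta = hard_threshold(beta - step * grad, k)
        converged = np.linalg.norm(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Calling `iht(X, y, k)` with a design matrix `X` (samples by predictors), a trait vector `y`, and a sparsity level `k` returns a coefficient vector with at most `k` nonzero entries, so all predictors compete in one model instead of being tested one at a time.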

Conclusions

Our real data analysis and simulation studies suggest that IHT can (a) recover highly correlated predictors, (b) avoid over-fitting, (c) deliver better true positive and false positive rates than either marginal testing or lasso regression, (d) recover unbiased regression coefficients, (e) exploit prior information and group-sparsity and (f) be used with biobank sized data sets. Although these advances are studied for GWAS inference, our extensions are pertinent to other regression problems with large numbers of predictors.

Article
Peer Reviewed

Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity

UCLA Previously Published Works (2020)

Background

Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.

Results

We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2-3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies.

Conclusions

Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association studies inference, our extensions are pertinent to other regression problems with large numbers of predictors.


Article
Peer Reviewed

Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets

UCLA Previously Published Works (2023)

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10⁵ to 10⁶ samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.


Article
Peer Reviewed

OpenMendel: a cooperative programming project for statistical genetics

UCLA Previously Published Works (2020)

Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.
