This is a short note on the mating scheme used by the Mutagenesis Project. We present a probabilistic analysis of the distribution of the number of mutants in the G3 generation and show that it depends on the number of G2 mothers and their litter sizes. A computer program is provided to perform the calculation. We also quantify the odds that a G2 mother is a mutation carrier given that none of her progeny are mutants. Finally, we analyze data from the project and find them consistent with the theory.
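The calculation can be sketched in Python as follows. The mating scheme is not spelled out here, so this sketch assumes a standard recessive screen: each G2 mother carries the mutation with probability 1/2, and each G3 pup of a carrier mother is a homozygous mutant with probability 1/4 (as when G2 daughters are backcrossed to a heterozygous G1 father). Litters are treated as independent. The function names are hypothetical and are not the program distributed with the note.

```python
from fractions import Fraction
from math import comb

# ASSUMPTIONS (not stated in the note): a G2 mother is a carrier with
# probability 1/2; a G3 pup of a carrier mother is homozygous mutant with
# probability 1/4 (backcross to a heterozygous G1 father).
P_CARRIER = Fraction(1, 2)
P_HOMOZYGOUS = Fraction(1, 4)


def mother_distribution(n, p_carrier=P_CARRIER, p_hom=P_HOMOZYGOUS):
    """P(k mutants among the n pups of one G2 mother), for k = 0..n."""
    # Mixture: a non-carrier mother gives a point mass at 0 mutants,
    # a carrier mother gives Binomial(n, p_hom) mutants.
    dist = [Fraction(0)] * (n + 1)
    dist[0] += 1 - p_carrier
    for k in range(n + 1):
        dist[k] += p_carrier * comb(n, k) * p_hom**k * (1 - p_hom) ** (n - k)
    return dist


def g3_mutant_distribution(litter_sizes, p_carrier=P_CARRIER, p_hom=P_HOMOZYGOUS):
    """P(total G3 mutants = k) over independent G2 mothers with the given litter sizes."""
    total = [Fraction(1)]  # no mothers yet: 0 mutants with probability 1
    for n in litter_sizes:
        one = mother_distribution(n, p_carrier, p_hom)
        # Convolve, because mutant counts from different mothers add.
        new = [Fraction(0)] * (len(total) + n)
        for i, a in enumerate(total):
            for j, b in enumerate(one):
                new[i + j] += a * b
        total = new
    return total


def carrier_odds_given_no_mutants(n, p_carrier=P_CARRIER, p_hom=P_HOMOZYGOUS):
    """Posterior odds that a G2 mother is a carrier, given n pups and no mutants."""
    prior_odds = p_carrier / (1 - p_carrier)
    # P(0 mutants | carrier) / P(0 mutants | non-carrier) = (1 - p_hom)**n / 1
    likelihood_ratio = (1 - p_hom) ** n
    return prior_odds * likelihood_ratio
```

Exact fractions keep the probabilities summing to exactly 1. Under these assumptions, `1 - g3_mutant_distribution(litters)[0]` is the chance of seeing at least one G3 mutant, and a G2 mother with 8 non-mutant pups has posterior carrier odds of (3/4)^8 ≈ 0.10, i.e. a carrier probability of about 0.09.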

## Scholarly Works (30 results)

Many computations in biomedical research, such as simulations, bootstrapping, database searches (for example, BLAST), and many Monte Carlo algorithms, are embarrassingly parallel: the computation can be split into smaller computations, each of which can run in a separate thread that does not need to interact with the others. Such computations can easily be distributed (that is, run on different processors), with a speedup roughly proportional to the number of processors. In this note we introduce some of the concepts behind distributed computing, describe examples where it has been used, and lay out scenarios where it may be useful to biomedical researchers in the future.
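As a minimal sketch of an embarrassingly parallel job, the Python code below runs bootstrap replicates of a sample mean through a pool of workers. Each replicate gets its own seed and shares no state with the others, so the replicates can be handed out in any order. The function names are illustrative. The default `ThreadPoolExecutor` keeps the example portable, but because of CPython's global interpreter lock, CPU-bound work like this needs `ProcessPoolExecutor` (or separate machines) for a real speedup.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_replicate(data, seed):
    """One bootstrap replicate: resample with replacement and return the mean.

    Each replicate builds its own seeded RNG, so replicates share no state."""
    rng = random.Random(seed)
    resample = [rng.choice(data) for _ in range(len(data))]
    return statistics.mean(resample)


def parallel_bootstrap(data, n_reps, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Run n_reps independent replicates concurrently.

    executor.map returns results in input (seed) order, so the output is
    identical to a serial loop over the same seeds."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(bootstrap_replicate, [data] * n_reps, range(n_reps)))
```

Passing `executor_cls=ProcessPoolExecutor` from a script guarded by `if __name__ == "__main__":` spreads the replicates across processor cores without changing the results, because each replicate's output depends only on its seed.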

The Mutagroup at Jackson Labs aims to generate new mouse models for studying neurological disease by injecting mice with the mutagen ENU (N-ethyl-N-nitrosourea) to induce mutations. The group proposes to produce large numbers of potential mutants and screen them for phenotypic anomalies. In this report we propose a statistical algorithm to flag phenotypic deviants. We applied the algorithm to a pilot data set collected by Dr. Kevin Seburn on mice placed in cages equipped with monitoring devices. With the false positive rate targeted at 5%, the algorithm detected 18 of the 27 mutant mice presented to it.
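The report's own algorithm is not described here, so the sketch below is a hypothetical stand-in that shows one common way to calibrate a deviant flag to a target false positive rate: score each mouse by its largest robust z-score across the monitored measurements (medians and MADs taken from wild-type controls), then set the cutoff at the `1 - alpha` quantile of the control scores.

```python
import math
import statistics


def fit_reference(controls):
    """Per-feature median and scaled MAD from control mice (lists of feature vectors)."""
    n_feat = len(controls[0])
    centers, scales = [], []
    for j in range(n_feat):
        col = [c[j] for c in controls]
        med = statistics.median(col)
        mad = statistics.median(abs(v - med) for v in col)
        centers.append(med)
        # 1.4826 * MAD estimates the SD under normality; guard against zero spread.
        scales.append(1.4826 * mad if mad > 0 else 1.0)
    return centers, scales


def deviance_score(x, centers, scales):
    """Largest robust z-score of one mouse across all features."""
    return max(abs(v - m) / s for v, m, s in zip(x, centers, scales))


def flag_deviants(controls, mice, alpha=0.05):
    """Flag mice whose score exceeds the (1 - alpha) empirical quantile of control scores.

    HYPOTHETICAL illustration, not the algorithm used in the report."""
    centers, scales = fit_reference(controls)
    ref = sorted(deviance_score(c, centers, scales) for c in controls)
    k = max(math.ceil((1 - alpha) * len(ref)) - 1, 0)
    threshold = ref[k]
    return [deviance_score(x, centers, scales) > threshold for x in mice]
```

Fitting the reference and choosing the threshold on the same controls is optimistic; holding out a separate set of controls for the threshold gives a more honest estimate of the false positive rate.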

Selective genotyping and phenotyping strategies can reduce the cost of QTL (quantitative trait loci) experiments. We analyze these strategies in the context of multi-locus models and non-normal phenotypes. Our approach is based on calculations of the expected information of the experiment under different strategies. Our central conclusions are as follows. (1) Selective genotyping is effective for detecting linked and epistatic QTL as long as no locus has a large effect. When one or more loci have large effects, the effectiveness of selective genotyping is unpredictable: it may be heightened or diminished relative to the small-effects case. (2) The efficiency of selective phenotyping decreases as the number of unlinked loci used for selection increases, approaching that of random selection in the limit. However, when phenotyping is expensive and only a small fraction of individuals can be phenotyped, the efficiency of selective phenotyping is high relative to random sampling, even when more than 10 loci are used for selection. (3) For time-to-event phenotypes such as lifetimes, which have a long right tail, right-tail selective genotyping is more effective than two-tail selective genotyping. For heavy-tailed phenotype distributions, such as the Cauchy distribution, the most extreme phenotypic individuals are not the most informative. (4) When the phenotype distribution is exponential and a right-tail selective genotyping strategy is used, the optimal selection fraction (proportion genotyped) is either less than 20% or 100%, depending on genotyping cost. (5) For time-to-event phenotypes where follow-up cost increases with the lifetime of the individual, we derive the optimal follow-up time that maximizes the information content of the experiment relative to its cost. For example, when the cost of following up an individual for the average lifetime in the population is approximately equal to the fixed costs of genotyping and breeding, the optimal strategy is to follow up approximately 70% of the population.
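The two selection rules compared in point (3) can be written down directly. This hypothetical helper (not from the paper) returns the indices of the individuals to genotype, either the extremes in both tails or only the largest values, given the fraction of the population to be genotyped.

```python
def select_for_genotyping(phenotypes, fraction, tail="two"):
    """Return sorted indices of the individuals to genotype.

    tail="two":   the fraction/2 lowest and fraction/2 highest phenotypes.
    tail="right": the `fraction` highest phenotypes (e.g. longest lifetimes).
    HYPOTHETICAL helper for illustration only."""
    n = len(phenotypes)
    m = round(fraction * n)  # number of individuals to genotype
    order = sorted(range(n), key=lambda i: phenotypes[i])  # ascending phenotype
    if tail == "right":
        chosen = order[n - m:]
    elif tail == "two":
        lo = m // 2
        hi = m - lo  # an odd count puts the extra individual in the upper tail
        chosen = order[:lo] + order[n - hi:]
    else:
        raise ValueError("tail must be 'two' or 'right'")
    return sorted(chosen)
```

For a right-skewed phenotype such as lifetime, `tail="right"` concentrates genotyping on the long-lived individuals in the tail. The helper only implements the selection; deciding which rule is better requires the information calculation described above.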

We examine the efficiency of different genotyping and phenotyping strategies in inbred line crosses from an information perspective. This provides a mathematical framework for the statistical aspects of QTL experimental design and helps guide intuition. Our central result is a simple formula that quantifies the fraction of missing information of any genotyping strategy in a backcross, including the special case of selectively genotyping only the phenotypically extreme individuals. The formula depends on the square of the phenotype and on the uncertainty in our knowledge of the genotype at a locus. We use this result to answer a variety of questions. First, we examine the cost-information tradeoff as the marker density and the proportion of extreme phenotypic individuals genotyped are varied. We then evaluate the information content of selective phenotyping designs and the impact of measurement error in phenotyping. A simple formula quantifies the information content of any combined phenotyping and genotyping design. We extend our results to multi-genotype crosses, such as the F₂ intercross, and to multiple-QTL models. We find that when the QTL effect is small, any contrast in a multi-genotype cross benefits from selective genotyping in the same manner as in a backcross. The benefit remains in the presence of a second, unlinked QTL with a small effect (explaining less than 20% of the variance), but diminishes if the second QTL has a large effect. Software for performing power calculations for the backcross and F₂ intercross, incorporating selective genotyping and marker spacing, is available [in related files].
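To make the role of the squared phenotype concrete, here is a sketch for the simplest case only: a backcross QTL with small effect, fully known genotypes, and a standard normal phenotype y. Assuming (our reading of the abstract, not the paper's exact formula) that each individual's information is proportional to y², genotyping the fraction p with the most extreme phenotypes retains E[y²; |y| > c] of the information, where c = Φ⁻¹(1 - p/2). Integrating by parts gives p + 2cφ(c).

```python
from statistics import NormalDist


def two_tail_information_fraction(p):
    """Share of full-genotyping information kept when only the fraction p of
    most extreme phenotypes (p/2 per tail) is genotyped.

    ASSUMPTIONS: standard normal phenotype, small-effect backcross QTL, fully
    known genotypes, per-individual information proportional to y**2."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0:
        return 0.0
    if p == 1.0:
        return 1.0
    z = NormalDist()
    c = z.inv_cdf(1 - p / 2)  # cutoff on |y|
    # 2 * integral from c to infinity of y^2 phi(y) dy = 2*c*phi(c) + p
    return p + 2 * c * z.pdf(c)
```

Under these assumptions, genotyping the most extreme 20% retains about 65% of the information. Marker spacing and genotype uncertainty, which the paper's formula also covers, are left out of this sketch.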

- 12 supplemental files

An investigator planning a QTL (quantitative trait locus) experiment has to choose which strains to cross, the type of cross, the genotyping strategy, and the number of progeny to raise and phenotype. To help make such choices, we have developed R/qtlDesign, an interactive program for power and sample size calculations for QTL experiments. The software supports selective genotyping strategies and variable marker spacing, and includes tools to optimize information content subject to cost constraints, for backcrosses, intercrosses, and recombinant inbred lines derived from two parental strains. We review the impact of experimental design choices on the variance attributable to a segregating locus, the residual error variance, and the effective sample size. We illustrate use of the software in real-life settings. The software is available at http://www.biostat.ucsf.edu/sen/software.html.
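The quantities named above combine into a power calculation. The sketch below is a generic normal-approximation calculation for a single QTL in a backcross, not the R/qtlDesign interface. It assumes the QTL variance is effect²/4 (two genotype classes at frequency 1/2), treats the statistic 2·ln(10)·LOD as noncentral chi-square with 1 degree of freedom and noncentrality n_eff × (QTL variance) / (residual variance), and uses `info_fraction` as a hypothetical knob that shrinks the effective sample size, for example under selective genotyping.

```python
import math
from statistics import NormalDist


def backcross_power(n, effect, sigma_e, lod_threshold=3.0, info_fraction=1.0):
    """Approximate power to detect one backcross QTL at the given LOD threshold."""
    var_qtl = effect ** 2 / 4              # two genotype classes, each with frequency 1/2
    n_eff = n * info_fraction              # effective sample size (HYPOTHETICAL adjustment)
    ncp = n_eff * var_qtl / sigma_e ** 2   # approximate noncentrality of 2*ln(10)*LOD
    crit = 2 * math.log(10) * lod_threshold  # LOD threshold on the chi-square (1 df) scale
    r, s = math.sqrt(crit), math.sqrt(ncp)
    z = NormalDist()
    # A 1-df noncentral chi-square is (Z + s)^2, so P(stat > crit) = P(|Z + s| > r).
    return z.cdf(s - r) + z.cdf(-s - r)


def backcross_sample_size(effect, sigma_e, power=0.8, lod_threshold=3.0,
                          info_fraction=1.0, max_n=1_000_000):
    """Smallest n whose approximate power reaches the target."""
    if effect == 0 or info_fraction <= 0:
        raise ValueError("zero effect or zero information cannot reach the target power")
    n = 1
    while backcross_power(n, effect, sigma_e, lod_threshold, info_fraction) < power:
        n += 1
        if n > max_n:
            raise ValueError("target power not reached within max_n")
    return n
```

With an additive effect of half a residual standard deviation and a LOD threshold of 3, this approximation asks for roughly 330 backcross progeny for 80% power; halving `info_fraction` roughly doubles that number.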