Rare variant analysis for common diseases
Genome-wide association (GWA) studies, which search for association between single, common genetic markers and a disease phenotype, have shown varying degrees of success. Recent studies suggest that multiple rare variants at different loci may act in concert to influence the etiology of common diseases. However, current GWA studies depend upon the Common Disease Common Variant hypothesis and are not powered to account for disease-causing rare variants, as posited by the Common Disease Rare Variant (CDRV) hypothesis. We consider a simple CDRV model where a subset of rare variants at a locus is independently deleterious. These rare variants have modest penetrance (their presence increases disease likelihood significantly), but each explains only a small fraction of the diseased individuals. For this model, we describe an algorithm that efficiently computes a subset of rare variants that best associates with the disease. We test our approach using extensive simulations. Neutral rare variants are simulated using coalescent models, and causal variants are simulated under selection using Wright's equation. Our method is robust in detecting associations with modest odds ratios in relatively small cohort studies. It significantly outperforms previously proposed strategies, and is robust to other CDRV models. Our simulations suggest that associated regions with an odds ratio of 1.2 can be identified with a cohort study of 10000 individuals. Those with an odds ratio of 1.5 can be identified with a cohort study of 5000 individuals. Regions with stronger association (odds ratio > 2.2) can be identified with cohort studies of 1500 individuals or fewer. The method was applied to a cohort of 289 individuals, 143 with high BMI and 146 with relatively low BMI. Two genes (187 kb) were sequenced in this cohort using next-generation sequencing. Out of a total of 1088 rare variants, we identified two subsets of co-located variants that showed significant association with BMI. These subsets showed association with unadjusted chi-square P-values of 3.6 × 10⁻⁴ and 1.4 × 10⁻⁴, and permutation-based P-values of 0.004 and 8.04 × 10⁻⁴, respectively. In comparison, the most significant association using single-marker tests had a chi-square P-value of 0.002. Our results suggest that whole-genome association using an analysis of rare variants is feasible.
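
To make the two kinds of P-value quoted above concrete, the following is a minimal sketch of how a given subset of rare variants might be scored against case/control status. It is not the subset-search algorithm described in this work. It assumes a simple collapsing scheme (an individual is a "carrier" if they have any variant in the subset), a Pearson chi-square statistic on the resulting 2×2 table, and label permutation for an empirical P-value. All function names and the data layout (one set of variant indices per individual) are hypothetical.

```python
import random


def carrier_flags(genotypes, subset):
    """Return True for each individual carrying at least one variant in `subset`.

    `genotypes` is a list with one set of variant indices per individual
    (an assumed layout, chosen for illustration only).
    """
    return [bool(g & subset) for g in genotypes]


def chi_square_2x2(carriers, cases):
    """Pearson chi-square statistic (no continuity correction) for the
    carrier-by-case 2x2 table."""
    a = sum(1 for c, d in zip(carriers, cases) if c and d)          # carrier, case
    b = sum(1 for c, d in zip(carriers, cases) if c and not d)      # carrier, control
    e = sum(1 for c, d in zip(carriers, cases) if not c and d)      # non-carrier, case
    f = sum(1 for c, d in zip(carriers, cases) if not c and not d)  # non-carrier, control
    n = a + b + e + f
    denom = (a + b) * (e + f) * (a + e) * (b + f)
    if denom == 0:
        # A row or column is empty, so the table carries no information.
        return 0.0
    return n * (a * f - b * e) ** 2 / denom


def permutation_p_value(carriers, cases, n_perm=1000, seed=0):
    """Empirical P-value: the fraction of case/control label shufflings whose
    statistic is at least as large as the observed one (with add-one smoothing)."""
    rng = random.Random(seed)
    observed = chi_square_2x2(carriers, cases)
    shuffled = list(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_2x2(carriers, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Note that this sketch permutes labels for a fixed subset. When the subset itself is chosen by searching over many candidates, the search would have to be repeated inside each permutation for the resulting P-value to account for that selection.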