SNP Calling Using Genotype Model Selection on High-Throughput Sequencing Data
Recent advances in high-throughput sequencing (HTS) promise revolutionary impacts in science and technology, including disease diagnosis, pharmacogenomics, and the mitigation of antibiotic resistance. An important way to analyze the increasingly abundant HTS data is through single nucleotide polymorphism (SNP) callers. An examination of a selection of popular HTS SNP calling procedures shows that many rely mainly on base-calling and read mapping quality values. There is therefore a need to consider other sources of error when calling SNPs, such as those occurring during genomic sample preparation. This work presents Genotype Model Selection (GeMS), a novel method of consensus and SNP calling that accounts for genomic sample preparation errors. Simulation studies demonstrate that GeMS has the best balance of sensitivity and positive predictive value (PPV) among a selection of popular SNP callers. Real data analyses also support this conclusion.
As an extension of single-sample GeMS, the multiple-sample Genotype Model Selection (multiGeMS) method is also presented. A simulation study and a real data analysis demonstrate that multiGeMS has a good balance of sensitivity and PPV when compared to a selection of popular multiple-sample SNP callers.
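For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how sensitivity and PPV could be computed for a set of SNP calls. It is not code from GeMS or multiGeMS. The function name `sensitivity_and_ppv` and the representation of calls as sets of genomic positions are assumptions made for illustration. The definitions themselves are the standard ones: sensitivity is TP / (TP + FN) and PPV is TP / (TP + FP).

```python
import math


def sensitivity_and_ppv(called, truth):
    """Return (sensitivity, PPV) for called SNP positions against a truth set.

    Both arguments are iterables of SNP positions (hypothetical representation;
    a real evaluation would typically also compare the called alleles).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and truly variant
    fn = len(truth - called)   # false negatives: true SNPs that were missed
    fp = len(called - truth)   # false positives: calls that are not true SNPs
    # Return NaN when a metric is undefined (empty truth set or no calls).
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    ppv = tp / (tp + fp) if (tp + fp) else math.nan
    return sensitivity, ppv


if __name__ == "__main__":
    # Toy positions, invented purely for illustration.
    truth_positions = {101, 250, 400}
    called_positions = {101, 250, 377, 512}
    print(sensitivity_and_ppv(called_positions, truth_positions))
```

A caller that reports more sites tends to gain sensitivity at the cost of PPV, and a stricter caller does the reverse, which is why the two are assessed together as a balance.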