Skip to main content
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Previously Published Works bannerUC San Diego

Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing

Abstract

The human genome contains hundreds of low-copy repeats (LCRs) that are challenging to analyze using short-read sequencing technologies due to extensive copy number variation and ambiguity in read mapping. Copy number and sequence variants in more than 150 duplicated genes that overlap LCRs have been implicated in monogenic and complex human diseases. We describe a computational tool, Parascopy, for estimating the aggregate and paralog-specific copy number of duplicated genes using whole-genome sequencing (WGS). Parascopy is an efficient method that jointly analyzes reads mapped to different repeat copies without the need for global realignment. It leverages multiple samples to mitigate sequencing bias and to identify reliable paralogous sequence variants (PSVs) that differentiate repeat copies. Analysis of WGS data for 2504 individuals from diverse populations showed that Parascopy is robust to sequencing bias, has higher accuracy compared to existing methods and enables prioritization of pathogenic copy number changes in duplicated genes.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View