- Wagner, Justin;
- Olson, Nathan;
- Harris, Lindsay;
- Khan, Ziad;
- Farek, Jesse;
- Mahmoud, Medhat;
- Stankovic, Ana;
- Kovacevic, Vladimir;
- Yoo, Byunggil;
- Miller, Neil;
- Rosenfeld, Jeffrey;
- Ni, Bohan;
- Zarate, Samantha;
- Kirsche, Melanie;
- Aganezov, Sergey;
- Schatz, Michael;
- Narzisi, Giuseppe;
- Byrska-Bishop, Marta;
- Clarke, Wayne;
- Evani, Uday;
- Markello, Charles;
- Shafin, Kishwar;
- Zhou, Xin;
- Sidow, Arend;
- Bansal, Vikas;
- Ebert, Peter;
- Marschall, Tobias;
- Lansdorp, Peter;
- Hanlon, Vincent;
- Mattsson, Carl-Adam;
- Barrio, Alvaro;
- Fiddes, Ian;
- Xiao, Chunlin;
- Fungtammasan, Arkarachai;
- Chin, Chen-Shan;
- Wenger, Aaron;
- Rowell, William;
- Sedlazeck, Fritz;
- Carroll, Andrew;
- Salit, Marc;
- Zook, Justin
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and to develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in seven samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 single-nucleotide variants (SNVs) and 50,000 insertions or deletions (indels), and include 16% more exonic variants, many of them in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, the new benchmark includes 92% of the autosomal GRCh38 assembly, up from 85% in the previous version. It also excludes regions that are problematic for benchmarking small variants, such as copy number variants, which should not have been included in the previous version. This benchmark identifies eight times more false negatives in a short-read variant call set than our previous benchmark. We demonstrate that it reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
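To illustrate what "identifying false positives and false negatives" against a benchmark means, the sketch below compares a query call set to benchmark (truth) variants, counting only variants whose position falls inside the benchmark regions. It is a simplified illustration and not the comparison method used with these benchmarks: it assumes variants are already normalized, matches them exactly on (chromosome, position, REF, ALT), and checks only the start position against BED-style intervals. Dedicated tools such as GA4GH hap.py or vcfeval also reconcile differing variant representations and genotypes. The names `Variant`, `in_benchmark_regions` and `compare_calls` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (assumed pre-normalized)."""
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


# Benchmark regions: chrom -> list of 0-based, half-open (start, end) intervals, as in BED.
Regions = Dict[str, List[Tuple[int, int]]]


def in_benchmark_regions(v: Variant, regions: Regions) -> bool:
    """True if the variant's 1-based start lies inside any BED interval.

    A BED interval [start, end) covers 1-based positions start+1 .. end.
    Simplification: only the start position is checked, not the full REF span.
    """
    return any(start < v.pos <= end for start, end in regions.get(v.chrom, []))


def compare_calls(truth: Iterable[Variant], query: Iterable[Variant],
                  regions: Regions) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) within regions.

    Calls outside the benchmark regions are ignored rather than scored,
    because the benchmark makes no claim about those regions.
    """
    truth_in = {v for v in truth if in_benchmark_regions(v, regions)}
    query_in = {v for v in query if in_benchmark_regions(v, regions)}
    tp = truth_in & query_in   # in both the benchmark and the query
    fp = query_in - truth_in   # called, but absent from the benchmark
    fn = truth_in - query_in   # in the benchmark, but missed by the query
    return len(tp), len(fp), len(fn)
```

For example, with one interval `chr1:[0, 1000)`, a shared call at position 100, a benchmark-only variant at 200, a query-only call at 300, and disagreeing calls at 5000 (outside the interval), the result is one true positive, one false positive and one false negative. The disagreement at 5000 is not scored. This is why expanding the benchmark regions into difficult-to-map sequence can reveal errors that a smaller benchmark would not count.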