- Qin, Yujia
- Wu, Liyou
- Zhang, Qiuting
- Wen, Chongqin
- Van Nostrand, Joy D
- Ning, Daliang
- Raskin, Lutgarde
- Pinto, Ameet
- Zhou, Jizhong
- Editor(s): Gilbert, Jack A
Importance
Amplicon sequencing of targeted genes is the predominant approach for estimating the membership and structure of microbial communities. However, accurate reconstruction of community composition is difficult because of sequencing errors and other methodological biases, and effective approaches to overcome these challenges are essential. Using a mock community of 33 phylogenetically diverse strains, this study evaluated the effect of GC content on sequencing results and tested different approaches to improve overall sequencing accuracy, while characterizing the pros and cons of popular amplicon sequence data processing approaches. The sequencing results from this study can serve as a benchmarking data set for future algorithmic improvements. Furthermore, the new insights on sequencing error, chimera formation, and GC bias from this study will help enhance the quality of amplicon sequencing studies and support the development of new data analysis approaches.