- Becker, Joel;
- Burik, Casper AP;
- Goldman, Grant;
- Wang, Nancy;
- Jayashankar, Hariharan;
- Bennett, Michael;
- Belsky, Daniel W;
- Karlsson Linnér, Richard;
- Ahlskog, Rafael;
- Kleinman, Aaron;
- Hinds, David A;
- Caspi, Avshalom;
- Corcoran, David L;
- Moffitt, Terrie E;
- Poulton, Richie;
- Sugden, Karen;
- Williams, Benjamin S;
- Harris, Kathleen Mullan;
- Steptoe, Andrew;
- Ajnakina, Olesya;
- Milani, Lili;
- Esko, Tõnu;
- Iacono, William G;
- McGue, Matt;
- Magnusson, Patrik KE;
- Mallard, Travis T;
- Harden, K Paige;
- Tucker-Drob, Elliot M;
- Herd, Pamela;
- Freese, Jeremy;
- Young, Alexander;
- Beauchamp, Jonathan P;
- Koellinger, Philipp D;
- Oskarsson, Sven;
- Johannesson, Magnus;
- Visscher, Peter M;
- Meyer, Michelle N;
- Laibson, David;
- Cesarini, David;
- Benjamin, Daniel J;
- Turley, Patrick;
- Okbay, Aysu
Polygenic indexes (PGIs) are DNA-based predictors. Their value for research in many scientific disciplines is growing rapidly. As a resource for researchers, we used a consistent methodology to construct PGIs for 47 phenotypes in 11 datasets. To maximize the PGIs' prediction accuracies, we constructed them using genome-wide association studies (some not previously published) from multiple data sources, including 23andMe and UK Biobank. We present a theoretical framework to help interpret analyses involving PGIs. A key insight is that a PGI can be understood as an unbiased but noisy measure of a latent variable we call the 'additive SNP factor'. Regressions in which the true regressor is this factor but the PGI is used as its proxy therefore suffer from errors-in-variables bias. We derive an estimator that corrects for the bias, illustrate the correction, and make a Python tool for implementing it publicly available.
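
The errors-in-variables argument can be illustrated with a small simulation. The sketch below is not the estimator derived in the paper, nor the interface of the accompanying Python tool; it is the textbook classical-measurement-error correction, under the simplifying assumptions that the latent factor has unit variance, that the PGI equals the factor plus independent noise, and that the reliability of the PGI is known. All names (`simulate`, `ols_slope`, `disattenuated_slope`) and parameter values are hypothetical.

```python
import numpy as np


def simulate(n=200_000, beta=0.5, noise_sd=1.0, seed=0):
    """Draw a latent factor, a noisy proxy for it, and an outcome.

    Assumption: the proxy is the latent factor plus independent noise,
    i.e. an unbiased but noisy measure of the factor.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)                   # latent factor, variance 1
    pgi = g + noise_sd * rng.standard_normal(n)  # noisy proxy for g
    y = beta * g + rng.standard_normal(n)        # outcome depends on g, not on pgi
    return g, pgi, y


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def disattenuated_slope(pgi, y, reliability):
    """Classical correction: divide the naive slope by the proxy's reliability.

    reliability = var(latent factor) / var(proxy); assumed known here.
    """
    return ols_slope(pgi, y) / reliability


g, pgi, y = simulate()
reliability = 1.0 / (1.0 + 1.0**2)  # var(g) / (var(g) + var(noise)) in this setup

naive = ols_slope(pgi, y)                                 # attenuated toward zero
corrected = disattenuated_slope(pgi, y, reliability)      # close to the true beta
oracle = ols_slope(g, y)                                  # infeasible: uses the latent factor

print(f"naive={naive:.3f}  corrected={corrected:.3f}  oracle={oracle:.3f}")
```

In this setup the naive slope is shrunk toward zero by the reliability factor, while the corrected slope recovers the coefficient on the latent factor. In practice the reliability is not known and must itself be estimated, which is part of what a full treatment has to address.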