Concordance of SARS-CoV-2 Antibody Results during a Period of Low Prevalence

ABSTRACT Accurate, highly specific immunoassays for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are needed to evaluate seroprevalence. This study investigated the concordance of results across four immunoassays targeting different antigens for sera collected at the beginning of the SARS-CoV-2 pandemic in the United States. Specimens from All of Us participants contributed between January and March 2020 were tested using the Abbott Architect SARS-CoV-2 IgG (immunoglobulin G) assay (Abbott) and the EuroImmun SARS-CoV-2 enzyme-linked immunosorbent assay (ELISA) (EI). Participants with discordant results, participants with concordant positive results, and a subset of concordant negative results by Abbott and EI were also tested using the Roche Elecsys anti-SARS-CoV-2 (IgG) test (Roche) and the Ortho-Clinical Diagnostics Vitros anti-SARS-CoV-2 IgG test (Ortho). The agreement and 95% confidence intervals were estimated for paired assay combinations. SARS-CoV-2 antibody concentrations were quantified for specimens with at least two positive results across four immunoassays. Among the 24,079 participants, the percent agreement for the Abbott and EI assays was 98.8% (95% confidence interval, 98.7%, 99%). Of the 490 participants who were also tested by Ortho and Roche, the probability-weighted percentage of agreement (95% confidence interval) between Ortho and Roche was 98.4% (97.9%, 98.9%), that between EI and Ortho was 98.5% (92.9%, 99.9%), that between Abbott and Roche was 98.9% (90.3%, 100.0%), that between EI and Roche was 98.9% (98.6%, 100.0%), and that between Abbott and Ortho was 98.4% (91.2%, 100.0%). Among the 32 participants who were positive by at least 2 immunoassays, 21 had quantifiable anti-SARS-CoV-2 antibody concentrations by research assays. The results across immunoassays revealed concordance during a period of low prevalence. 
However, the frequency of false positivity during a period of low prevalence supports the use of two sequentially performed tests for unvaccinated individuals who are seropositive by the first test.

IMPORTANCE What is the agreement of commercial SARS-CoV-2 immunoglobulin G (IgG) assays during a time of low coronavirus disease 2019 (COVID-19) prevalence and no vaccine availability? Serological tests produced concordant results in a time of low SARS-CoV-2 prevalence and no vaccine availability, driven largely by the proportion of samples that were negative by two immunoassays. The CDC recommends two sequential tests for positivity for future pandemic preparedness. In a subset analysis, quantified antinucleocapsid and antispike SARS-CoV-2 IgG antibodies do not suggest the need to specify the antigen targets of the sequential assays in the CDC's recommendation because false positivity varied as much between assays targeting the same antigen as it did between assays targeting different antigens.
KEYWORDS SARS-CoV-2, IgG antibodies, spike protein, nucleocapsid protein, low prevalence

At the beginning of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, understanding the spread of the virus was critical for public health mitigation strategies. The diagnosis of acute SARS-CoV-2 infections (i.e., "cases") using nucleic acid amplification tests reveals only acutely infected individuals. Serology tests detect antibodies in the blood of individuals who mount an adaptive immune response to infection for weeks and potentially months or years after infection. However, serological assays to detect SARS-CoV-2 antibodies were not developed or authorized for use until April 2020 (1,2).
Most of the initial serological assays were developed to detect antibodies to epitopes of SARS-CoV-2, including antibodies against regions of the spike protein and the nucleocapsid (NC) protein. Immunoglobulin G (IgG) antibodies have been detected against SARS-CoV-2 as soon as 1 day after symptom onset, although the median time to the development of IgG was 14 days in two early studies (3,4). At this point, it is unclear how long anti-SARS-CoV-2 antibodies will persist following infection. However, most unvaccinated patients who were monitored for 6 to 8 months after the onset of symptoms had detectable but declining SARS-CoV-2-specific IgGs (5).
Previous studies that compared SARS-CoV-2 serological assays that differ in their targets showed substantial variability in the performance characteristics when using the same positive-control specimens and prepandemic negative-control specimens (6). The CDC recommends the use of a sequential testing approach if the first test yields a positive result, which increases specificity and reduces false-positive results, particularly when the prevalence of SARS-CoV-2 is low (7). In our previous study, we identified nine individuals with detectable SARS-CoV-2 antibodies by two assays that target NC (Abbott Architect SARS-CoV-2 IgG [Abbott]) and spike (EuroImmun SARS-CoV-2 enzyme-linked immunosorbent assay [ELISA] [EI]) in the first 3 months of the pandemic, seven of whom had blood samples collected prior to the first confirmed cases in their states of residence within the United States (8).
Evaluation of the accuracy of the CDC's sequential testing recommendation for SARS-CoV-2 antibody positivity is important for future pandemic preparedness. Using a large sample size of specimens collected during low-prevalence months at the beginning of the pandemic (2 January to 18 March 2020), we describe concordance in all of the paired combinations of the Abbott (target, NC), Roche Elecsys anti-SARS-CoV-2 (IgG) (Roche) (target, NC), EI (target, spike), and Ortho-Clinical Diagnostics Vitros anti-SARS-CoV-2 IgG (Ortho) (target, spike) assay results.

RESULTS
The study population comprises the serum samples from All of Us participants tested by Abbott and EI (N = 24,079) together with the positive and negative controls. Subset sample 1 comprises the samples from participants tested with all four immunoassays (n = 490) together with the positive and negative controls. Subset sample 2 comprises the samples from participants with quantified antibody concentrations (n = 32) (Fig. 1).
(Continued from previous page) U2OD023196. D.J.S. performed part of this work while a postdoctoral researcher at Vanderbilt University Medical Center and was funded by NIH grant 5 U2C OD023196-03. K.N.A. reports NIH grants (outside this work, paid to her institution) and consultancy with TrioHealth Inc and MedIQ (paid to her). K.N.A. is a consultant for the All of Us Research Program (paid to her). K.N.A. is a recipient of royalties from Coursera Specialization that is direct and a course that she teaches (payment made to her). K.A.G. reports Department of Defense and NIH grants (outside this work, paid to her institution) and consultancy with UptoDate, Teach for America, and Aspen Institute (paid to her), and Dr Gebo did this work when she was the Chief Medical and Scientific Officer for All of Us. A.R. was funded by All of Us Data and Research Center Grant, NIH OD (paid to her institute). A.R. also received funding outside this work from NIDDK UO1 Atypical Diabetes, PI (payment to her institute). B.M. was funded by NIH and DRC, and received NIH grants outside this work paid to his institute. D.B.G. is the CEO of Actio-Biosciences and co-founder and equity holder of Praxis-Precision Medicine and declares no conflicts of interests with this project. S.G. received support for this project from NIH (5U2COD023196-04). Q.C. reports funding from NIH outside this work paid to her institution. All other authors declare no competing interests.
There were 24,079 All of Us participants tested with Abbott and EI. There were 490 participants (subset sample 1) tested with all four commercial assays, including all of the discordant (n = 277) and concordant seropositive (n = 9) samples by Abbott and EI. In addition, a random selection of 204 participants with concordant negative results had sufficient specimens to be tested by the two additional immunoassays (Roche and Ortho). Among the 490 samples tested by all four assays, 32 (subset sample 2) tested positive by at least two of the four serology assays and were further evaluated for anti-SARS-CoV-2 spike and NC IgG concentrations (Fig. 1). Demographic characteristics of the total study population (n = 24,079), subset 1 (n = 490), and subset 2 (n = 32) are listed in Table 1.
The interassay variability with Abbott and EI results was plotted among the positive and negative controls with two or more replicates (n = 70 positive controls; n = 339 negative controls) (see Fig. S1 in the supplemental material). The interassay variability of the controls was minimal for both Abbott and EI; EI demonstrated slight heteroscedasticity with the positive controls (Fig. S1). As reported in our previous study, the Abbott (8).
Concordance between the results of commercial assays and quantified antispike and anti-NC SARS-CoV-2 IgG concentrations. Twenty-one of the 32 participants who tested positive by two or more commercial assays had detectable anti-SARS-CoV-2 IgG antibody concentrations measured via a research laboratory ELISA (Table 2). Four participants tested positive by three commercial assays (Abbott, EI, and Ortho), and three of these participants had detectable antibody concentrations above the lower limit of quantification for anti-SARS-CoV-2 IgG antibodies (Table 2). Twenty-five participants tested positive by both of the assays that target the spike protein (EI and Ortho), and 13 of these 25 participants had detectable antispike IgG concentrations (Table 2). One participant tested positive by both of the commercial assays that target the NC protein (Abbott and Roche) and had detectable anti-NC IgG (Table 2). Ten participants tested positive by two commercial assays that targeted different antigens (NC and spike) (Table 2); 7 of these 10 (70%) had detectable anti-NC or antispike IgG antibodies.

DISCUSSION
Our study has several important findings. First, our results demonstrate the importance of large, demographically diverse studies with blood-banking capabilities. Second, our results demonstrate the importance of strategic testing with batches of positive and negative controls to ensure reproducibility. Also, our results support the CDC's recommendation for at least two positive serological test results, particularly in a time of low prevalence. Finally, 70% of those samples that were positive by the sequential Abbott and EI assays demonstrated positive quantifiable antibodies by a research laboratory assay.
Two of the nine participants positive by both the Abbott and EI assays did not have detectable antibody concentrations using the research laboratory assay. In addition, none of the participants tested positive by all commercial and research assays. Part of the reason for the discrepancies in positivity across different assays could be differences in the target antigens, isotypes identified, and assay performance characteristics and possible cross-reactivities from previous infections (9–13). Methods used to determine cutoff values often differ between assays and are optimized for specificity, which could lead to more false-negative results. Early during a pandemic, using multiple test methods is key to confirming seropositivity in settings of low prevalence. Further testing with neutralization assays could provide additional confirmation of seropositivity.
While other studies have evaluated the prevalence of antibodies against SARS-CoV-2 at the beginning of the pandemic (e.g., UK Biobank and U.S. blood donors) (14–17), no other study has been able to analyze antibody responses to exposure to SARS-CoV-2 in a population as demographically diverse as the one in this study. The maintenance of large cohorts, particularly those with active biospecimen collection and biobanking, is expensive. As such, funding agencies have moved away from collecting samples and are relying on electronic health records. The All of Us Research Program collects demographic, clinical, and survey data combined with physical measurements and biospecimens of a diverse group of participants. The program has a goal of recruiting 1 million participants with complete electronic health records, survey data, and biospecimen data. This combination of a diverse study population and unique biospecimens allowed this study to happen and will serve as an important resource for future studies that require demographically diverse populations and biospecimens.
This study supports the CDC and FDA recommendations for two sequential tests during a period of low prevalence (18). At the beginning of the pandemic, two sequential tests helped decrease the number of false-positive results. For example, for one test, 147 individuals tested positive by the Abbott assay, while only 9 (6%) of those individuals sequentially tested positive by the EI assay. Sequential testing helped reduce the number of false-positive results during a time of low prevalence, and the probability of the nine positive individuals being falsely positive was simulated to be 0.00001 across 1,000 replications of the simulation study (8). The overall concordance between Abbott and EI is very high (98.82%) and is driven by the number of negative samples. The research assay performed in this study demonstrates the power of adapting established methods to corroborate the seroprevalence of SARS-CoV-2 antibodies in participants early during the start of the pandemic with commercial assays. Careful interpretation of the results derived from a single assay is needed, and confirmation of positivity is advisable.
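The arithmetic behind sequential testing can be sketched in a few lines of Python. The sensitivity, specificity, and prevalence values below are hypothetical placeholders chosen for illustration, not estimates from this study, and the calculation assumes that the two tests err independently given true infection status, which real assays may not satisfy. Only the counts 147 and 9 are taken from the text.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value of a single test (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def sequential_ppv(prevalence, first, second):
    """PPV when a sample must be positive on both tests.

    `first` and `second` are (sensitivity, specificity) pairs.
    Assumes the two tests err independently given true status.
    """
    after_first = ppv(prevalence, *first)
    # The PPV of the first test is the pre-test probability for the second.
    return ppv(after_first, *second)


# Hypothetical inputs (NOT taken from the study).
prevalence = 0.001        # 0.1% of the population truly seropositive
test_a = (0.95, 0.995)    # (sensitivity, specificity)
test_b = (0.95, 0.995)

single = ppv(prevalence, *test_a)
double = sequential_ppv(prevalence, test_a, test_b)
print(f"PPV, one test:  {single:.3f}")
print(f"PPV, two tests: {double:.3f}")

# Counts reported in the text: 147 Abbott-positive, 9 also EI-positive.
confirmed_fraction = 9 / 147
print(f"Confirmed by second assay: {confirmed_fraction:.1%}")
```

Under these assumed inputs, most single-test positives are false positives because true positives are rare, whereas requiring a second positive result raises the predictive value substantially.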
Interestingly, our results do not support the additional caveat specifying that the sequential tests have different anti-SARS-CoV-2 antigen targets (7). This could be important for future pandemics when rapid initiation of serological testing is needed before vaccines are available.
There are several notable strengths of this study. First, this cohort is incredibly diverse demographically and geographically, which helps with evaluating the validity and reliability of results generated from immunoassays for SARS-CoV-2 during a period of low coronavirus disease 2019 (COVID-19) prevalence. Also, sera were obtained prior to the known community spread of the pandemic within the United States, allowing early detection prior to the availability of commercial tests. In addition, as part of the methodology of this study, batches of samples were sent in a blind manner to Quest Diagnostics approximately every 2 weeks, with positive and negative controls embedded in the plates, which ensured the reproducibility of the results and allowed the verification of the publicly reported sensitivity and specificity of the platforms.
It is also important to consider the limitations of this study. First, the study was done in a time of low prevalence and prior to vaccine availability, so it does not allow generalizations to the current situation with high prevalence; the availability of current antibody prophylaxis and therapies, including monoclonal antibodies and convalescent-phase plasma; and exposure to variants of concern. Also, the study had diverse participants, but there was limited demographic information on the positive controls. This study also demonstrates the benefits and limitations of cohort studies. While useful in representing the population demographically, data are not collected in real time and cannot replace public health surveillance studies during a pandemic. Although the assays tested for the same antigen, they had different methods of antigenic production and characteristics. Also, the assays had different thresholds for positivity, so discordant results between assays may have been due to these arbitrary cutoffs. Finally, it is possible that there was cross-reactivity from previous infections with coronaviruses.
In conclusion, the CDC guidelines recommending the sequential testing of samples during a period of low prevalence are valid. However, in a future pandemic, testing may not require the use of different viral proteins because the false positivity varied as much between assays targeting the same antigen as it did between assays targeting different antigens. In addition, the All of Us Research Program biorepository is an important asset for evaluations of important research questions that require biospecimens collected in real time in a demographically diverse population.

MATERIALS AND METHODS
Study population. The All of Us Research Program is an observational cohort study enrolling a diverse group of at least 1 million people in the United States (19). The collection of biospecimens was paused on 18 March 2020 due to the SARS-CoV-2 public health emergency. Our study population includes a subgroup of the All of Us study participants who provided a blood specimen during their All of Us study visit occurring from 2 January to 18 March 2020 (8).
Positive-control specimens. Positive-control specimens were obtained from patients who were previously confirmed by PCR to have SARS-CoV-2 infection from the Vanderbilt University Medical Center (VUMC), Nashville, TN (n = 44); Brigham and Women's Hospital (PPM), Boston, MA (n = 18); and the Mayo Clinic (Mayo), Rochester, MN (n = 45), which were collected in the spring of 2020. The presence of IgG against the receptor binding domain (RBD) of the SARS-CoV-2 spike protein was confirmed via a liquid bead array quantification assay (20), with RBD IgG levels being quantified as units per milliliter by normalization to a standard curve using a human monoclonal antibody targeting the RBD. Positive-control samples from Brigham and Women's Hospital collected from SARS-CoV-2-positive inpatients were also positive by two assays, the Elecsys anti-SARS-CoV-2 immunoassay (Roche Diagnostics, Indianapolis, IN, USA), intended for the qualitative detection of antibodies against the NC antigen, and EDI New Coronavirus COVID-19 enzyme-linked immunosorbent assays (ELISAs) (Epitope Diagnostics, USA), which detect IgG against the NC antigen.
The positive-control specimens were sent to the All of Us biobank at the Mayo Clinic, where they were aliquoted into multiple specimens of 400 μL of serum for a total of 320 positive-control specimens (up to 8 specimens per positive-control individual). One positive-control specimen was included on each plate that underwent testing by Abbott and EI. A subset of 10 positive samples was run alongside All of Us participant samples on the Ortho and Roche assays.
Negative-control specimens. To ensure a sufficient sample size for specificity estimates, the negative controls were oversampled compared to the positive controls due to the low prevalence of SARS-CoV-2 infection during the study period. Negative-control specimens were randomly selected from All of Us participants who completed study visits in the same states between January and March 2019 (collected at least 8 months prior to the December 2019 detection of SARS-CoV-2 in Wuhan, China). Serum was separated according to the All of Us study protocol (19). Control samples from 1,000 negative individuals were used from the All of Us biobank at the Mayo Clinic, where they were aliquoted into duplicates of 400 μL of serum, for a total of 1,338 negative-control specimens (up to 2 specimens per negative-control individual). One negative-control specimen was included on each plate that underwent testing by Abbott and EI. A subset of 180 negative controls was run alongside samples from All of Us participants on the Ortho and Roche assays.
Protection of privacy. This study was approved by the All of Us institutional review board (IRB) committee. An exception was granted to the All of Us program's data and statistics dissemination (DSD) policy to report individual test results (21).
Abbott and EI testing. The Abbott and EI assays were performed on batches of approximately 5,000 specimens. The specimens from All of Us participants were sent with the positive and negative controls from the All of Us biorepository to Quest Laboratories (Quest), a Clinical Laboratory Improvement Amendments (CLIA)-certified testing environment. Quest was blind to the presence of positive- and negative-control specimens and conducted the testing of samples in a blind fashion. Quest created duplicate plates of 100-μL and 200-μL serum aliquots of every eligible All of Us participant to allow simultaneous testing by Abbott and EI.
Roche and Ortho testing. Specimens from All of Us participants with (1) discordant results, (2) concordant positive results, or (3) concordant negative results (random sample only) by Abbott and EI were subsequently tested using the Roche Elecsys anti-SARS-CoV-2 (Roche) (targeting the NC protein) and Ortho-Clinical Diagnostics Vitros anti-SARS-CoV-2 IgG (Ortho) (targeting the spike protein) assays at the Mayo Clinic Laboratories, which is a CLIA-certified laboratory. Mayo created duplicate plates of 100-μL and 200-μL serum aliquots to allow simultaneous testing by Roche and Ortho.
SARS-CoV-2 IgG and IgM quantification by research assays. Specimens from All of Us participants who had at least two positive results across the four commercial assays (Abbott, EI, Roche, or Ortho) were further tested to quantify anti-SARS-CoV-2 IgG NC and spike protein concentrations at the National Cancer Institute to provide additional evidence of the presence of SARS-CoV-2 IgG antibodies (22). The following antibody titer cutoffs were used to signal the presence of antibody concentrations: a spike IgG titer of ≥10.4 binding antibody units (BAU)/mL and an NC IgG titer of ≥7.8 BAU/mL.
Statistical methods. The percent agreement and 95% exact binomial confidence intervals (CIs) were estimated for the Abbott and EI comparisons across the total All of Us study population.
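As an illustration of the percent agreement with an exact binomial interval, the following Python sketch computes a Clopper-Pearson interval using only the standard library. It is a minimal sketch, not the analysis code used in this study: the function names are made up for this example and the counts are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson (exact) confidence interval for x successes in n trials."""

    def solve(f):
        # Bisection for the root of a function that increases with p on (0, 1).
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(
        lambda p: alpha / 2.0 - binom_cdf(x, n, p))
    return lower, upper


# Hypothetical counts: 988 of 1,000 paired results agree.
n_pairs, n_agree = 1000, 988
agreement = n_agree / n_pairs
lower, upper = exact_binomial_ci(n_agree, n_pairs)
print(f"agreement = {agreement:.1%}, 95% CI = ({lower:.1%}, {upper:.1%})")
```

Bisection is used here to avoid a dependency on a statistics library; a beta quantile function, where one is available, gives the same bounds directly.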
Of the specimens that were further tested by the Roche and Ortho assays, the percentages of agreement and 95% CIs for all pairs of immunoassays were estimated using a weighted approach to allow inference of results to the total All of Us study population. The criteria for selecting the subgroup were intended to maximize the information returned about disagreement (specimens with discordant results by Abbott and EI had a 100% probability of being included, concordant negative specimens had a <1% probability of being included, and concordant positive specimens had a 100% probability of being included) (23). A probability-weighted method was used to incorporate these selection probabilities to estimate the percentage of agreement and the McNemar test P value for the total All of Us study population (23). The svyciprop function in the survey package (R version 4.1.2) was used to construct the 95% CIs for weighted percentages of agreement with finite population correction, and nonparametric bootstrap sampling was used to calculate the P value for the weighted McNemar test.
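The idea of probability weighting can be illustrated with a short Python sketch in which each tested specimen is weighted by the inverse of its selection probability. The stratum sizes follow the counts reported in Results, under the assumption that the 24,079 participants split exactly into these three strata. The per-stratum agreement counts are hypothetical. The sketch gives a point estimate only and does not reproduce the survey-package confidence intervals or the bootstrap McNemar test.

```python
# Each stratum: (label, number in the full population, number tested by all
# four assays, number of tested specimens on which the two assays being
# compared agree). The agreement counts in the last column are HYPOTHETICAL.
strata = [
    ("Abbott/EI discordant", 277, 277, 250),
    ("Abbott/EI concordant positive", 9, 9, 5),
    ("Abbott/EI concordant negative", 24079 - 277 - 9, 204, 203),
]

weighted_agree = 0.0
weighted_total = 0.0
for label, n_population, n_sampled, n_agree in strata:
    selection_prob = n_sampled / n_population
    # Each tested specimen stands in for 1 / selection_prob specimens.
    weight = 1.0 / selection_prob
    weighted_agree += weight * n_agree
    weighted_total += weight * n_sampled

weighted_agreement = weighted_agree / weighted_total
unweighted_agreement = (sum(s[3] for s in strata)
                        / sum(s[2] for s in strata))

print(f"unweighted agreement: {unweighted_agreement:.1%}")
print(f"weighted agreement:   {weighted_agreement:.1%}")
```

Because the discordant specimens are heavily oversampled, the unweighted figure understates agreement in the full population; the weights restore the dominance of the concordant negative stratum.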
For the specimens that were positive by at least two of the four commercial assays and were further tested with the research assays to quantify SARS-CoV-2 IgG antibodies, we present the quantified concentrations and the commercial assay results for all individuals. We excluded samples with missing results, which occurred due to missing identifications in the manifest and insufficient sample volumes (n = 22). Some All of Us participants had repeated analysis values, and only the first provided value was used.
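The handling of repeated values and titer cutoffs described above can be sketched as follows. This is an illustrative Python example, not the study's code: the record layout, field names, and participant values are hypothetical. The cutoffs of 10.4 BAU/mL (spike) and 7.8 BAU/mL (NC) are those stated in the methods, and treating a value equal to the cutoff as detectable is an assumption.

```python
SPIKE_CUTOFF_BAU = 10.4  # BAU/mL, spike IgG cutoff stated in the methods
NC_CUTOFF_BAU = 7.8      # BAU/mL, NC IgG cutoff stated in the methods


def first_value_per_participant(records):
    """Keep the first record seen for each participant ID and drop repeats."""
    first = {}
    for rec in records:
        first.setdefault(rec["id"], rec)
    return list(first.values())


def classify(rec):
    """Flag antibody concentrations at or above the cutoffs (assumed inclusive)."""
    return {
        "id": rec["id"],
        "spike_detectable": rec["spike_bau"] >= SPIKE_CUTOFF_BAU,
        "nc_detectable": rec["nc_bau"] >= NC_CUTOFF_BAU,
    }


# Hypothetical records; the second "P1" entry is a repeated analysis value.
records = [
    {"id": "P1", "spike_bau": 25.0, "nc_bau": 3.1},
    {"id": "P2", "spike_bau": 4.2, "nc_bau": 9.0},
    {"id": "P1", "spike_bau": 8.0, "nc_bau": 2.0},
]

results = [classify(r) for r in first_value_per_participant(records)]
for row in results:
    print(row)
```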

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.