Comparability of Control and Comparison Groups in Studies Assessing Long COVID

Background: Awareness of long coronavirus disease (COVID) began primarily through media and social media sources, which eventually led to the development of various definitions based on methodologies of varying quality. We sought to characterize comparison groups in long COVID studies and evaluate comparability of the different groups.
Methods: We searched Embase, Web of Science, and PubMed for original research articles published in high-impact journals. We included studies on human patients with long COVID outcomes, and we abstracted study-related characteristics, as well as long COVID characteristics.
Results: Of the 83 studies, 3 were randomized controlled trials testing interventions for long COVID, and 80 (96.4%) were observational studies. Among the 80 observational studies, 76 (95%) sought to understand the incidence, prevalence, and risk factors for long COVID, 2 (2.5%) examined prevention strategies, and 2 (2.5%) examined treatment strategies. Among those 80 studies, 45 (56.2%) utilized a control or comparison group and 35 (43.8%) did not. Whereas 95% of observational studies documented symptoms or assessed risk factors, all randomized studies assessed treatment strategies. We found that 48.8% of observational studies adjusted for any covariates, including demographics or health status. Of those that did adjust for covariates, 15 (38.5%) adjusted for 4 or fewer variables. We found that 26.5% of all studies and 45.8% of studies with a control/comparator group matched participants on at least 1 variable.
Conclusion: Long COVID studies in high-impact journals primarily examine symptoms and risk factors of long COVID, often lack an adequate comparison group, and often do not control for potential confounders. Our results suggest that standardized definitions for long COVID, which are often based on data from uncontrolled and potentially biased studies, should be reviewed to ensure that they are based on objective data.


INTRODUCTION
Since the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), there have been reports of people with persistent or long-term sequelae from infection, including fatigue, shortness of breath, and cognitive impairment. These reports were initially shared on social media, and writers in major media sources began to dub the condition "long coronavirus disease (COVID)" or describe those suffering as "long-haulers." 1,2 Because the origins of long COVID came through social media and not the biomedical literature, some have suggested that this is an illness that doctors only became aware of through empowered patients. 1 Numerous investigations have probed the symptoms and natural history of those with persisting symptoms post-COVID, culminating in the development of several definitions of long COVID. 3,4 However, these definitions were formulated by pooling study designs and methodologies of varying quality. This may lead to misclassification and bias in symptoms and prevalence estimates.
The objective of our study was to characterize control and comparison groups and evaluate comparability of the different groups in studies evaluating long COVID or postacute COVID syndrome. Because we were interested in studies with the greatest impact and most rigorous methodology, we chose to focus on studies published in high-impact journals.

METHODS

Search Strategy
We searched Web of Science and Embase for the term "long COVID" and restricted to "articles" in English. We conducted a separate search on PubMed, using the terms "long COVID" or "post acute COVID syndrome," and limited the search to clinical and observational studies. Our search date was October 13, 2022. We included any original research study design that included human patients with long COVID or that had a persistent or long-term COVID-related outcome. We excluded animal studies, attitudes about COVID, case reports/case series with <10 people (we included case series of more than 10), basic science/cell, device/algorithm, modeling/simulation, reviews, research letters, studies not examining the condition of long COVID or a health outcome, protocols, qualitative reporting, social media, and test validation. We then excluded studies that were published in journals with an impact factor of less than 10, as per the most recent impact factor published on the journal's website.
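The two-stage screening described above (topical exclusions first, then the impact-factor threshold) can be sketched roughly as follows. This is only an illustration: the `Record` fields, the category labels, and the example records are hypothetical, the category set covers only part of the exclusion list, and nothing here represents the authors' actual screening procedure.

```python
from dataclasses import dataclass

# Hypothetical labels standing in for some of the exclusion categories in the text.
EXCLUDED_CATEGORIES = {
    "animal study", "attitudes about COVID", "basic science/cell",
    "device/algorithm", "modeling/simulation", "review", "research letter",
    "protocol", "qualitative reporting", "social media", "test validation",
}
MIN_IMPACT_FACTOR = 10     # journals below this threshold were excluded
MIN_CASE_SERIES_SIZE = 10  # case reports/series with fewer people were excluded


@dataclass
class Record:
    title: str
    category: str          # hypothetical study-type label
    impact_factor: float   # journal impact factor
    n_participants: int


def is_eligible(rec: Record) -> bool:
    """Stage 1: apply the topical exclusion criteria."""
    if rec.category in EXCLUDED_CATEGORIES:
        return False
    if rec.category == "case series" and rec.n_participants < MIN_CASE_SERIES_SIZE:
        return False
    return True


def screen(records):
    """Stage 2: keep eligible records from journals meeting the threshold."""
    eligible = [r for r in records if is_eligible(r)]
    return [r for r in eligible if r.impact_factor >= MIN_IMPACT_FACTOR]


# Entirely made-up records, used only to exercise the filter.
records = [
    Record("Hypothetical cohort A", "prospective observational", 18.0, 599),
    Record("Hypothetical review B", "review", 25.0, 0),
    Record("Hypothetical case series C", "case series", 12.0, 6),
    Record("Hypothetical survey D", "cross sectional", 4.2, 150),
]
included = screen(records)
print([r.title for r in included])
```

Only the first made-up record passes both stages; the others fail on study type, case-series size, or impact factor, respectively.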

Data Abstraction
For each included study, we abstracted data on the study design, categorized as cross-sectional (including surveys), retrospective observational, case-control, prospective observational (including pre-/post studies), and randomized controlled trials; study population, categorized as outpatient, inpatient, both inpatient and outpatient, community/general public, health care workers, national database, social media participants, or not indicated; general age of study participants, categorized as adult, child, both, or not indicated; number of people in the study; number in each arm/group (if multiple); whether there was a control arm or comparison group; whether vaccination was taken into account; the dates of enrollment or dates of data inclusion; definition of long COVID (symptoms, duration of symptoms, confirmed/unconfirmed COVID diagnosis); prevalence of long COVID; country of study population; information on intervention/exposure and control/unexposed groups; data on matching and/or adjustment for covariates; whether there were sensitivity analyses to test for negative or positive control outcomes or exposures; funding information; and conflict of interest disclosures.
To examine the citations and media attention of the articles, we obtained the Altmetric score for each article, using the Altmetric it! extension.

Statistical Analysis
We calculated descriptive statistics for the overall analytic sample and for the sample stratified by whether the study used a control or comparator group. We compared categorical variables with χ² tests and continuous variables with Wilcoxon rank-sum tests. We conducted all analyses using R software, version 3.6.2, and Microsoft Excel. In accordance with 45 CFR §46.102(f), this study was not submitted for institutional review board approval because it involved publicly available data and did not involve individual patient data.
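For readers unfamiliar with the rank-sum comparison named above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction. It is written in Python rather than the R software the authors used, it omits the continuity correction and exact small-sample methods that statistical packages may apply, and the two input lists are hypothetical values, not data from the included studies.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test (normal approximation, tie-corrected variance).

    Returns (U statistic for x, z score, two-sided p value).
    Assumes the pooled values are not all identical (variance would be zero).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])              # rank sum of the first group
    u = w - n1 * (n1 + 1) / 2        # Mann-Whitney U for the first group
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, z, p


# Hypothetical numbers of listed symptoms per study in two groups of studies.
with_comparator = [12, 15, 18, 18, 20, 22]
without_comparator = [17, 21, 21, 24, 25, 30]
u, z, p = rank_sum_test(with_comparator, without_comparator)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```

The test uses only the ordering of the pooled values, which is why it suits skewed quantities such as study size or symptom counts that are summarized here with medians and interquartile ranges.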

RESULTS
Our search identified 927 results in Embase, 882 in Web of Science, and 61 on PubMed (Figure 1). After removing duplicate studies (n = 932), studies not meeting inclusion criteria (n = 452), and studies published in journals with impact factors of less than 10 (n = 407), 83 articles were included in the analytic sample. Study characteristics are detailed in Supplementary Table 1 (available online), stratified by control or comparison group status.
The median journal impact factor for included studies was 18 (interquartile range [IQR]: 12, 21). The median number of participants per study was 599 (IQR: 150, 7584), and the median age was 49 (IQR: 45, 56; n = 71 studies). Studies most commonly completed enrollment in 2020 (65.1%; n = 54). Most studies were conducted in a European country (68.7%; n = 57). Most studies were funded by a nonindustry source (69.9%; n = 58), and most authors reported either payments from industry (47.0%; n = 39) or no conflict of interest (45.8%; n = 38). Frequencies of studies published by year, study design type, and inclusion of a comparator group are shown in Figure 2.
Of the 83 studies, 48 (57.8%) used a control or comparison group, and 35 (42.2%) did not. Compared to studies without a control/comparator group, studies with a control/comparator group were more likely to use a retrospective observational study design (25.0% vs 5.7%) and less likely to use a prospective cohort (45.8% vs 68.6%; P = .007 for global differences). Compared to studies without a control/comparator group, studies with a control/comparator group were more likely to use a population that included outpatients (47.9% vs 28.6%) and less likely to use a population from the general community (8.3% vs 22.9%) or inpatients (12.5% vs 31.4%; P = .04 for global differences).

CLINICAL SIGNIFICANCE
In high-impact journals, 48 (57.8%) studies used a comparison group and 35 (42.2%) did not. About 50% of studies adjusted for any confounders, and of those that did adjust for covariates, 42.8% adjusted for 4 or fewer variables. Our results have implications for standardized definitions, which are often based on data from uncontrolled and potentially biased studies.
Supplementary Table 2 (available online) lists long COVID definition characteristics and control/comparator group characteristics, by comparator group status and for the 3 randomized studies. Compared to studies without a control/comparator group, studies with a control/comparator group were less likely to have a nonspecific list of long COVID symptoms (41.7% vs 68.6%; P = .03) and had a lower median number of possible symptoms (18 vs 21; P = .05). The most common COVID symptom duration for studies without a control/comparator group was not indicated (34.3%), whereas it was 3-4 weeks (35.4%) for studies with a control/comparator group.
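As a worked illustration of the χ² comparison of nonspecific symptom lists reported above (41.7% vs 68.6%), the sketch below computes a Pearson χ² test for a 2×2 table. The cell counts are an assumption: they are back-calculated from the reported percentages and group sizes (48 and 35), not taken from the supplementary tables. Whether the authors applied a continuity correction is not stated; the sketch applies Yates' correction, which is the default for 2×2 tables in R's `chisq.test`.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test (1 df) for the 2x2 table [[a, b], [c, d]].

    With yates=True, each |observed - expected| is reduced by 0.5
    (floored at 0) before squaring.
    Returns (test statistic, p value).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# ASSUMPTION: counts reconstructed from reported percentages, not source data.
n_with, n_without = 48, 35
nonspecific_with = round(0.417 * n_with)         # studies with a comparator
nonspecific_without = round(0.686 * n_without)   # studies without a comparator

stat, p = chi_square_2x2(
    nonspecific_with, n_with - nonspecific_with,
    nonspecific_without, n_without - nonspecific_without,
)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

Under these reconstructed counts, the continuity-corrected P value rounds to the reported .03; the uncorrected test gives a somewhat smaller value.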
Most studies did not account for vaccination in their analysis; however, most were conducted before COVID vaccines were available (63.9%; n = 53), and 9.6% (n = 8) were completed after vaccines became available and did not account for vaccination. About 50% (n = 42) of studies adjusted for any covariates (including demographics or health status), and of those that did adjust for covariates, 42.8% (n = 18/42) adjusted for 4 or fewer variables. Overall, 22 studies matched participants on at least 1 variable, representing 26.5% of all studies and 45.8% of studies with a control/comparator group. Age (n = 13) and sex (n = 12) were the most commonly reported variables used for matching.
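The two matching percentages above share one numerator (22 studies) but use different denominators. A small sketch of that arithmetic, using only counts reported in the text (the helper name `pct` is ours):

```python
def pct(count, total):
    """Percentage of `count` out of `total`, rounded to 1 decimal place."""
    return round(100 * count / total, 1)


ALL_STUDIES = 83       # full analytic sample
WITH_COMPARATOR = 48   # studies with a control/comparator group
MATCHED = 22           # studies that matched on at least 1 variable

print(pct(WITH_COMPARATOR, ALL_STUDIES))  # share of studies with a comparator
print(pct(MATCHED, ALL_STUDIES))          # matched, among all studies
print(pct(MATCHED, WITH_COMPARATOR))      # matched, among studies with a comparator
```

Matching requires a comparator, so the second denominator is the more informative one for judging how often available comparison groups were made comparable.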
Of studies that did have a control/comparator group (n = 48), 8.3% (n = 4) used a negative control outcome sensitivity analysis, 6.2% (n = 3) used a positive control outcome sensitivity analysis, and 2.1% (n = 1) used a negative control exposure sensitivity analysis. Of studies with a control/comparator group, 47.9% (n = 23) reported a comparison of baseline demographic and health status variables between groups, while 20.8% (n = 10) compared baseline demographics with limited health outcomes, 12.5% (n = 6) compared baseline demographics only, and 18.8% (n = 9) presented no baseline comparisons between groups. The median prevalence of long COVID was 48% (IQR: 36%, 66%). When stratifying by the inclusion of a comparison group, the prevalence of long COVID among those infected with COVID was 51% (IQR: 42%, 68%) for studies without a comparison group and 46% (IQR: 14%, 65%) for studies with a comparison group. The prevalence of long COVID symptoms in comparator group participants was 34% (IQR: 7%, 38%).
The 3 randomized studies in the analytic sample examined breathing interventions to treat breathlessness in people with COVID (n = 2) and a monoclonal antibody to treat people with long COVID (n = 1), whereas nonrandomized studies were primarily focused on documenting symptoms (n = 52 out of 80; 65.0%) or risk factors (n = 24 out of 80; 30.0%). Two nonrandomized studies examined prevention of long COVID, and 2 examined intervention exposures aimed at treating long COVID symptoms.

DISCUSSION
We found that about 40% of studies reporting on long COVID in high-impact journals did not include a control or comparison group. We also found that about half of these studies do not adjust for variables that may also be associated with long COVID symptoms, such as comorbidities and age, and when studies do adjust for such covariates, more than 40% adjust for 4 or fewer variables. Further, only a small handful of these studies (<5%) perform sensitivity analyses. Finally, we found, as others have, that the diagnosis, list of symptoms, and duration of symptoms of long COVID vary. 5
In assessing the prevalence and natural history of long COVID, including a control or comparison group is an important step in ensuring that symptoms of long COVID are not a result of some other personal, social, or environmental characteristic. These characteristics could include aging, health status, or policy implementation, which are unrelated to the specific COVID infection but may predispose an individual to being exposed to or acquiring SARS-CoV-2. The COVID pandemic has affected people's lives both through direct viral impact and through disruption to social systems and routines. This combination has led to changes in behavior, such as physical activity, sleep, and in-person and virtual interactions, 6 and biologic changes, even among those not infected with COVID. 7
The definition of long COVID has evolved throughout the pandemic, prompting several large organizations to develop differing definitions with broad inclusion criteria. 3,4 Because many of the studies in our analytic sample were conducted prior to the publication of these definitions, at least some of these studies were used in the development of these well-recognized definitions. When definitions are developed, especially ones that will be used on a large scale, they should be based upon unbiased and objective data. Yet we find that if studies comparable to those in our analytic sample are used in the development of long COVID definitions, there is the potential for bias and misclassification in determining long COVID.
In our study, we included all study designs, and the best study design will depend on the research question being studied. For questions assessing efficacy of an intervention, such as vaccines and treatment strategies, a randomized study is essential to ensure comparability between study arms. Of the 83 studies in our analytic sample, 3 were randomized studies, testing interventions to treat long COVID or its symptoms.
Alternatively, studies that assess claims about COVID natural history and severity cannot be randomized. Therefore, it is important to include comparison groups, particularly those that are analogous to the exposure group. Most of the nonrandomized studies in our analytic sample (95%) examined symptoms or risk factors for long COVID. However, because these studies are not randomized, they require further steps to minimize confounding between exposure groups. For example, a recent prospective observational study examined physical, mental, and social well-being outcomes between patients with COVID-19 and other upper respiratory infections. 8 This particular study included a control group consisting of those with upper respiratory infections, which was more comparable to those with COVID-19 infections than healthy people would have been. By doing this, researchers were better able to determine whether lingering symptoms could be attributed to COVID-19 specifically or to any respiratory infection.
In contrast, a recently published economic analysis estimated the societal cost of long COVID to be $3.7 trillion, yet it did not provide a comparison with costs related to other respiratory tract infections. 9

Strengths and Limitations
A strength of this study is that it is the first broad analysis of literature that seeks to examine comparison measures in long COVID studies. Second, we focused on high-impact publications, which have played an inordinate role in steering the public conversation. The median Altmetric score of articles we examined was 173. Moreover, these journals are well known for a strong peer-review process with accomplished reviewers and editors.
We recognize that there are several limitations to our analysis. First, our findings cannot be generalized to studies published elsewhere because we used publication in high-impact journals as an inclusion criterion. Because we used studies from high-impact journals, it is likely that our analytic sample includes studies with higher quality and less bias than studies published in lower impact factor journals. Second, we relied on information published in the studies, so some variables may have been misclassified because of lack of reporting due to word limits or oversight. Third, because of a limited number of studies, we were not able to examine the interaction of different variables, such as year, study type, and study population. Fourth, because there were only 3 randomized studies, we were unable to do an in-depth comparison of differences in study methodologies and outcomes between randomized and nonrandomized studies.

CONCLUSION
In conclusion, we found that more than 40% of long COVID studies published in high-impact journals lack a comparator group and that many studies do not adjust for (49%) or match on (74%) confounding variables. Standardized definitions for long COVID, which are often based on data from uncontrolled and potentially biased studies, should be reviewed to ensure that they are based on objective data.

The American Journal of Medicine, Vol 000, No 000, 2023

†There were 33 observations used in determining long COVID prevalence in the exposure group and 8 observations used in determining long COVID prevalence in the comparator group.