How valid is using cancer registries’ data to identify acquired immunodeficiency syndrome-related non-Hodgkin’s lymphoma?

We sought to determine the accuracy of cancer registry data regarding the human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) status of patients with non-Hodgkin’s lymphoma (NHL). We used the population-based San Diego/Orange County cancer registry to identify 392 patients with HIV-related NHL diagnosed 1994–1999. After matching for age, sex, race, period of NHL diagnosis, and hospital type, we were able to find 324 corresponding patients among the remaining 4,863 NHL patients diagnosed 1994–1999 (who did not have HIV infection according to cancer registry records). We sought to review these patients’ charts at 41 hospitals with 15 separate institutional review boards to determine whether the HIV serostatus recorded by the cancer registry was correct. We performed a forward stepwise conditional multivariate logistic regression to determine characteristics associated with a false positive HIV status. The false positive rate was 8% and the false negative rate was 3%. The positive predictive value was 93% and the negative predictive value was 97%. Compared to correctly identified patients, false positives were more likely to be ≥50 years old, female, and treated with chemotherapy, and less likely to be single or to have high-grade or extranodal disease. Using cancer registry data to identify AIDS-related NHL is a valid research practice.

Non-Hodgkin's lymphoma (NHL) recently has been a prominent cause of morbidity and mortality among people with acquired immunodeficiency syndrome (AIDS) because highly active antiretroviral therapy (HAART) and opportunistic infection prophylaxis have decreased other serious illnesses among individuals with human immunodeficiency virus (HIV) infection [1]. Population-based cancer registries can contribute to research on NHL among HIV patients if the registries can reliably identify the HIV serostatus of NHL patients. Since 1994, the population-based California Cancer Registry has required that HIV serostatus be included in NHL incidence reports [2,3]. To our knowledge, there are no prior studies which validate the accuracy of the cancer registry HIV indicator by verifying the HIV serostatus of NHL patients through medical record review.
In the course of conducting a case control study comparing NHL in AIDS patients versus NHL in patients without HIV/AIDS, we reviewed medical records of NHL patients with and without HIV infection. This provided an opportunity to evaluate the accuracy of the HIV indicator flag in the California Cancer Registry regions of Orange and San Diego Counties. We sought to determine the positive and negative predictive values of the cancer-registry HIV indicator against the 'gold standard' of the medical chart.

Methods
This report describes the methods used in identifying subjects for a case control study that has been published elsewhere [4]. Prior to study initiation, we obtained institutional review board (IRB) approval from the University of California Irvine. In October 2001, we identified all incident diagnoses of NHL (N = 5,255) among residents of Orange and San Diego Counties between January 1994 and December 1999 from the Orange and San Diego population-based cancer registries (Cancer Surveillance Program of Orange County and San Diego/Imperial Organization for Cancer Control, regions 10 and 7, respectively, of the California Cancer Registry). We defined evidence of NHL as the National Cancer Institute Surveillance Epidemiology and End Results (SEER) site recode groups 33041-2, including International Classification of Diseases for Oncology, Second Edition (ICD-O-2) histology codes 9590-9595, 9670-9677, 9680-9688, 9690-9698, 9700-9717, 9823, and 9827 [5,6]. We defined high histologic grade according to ICD-O-2 Working Formulation standards [6].
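The histology part of this case definition is a set of code ranges, which makes it straightforward to apply programmatically. A minimal Python sketch follows, assuming each record carries its ICD-O-2 histology as a four-digit integer (behavior code dropped). The constant and function names are hypothetical, and the sketch checks histology only, ignoring any site (topography) conditions the SEER recode may apply, so it is not a full implementation of SEER site recode groups 33041-2.

```python
# Hypothetical helper: checks only the ICD-O-2 histology ranges quoted in the
# Methods. Any site (topography) conditions in the SEER recode are ignored.
NHL_HISTOLOGY_RANGES = (
    (9590, 9595),
    (9670, 9677),
    (9680, 9688),
    (9690, 9698),
    (9700, 9717),
    (9823, 9823),
    (9827, 9827),
)


def is_nhl_histology(code: int) -> bool:
    """Return True if a four-digit ICD-O-2 histology code is in an NHL range."""
    return any(low <= code <= high for low, high in NHL_HISTOLOGY_RANGES)


if __name__ == "__main__":
    # 9650 (Hodgkin's disease in ICD-O-2) falls outside every range.
    for code in (9591, 9650, 9714, 9823, 9826):
        print(code, is_nhl_histology(code))
```

Keeping the ranges as data rather than nested conditionals makes it easy to compare the list against the published recode table.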
The California Cancer Registry is required to collect HIV serostatus data for patients with Kaposi's sarcoma, Hodgkin's lymphoma, and NHL as an extent of disease code according to SEER coding guidelines [7]. SEER registries include the states of California, Connecticut, Hawaii, Iowa, Kentucky, Louisiana, New Jersey, New Mexico, and Utah; the areas of Detroit, Atlanta, and Seattle-Puget Sound; and the registries for rural Georgia, Alaska Natives, and Arizona American Indians [8]. The cancer registrar codes the HIV variable as positive, negative, or unknown based on physician or laboratory documentation of HIV serostatus. This HIV indicator variable reflects HIV serostatus at the time of NHL diagnosis only; there is no requirement to report a later HIV/AIDS diagnosis at follow-up. According to the registry HIV indicator variable, 392 of these patients were HIV-infected at the time of NHL diagnosis. We intended these patients to be the cases for the planned case control study [4].
We selected corresponding controls from the remaining 4,863 NHL patients, who were not known to be HIV-infected according to cancer registry records. We matched on age, sex, race, period of NHL diagnosis, and hospital type (Fig. 1). We defined the periods of NHL diagnosis as 1994-1995 vs. 1996-1999 because HAART became widely available in 1996 [1]. We defined the hospital types as small community, academic/Veterans Administration (VA)/military, and large community/health maintenance organization (HMO) [9]. We matched on hospital type in order to make the cases and controls more similar with respect to health care access and to limit the number of hospitals for chart abstraction. We did not specifically seek outpatient medical records from freestanding private physician offices. We were able to identify 324 controls (49% coded as HIV seronegative and 51% coded as HIV serostatus unknown) but could not find matching controls for 68 additional NHL patients flagged as HIV-infected, because patients with AIDS-related NHL were more likely than patients with NHL without AIDS to be African-American or Latino, aged 25-39 years, and diagnosed in 1994-1995 at an academic hospital.
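The control selection can be illustrated as exact 1:1 matching on a composite key, with each control used at most once. This is a sketch under stated assumptions: the text does not give the age categories or the matching procedure, so the ten-year age bands and the greedy first-available selection below are hypothetical. Only the 1994-1995 vs. 1996-1999 period split and the three hospital types come from the Methods; the `Patient` fields and function names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    age: int
    sex: str
    race: str
    diagnosis_year: int
    hospital_type: str  # e.g. "small community", "academic/VA/military", "large community/HMO"


def diagnosis_period(year: int) -> str:
    # From the Methods: 1994-1995 vs. 1996-1999, since HAART became widely available in 1996.
    return "1994-1995" if year <= 1995 else "1996-1999"


def age_band(age: int) -> int:
    # Hypothetical: ten-year age bands; the paper does not state how age was matched.
    return age // 10


def match_key(p: Patient) -> tuple:
    return (age_band(p.age), p.sex, p.race, diagnosis_period(p.diagnosis_year), p.hospital_type)


def match_controls(cases, pool):
    """Greedy 1:1 exact matching without replacement (hypothetical procedure).

    Returns (pairs, unmatched_cases); each control is used at most once."""
    available = defaultdict(list)
    for control in pool:
        available[match_key(control)].append(control)
    pairs, unmatched = [], []
    for case in cases:
        candidates = available[match_key(case)]
        if candidates:
            pairs.append((case, candidates.pop(0)))
        else:
            unmatched.append(case)
    return pairs, unmatched


if __name__ == "__main__":
    cases = [Patient("c1", 34, "M", "Latino", 1995, "academic/VA/military"),
             Patient("c2", 61, "F", "White", 1998, "small community")]
    pool = [Patient("k1", 37, "M", "Latino", 1994, "academic/VA/military"),
            Patient("k2", 71, "F", "White", 1998, "small community")]
    pairs, unmatched = match_controls(cases, pool)
    print([(c.patient_id, k.patient_id) for c, k in pairs])  # c1 pairs with k1
    print([c.patient_id for c in unmatched])                 # c2: no control in its age band
```

Unmatched cases are returned rather than dropped, which mirrors how the 68 flagged patients without a control remained identifiable.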
Using a standardized data form, trained medical record abstractors reviewed charts for evidence of HIV infection. Evidence of HIV infection included any mention of HIV/AIDS, AIDS-related opportunistic infections, positive laboratory testing such as an HIV ELISA with confirmatory western blot or a detectable HIV viral load, or antiretroviral therapy. Using the medical record as the 'gold standard', we computed the positive and negative predictive values of the cancer-registry indication of HIV serostatus at NHL diagnosis. We calculated confidence intervals on proportions by finite-sample methods [10]. We used Pearson's chi-square test to compare proportions unless an expected cell count was less than five, in which case we used Fisher's exact test. To determine the characteristics of NHL patients falsely identified as having AIDS by the cancer registry, we performed univariate and multivariate analyses. We split variables of interest into categories as shown in Table 1. We then performed a forward stepwise conditional multivariate logistic regression to determine which variables were significantly associated (P < 0.05) with false positive status among the patients flagged as HIV-infected by the cancer registry.
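The test-selection rule above (Pearson's chi-square unless an expected cell count falls below five, otherwise Fisher's exact test) can be sketched for a 2x2 table using only the Python standard library. Assumptions: the chi-square statistic is computed without a continuity correction, and the two-sided Fisher p-value sums all tables with the observed margins that are no more probable than the observed one, a common convention. The paper does not say which variants its software used, and the function names are hypothetical.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def pearson_chi2_p(table):
    """Pearson chi-square statistic (no continuity correction) and its 1-df p-value."""
    expected = expected_counts(table)
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_p(table):
    """Two-sided Fisher's exact test from hypergeometric probabilities."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def compare_proportions(table):
    """Apply the rule from the Methods: Fisher's test if any expected count < 5."""
    if min(min(row) for row in expected_counts(table)) < 5:
        return "fisher", fisher_exact_p(table)
    return "chi-square", pearson_chi2_p(table)[1]


if __name__ == "__main__":
    print(compare_proportions([[1, 9], [5, 3]]))      # sparse: routed to Fisher
    print(compare_proportions([[30, 20], [20, 30]]))  # all expected counts are 25
```

The example tables are hypothetical. The sparse one is routed to Fisher's exact test because two of its expected counts are below five.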

Results
We sought to review the medical records of 392 patients flagged as HIV-infected by the cancer registry, 160 flagged as HIV-negative, and 164 with HIV serostatus unknown, from a total of 41 hospitals (Fig. 1). However, 45 charts (19 flagged as HIV-infected, 10 negative, and 16 unknown) were unavailable because they were missing, shredded, or relocated out of southern California. We sought separate IRB approval as required by 15 institutions, but one local IRB did not permit the review of 14 charts (eight flagged as HIV-infected, one negative, and five unknown). Although seven of the 41 hospitals had closed, we often were able to retrieve their records from storage. Thus, we reviewed the charts of 365 patients flagged as HIV-infected, 149 flagged as HIV-negative, and 143 flagged with unknown serostatus at 33 hospitals. These patients with charts available for review comprised 93% of the initial 392 patients flagged as HIV-infected, 93% of the 160 flagged as HIV-negative, and 87% of the 164 with unknown serostatus. Compared with missing charts, available charts were more likely to be from hospitals located in San Diego rather than Orange County and to be from deceased patients who had received chemotherapy (P < 0.05).
Among the 657 patients with charts available for review, six (1%) (two flagged as HIV-infected, one negative, and three unknown) had been wrongly classified as having NHL. Two had Hodgkin's lymphoma; one had no clinical evidence of lymphoma after an ambiguous radiology report; and one each had post-transplant lymphoproliferative disorder, Castleman's disease, and reactive lymph nodes. The median follow-up time was ten months (range: 0-115) for the 651 patients with verified NHL and charts available for review.
Among the 363 NHL patients flagged as HIV-infected by the registry, 4 had negative HIV serologic testing and 21 had no evidence of HIV infection in the medical record, resulting in 25 false positives. The false positive rate was 8% (25 of the 304 patients without HIV infection on chart review) and the positive predictive value 93% (95% confidence interval (CI) 90-95%) (Table 2). We also checked death certificate diagnoses for evidence of HIV/AIDS among the deceased false positives (N = 14). Because we had previously performed an AIDS-cancer registry match for San Diego County, using cancer registry data from 1988 to 2000 and AIDS registry data from 1981 through July 2003, we reviewed these data to further verify the negative serostatus of the false positives from San Diego (N = 17) [11].
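The Methods cite finite-sample methods for confidence intervals on proportions without naming one. As an illustration only, the sketch below computes the Clopper-Pearson exact interval, one common finite-sample method, by bisection on binomial tail probabilities. With 338 true positives among the 363 registry-flagged patients (363 minus 25 false positives), it should give an interval close to the reported 90-95%; whether the authors used this particular method is an assumption, and the function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing pmf terms in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve_increasing(f, target: float, iterations: int = 100) -> float:
    """Bisection on [0, 1] for the p where an increasing function f reaches target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact (Clopper-Pearson) two-sided confidence interval for x successes in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2, i.e. P(X > x) = 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Counts from the Results: 363 flagged HIV-infected, 25 of them false positives.
    low, high = clopper_pearson(363 - 25, 363)
    print(f"PPV {338 / 363:.1%} (95% CI {low:.1%}-{high:.1%})")
```

The same function applies to the negative predictive value: `clopper_pearson(279, 288)`, using the 288 patients not flagged as HIV-infected minus the 9 false negatives, should land close to the reported 94-99%.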
Among the 288 NHL patients without a positive flag for HIV infection, registrars coded 148 (51%) as negative and 140 (49%) as "unknown" HIV serostatus. Among the 148 coded as negative, 82 (55%) had no documented HIV testing or physician statement of HIV serostatus and 66 (45%) had charted negative HIV serologic testing. Among the 140 coded as "unknown", 81 (58%) had no documented HIV testing or physician statement of HIV serostatus, 50 (36%) had charted negative HIV tests, and nine (6%) actually had HIV/AIDS. The resulting false negative rate was 3% (9 of the 347 patients with HIV infection on chart review) and the negative predictive value 97% (95% CI 94-99%) (Table 2). We reviewed the HIV diagnosis dates of the nine false negatives; none of the HIV diagnoses postdated the NHL diagnosis. Among the nine false negatives, four had HIV/AIDS and two had Kaposi's sarcoma (an AIDS-related malignancy) reported in the text of their registry abstract. The remaining three had no mention of HIV/AIDS or associated conditions in the registry abstract but had HIV/AIDS according to the complete medical record. The nine false negatives were more likely than patients coded as HIV-negative or HIV serostatus unknown to be single and to have high-grade, stage IV NHL, and less likely to receive chemotherapy or to be alive at most recent follow-up (P < 0.05).

Table 1 notes: (a) 11 true positives, 5 true negatives coded as negative, and 3 true negatives coded as unknown lack marital status information; (b) 10 true positives, 3 false positives, 8 true negatives coded as negative, and 12 true negatives coded as unknown lack data regarding stage; (c) 4 true positives, 2 false positives, 4 true negatives coded as negative, and 5 true negatives coded as unknown lack chemotherapy data.
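All of the summary measures reported here and in Table 2 follow from one 2x2 table of registry flag versus chart review, with registry "negative" and "unknown" codes pooled. The sketch uses counts derived from the Results (338 true positives and 25 false positives among 363 flagged patients; 9 false negatives and 279 true negatives among 288 unflagged patients). It shows that the reported false positive and false negative rates correspond to 1 minus specificity and 1 minus sensitivity, not to the complements of the predictive values. The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Registry HIV flag against chart-review HIV status."""
    true_pos: int   # flagged HIV-infected, HIV confirmed in chart
    false_pos: int  # flagged HIV-infected, no HIV in chart
    false_neg: int  # coded negative/unknown, HIV found in chart
    true_neg: int   # coded negative/unknown, no HIV in chart

    def ppv(self) -> float:
        return self.true_pos / (self.true_pos + self.false_pos)

    def npv(self) -> float:
        return self.true_neg / (self.true_neg + self.false_neg)

    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    def false_positive_rate(self) -> float:
        # False positives as a share of all chart-verified HIV-negative patients.
        return 1 - self.specificity()

    def false_negative_rate(self) -> float:
        # False negatives as a share of all chart-verified HIV-infected patients.
        return 1 - self.sensitivity()


if __name__ == "__main__":
    table = TwoByTwo(true_pos=363 - 25, false_pos=25, false_neg=9, true_neg=288 - 9)
    for name in ("ppv", "npv", "sensitivity", "specificity",
                 "false_positive_rate", "false_negative_rate"):
        print(f"{name}: {getattr(table, name)():.0%}")
```

Rounded to whole percentages, this prints 93%, 97%, 97%, 92%, 8%, and 3%, matching the values in Table 2. By contrast, 1 minus the PPV would be about 7%, not 8%.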
In univariate analysis (Table 3), false positives were more likely than correctly identified patients to be ≥50 years of age, female, treated with chemotherapy, and alive at most recent follow-up, and less likely to be single, have extranodal disease, or be diagnosed at an academic, VA, or military hospital. In multivariate analysis, false positives were more likely to be ≥50 years old, female, and treated with chemotherapy, and less likely to be single or to have high-grade or extranodal disease.
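Table 3 reports univariate odds ratios with 95% confidence intervals, but the paper does not state how the univariate intervals were computed. One standard choice for a 2x2 table is the Woolf (log-scale Wald) interval, sketched below as an assumption rather than as the authors' method. The cell counts in the example are a hypothetical split of the 25 false positives and 338 true positives, not values taken from Table 3.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio and Woolf 95% CI for the 2x2 table

        exposed:   a false positives, b true positives
        unexposed: c false positives, d true positives

    All cells must be positive; a zero cell would need a correction such as +0.5."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for an exposure such as "age >= 50 years".
    print(odds_ratio_woolf(15, 90, 10, 248))
```

The interval is symmetric on the log scale, which is why it looks lopsided around the odds ratio itself. Multivariate odds ratios, as in Table 3, would come from the fitted logistic regression instead.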

Discussion
Because HAART and opportunistic infection prophylaxis have decreased other causes of death among HIV-infected individuals, NHL has become a prominent cause of morbidity and mortality among people with AIDS [1,12]. The cancer registry identified AIDS-related NHL with a positive predictive value of 93% and a negative predictive value of 97%. The false positive rate was 8% and the false negative rate was 3%. Because chart reviews are difficult to conduct in the current health care system, it is fortunate that the cancer registry is usually accurate in identifying AIDS-related NHL.
Other sources of data for AIDS-related NHL research include AIDS registries, AIDS-cancer registry matches, case series, and death certificates. While AIDS registries collect NHL diagnoses as an initial AIDS-defining illness, they do not necessarily collect data on NHL diagnoses that occur after the initial AIDS-defining illness. A computerized matching algorithm can link NHL patients in the cancer registry to AIDS patients in the AIDS registry and thus determine which NHL patients in the cancer registry have AIDS [11,13,14]. Because such AIDS-cancer registry linkages are laborious, researchers perform them episodically, so the data may not be current. While investigators may report case series of AIDS-related NHL patients from their institution, such case series are not necessarily representative of the regional AIDS-related NHL population, since patients may go to other institutions for care [15]. While researchers may use concurrent AIDS and NHL diagnoses listed on death certificates to track patients with AIDS and NHL [16], death certificate diagnoses are often unreliable and are available only for the deceased. Population-based cancer registries may provide data on the epidemiology of AIDS-related NHL that are more complete, current, detailed, and representative than other sources [17]. Furthermore, these data may be useful for monitoring health care outcomes over time. Lastly, cancer registry data may provide a source of living patients for clinical studies.
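The computerized registry match mentioned above is a record-linkage problem. The algorithms used in the cited matches [11,13,14] are not described here, so the sketch below shows only the simplest deterministic variant, under hypothetical assumptions: records link when normalized surname, first initial, date of birth, and sex agree exactly. The field names and the linkage key are illustrative; linkages that must tolerate typos or name changes need more lenient (for example, probabilistic) matching.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    record_id: str
    last_name: str
    first_name: str
    birth_date: date
    sex: str


def normalize(name: str) -> str:
    """Uppercase, strip accents, and keep ASCII letters only (e.g. "O'Neil" -> "ONEIL")."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed.upper() if ch.isalpha() and ch.isascii())


def link_key(r: Record) -> tuple:
    # Hypothetical deterministic key: surname, first initial, date of birth, sex.
    return (normalize(r.last_name), normalize(r.first_name)[:1], r.birth_date, r.sex)


def deterministic_link(cancer_records, aids_records):
    """Return (cancer_id, aids_id) pairs whose linkage keys agree exactly."""
    index = {}
    for r in aids_records:
        index.setdefault(link_key(r), []).append(r.record_id)
    return [
        (c.record_id, aids_id)
        for c in cancer_records
        for aids_id in index.get(link_key(c), [])
    ]


if __name__ == "__main__":
    cancer = [Record("C1", "O'Neil", "Maria", date(1960, 5, 1), "F"),
              Record("C2", "Smith", "John", date(1955, 1, 2), "M")]
    aids = [Record("A9", "ONEIL", "M.", date(1960, 5, 1), "F")]
    print(deterministic_link(cancer, aids))
```

Indexing the AIDS registry by key keeps the comparison close to linear in the number of records, rather than comparing every pair.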
False positives had characteristics of NHL patients without AIDS. False positives were more likely to be ≥50 years old, female, and married, in contrast with AIDS patients, who were predominantly less than 50 years old, male, and single. False positives also were more likely than correctly identified patients to have received chemotherapy. However, the lack of cancer treatment among patients with AIDS-related NHL may reflect the poor performance status and chemotherapy intolerance of AIDS patients in the pre-HAART era and might not be present in a contemporary analysis. False positives were less likely than correctly identified patients to have extranodal or high-grade disease, reflecting the higher prevalence of extranodal disease and aggressive histology among patients with AIDS-related NHL compared with NHL patients without AIDS [14]. Overall, the misclassification pattern suggests that the false positives most likely arise from human error, such as insufficient training or attention among cancer registrars.
In contrast to the false positives, the false negatives had characteristics of the AIDS population and were more likely to be single and deceased, with untreated, high-grade, stage IV NHL. We sometimes could confirm that a patient was a false negative by checking for concurrent Kaposi's sarcoma or by perusing the abstract text for a mention of HIV/AIDS. Since cancer registrars classified all nine false negatives as unknowns, we initially thought that the false negatives arose from a bias toward coding unknown to protect patient confidentiality regarding AIDS. For instance, as of 2002, the VA specifically requires its registrars to code HIV-infected patients as having unknown serostatus and not to mention any AIDS diagnosis in the reporting text [18]. However, since four of the nine false negatives did have their AIDS status noted in the text portion of the abstract, both human error and confidentiality concerns may be sources of miscoding.
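The manual check described above, looking for Kaposi's sarcoma or a mention of HIV/AIDS in the free-text portion of the registry abstract, could be partly automated as a screening step before chart review. The patterns and the "negative"/"unknown" code strings below are hypothetical and deliberately simple: they will miss misspellings and will also match negated statements such as "HIV negative", so a match should prompt chart review rather than automatic recoding.

```python
import re

# Hypothetical screening patterns for the free-text fields of a registry abstract.
# Acronyms are matched case-sensitively so that phrases like "hearing aids" are skipped.
ACRONYMS = re.compile(r"\b(?:HIV|AIDS|HAART)\b")
PHRASES = re.compile(
    r"\b(?:human immunodeficiency virus|acquired immunodeficiency syndrome"
    r"|kaposi(?:'|\u2019)?s?\s+sarcoma|antiretroviral)\b",
    re.IGNORECASE,
)


def hiv_mentions(abstract_text: str) -> list:
    """Return HIV/AIDS-related terms found in the text, acronyms first."""
    found = [m.group(0) for m in ACRONYMS.finditer(abstract_text)]
    found += [m.group(0) for m in PHRASES.finditer(abstract_text)]
    return found


def needs_chart_review(registry_hiv_code: str, abstract_text: str) -> bool:
    """Flag records coded negative or unknown whose text still mentions HIV/AIDS."""
    return registry_hiv_code in {"negative", "unknown"} and bool(hiv_mentions(abstract_text))


if __name__ == "__main__":
    print(needs_chart_review("unknown", "History of Kaposi's sarcoma; on antiretroviral therapy"))
    print(needs_chart_review("unknown", "Patient uses hearing aids"))
```

Such a screen would not have caught the three false negatives whose abstracts had no mention of HIV/AIDS, so it complements chart review rather than replacing it.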
The merits of our study include its population-based nature and the completeness of our chart reviews. We were able to abstract over 90% of the charts of all patients with AIDS-related NHL reported to the Orange/San Diego County cancer registry from 1994-1999. In any registry study, incomplete reporting to the registry is a potential limitation (although previous studies have estimated cancer registries to be approximately 75% complete for AIDS-related NHL) [13,17]. While some false positives could theoretically have been diagnosed with HIV after the study was completed, that scenario is highly unlikely, given that none of these patients had an HIV diagnosis at the time of NHL diagnosis and we reviewed all available follow-up chart data.
Although we did not specifically seek outpatient records, patients who received outpatient care before their initial hospitalization were either treated or referred by their outpatient physician, and copies of their outpatient information frequently appeared in the hospital chart. Because many hospitals had affiliated outpatient clinics, these outpatient records were available with the hospital chart. NHL usually requires a tissue diagnosis and treatment with chemotherapy, which may require hospitalization regardless of HIV serostatus. In addition, patients with AIDS-related NHL typically undergo hospitalization for associated complications. Overall, the lack of outpatient medical record review from freestanding private medical offices is a minor shortcoming of our study.
Our findings would not apply to registries that are not population-based or that do not follow the standards of the California Cancer Registry and SEER [2,3,5,7,8]. Also, our findings might not apply to other areas (in California and nationally) with a much higher or lower AIDS prevalence. Because the incidence of AIDS-related NHL (especially extranodal primary central nervous system NHL) currently is decreasing [11], the positive predictive value may also decrease in the future. By matching on age, sex, race, time period of NHL diagnosis, and hospital type, we decreased our ability to discern differences between patients with AIDS-related NHL and NHL without AIDS regarding these characteristics. We included the 68 unmatched AIDS patients in the multivariate analysis, and these patients differed from the matched pairs in four of the matching criteria (age, race, time period, and hospital type). Because our subjects were demographically matched as part of a case control study, the false negative rate in our study may be higher than the actual rate for the general NHL population, which tends to be older and to include more women than our study population [19,20]. The number of false negatives was too small to calculate odds ratios. We were unable to review the records of approximately 10% of patients, and those patients were more likely to be living and less likely to have received chemotherapy. Lastly, since we did not review the medical records of most NHL patients reported to the cancer registry as uninfected, there undoubtedly are patients with AIDS-related NHL who are not included in our study.
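The expectation that the positive predictive value may fall as AIDS-related NHL becomes less common follows from Bayes' theorem. For fixed sensitivity Se and specificity Sp, PPV = Se·π / (Se·π + (1 − Sp)(1 − π)), where π is the proportion of registry NHL patients who are HIV-infected. The sketch plugs in the sensitivity (97%) and specificity (92%) from Table 2. These were estimated in a matched sample and are assumed, for illustration only, to stay constant. The prevalence values are illustrative rather than observed.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Sensitivity and specificity from Table 2; prevalences are illustrative only.
    for prevalence in (0.5, 0.2, 0.05):
        print(prevalence,
              round(ppv(0.97, 0.92, prevalence), 3),
              round(npv(0.97, 0.92, prevalence), 3))
```

With these inputs, the PPV is about 92% at a prevalence of 50% but about 39% at 5%, while the NPV rises. Real registries may not hold specificity constant as prevalence changes, so the sketch shows only the direction of the effect.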
There are multiple barriers to medical record abstraction studies originating from cancer registries. There is a shortage of trained medical record abstractors, paying abstractors is expensive, and reviewing medical records is time-consuming. In our study, many patients had multiple charts at each hospital (e.g., inpatient and outpatient charts, including radiation oncology, oncology, and HIV clinic charts) as well as charts at multiple hospitals. However, we found these issues easier to address than the administrative hurdles our study encountered. We sought IRB approval from 15 institutions in order to perform our study, which was costly and time-consuming for us and for each institution's IRB [21,22]. In addition, several institutions required cancer committee or medical executive committee approval in addition to or in lieu of IRB approval. The passage of the Health Insurance Portability and Accountability Act of 1996 (HIPAA) has also resulted in decreased access to medical records for research [23,24]. This loss of access is particularly unfortunate for registry-related research, where the population-based nature of the data is crucial [25,26]. Currently, there is confusion regarding HIPAA requirements, as evidenced by the conflict between California State and VA requirements regarding AIDS reporting to the cancer registry. We recommend governmental efforts to mitigate the negative effects of HIPAA on research and encourage reciprocity among IRBs.
Because the cancer registry is accurate in identifying AIDS-related NHL, cancer registry data may be suitable for tracking trends in AIDS-related NHL over time for epidemiologic purposes. However, researchers should use caution in accepting the cancer registry's HIV coding of any particular patient with NHL. Improved registrar training and decreased workloads might reduce human error as a source of miscoding. We have identified characteristics (e.g., tumor grade and marital status) that could help identify potential false positive or false negative patients who would require chart review. Given the changing incidence of AIDS-related NHL and the varying geographic prevalence of AIDS, future confirmation of our findings, particularly in diverse geographic areas, would be of interest.
Dr. Diamond received support from the National Cancer Institute (K07CA96480) and the California Collaborative Treatment Group funded by the Universitywide AIDS Research Program of the State of California (CH05-SD-607-005).

Fig. 1
Identification of acquired immunodeficiency syndrome (AIDS)-related non-Hodgkin's lymphoma (NHL), beginning with all patients flagged as HIV-infected by the cancer registry for San Diego and Orange Counties, California, 1994-1999

Table 1
Characteristics of patients with non-Hodgkin's lymphoma (NHL) from the Cancer Registry for San Diego and Orange Counties, 1994-1999: a comparison among patients correctly identified by the registry as having human immunodeficiency virus (HIV) infection (True Positive), wrongly identified as having HIV (False Positive), correctly identified as not having HIV (True Negative, coded by the registry as Negative or Unknown), or wrongly identified as uninfected (False Negative)

Table 2
Positive and negative predictive value of cancer registry data in determining human immunodeficiency virus (HIV) serostatus of patients with non-Hodgkin's lymphoma, San Diego and Orange Counties, California, 1994-1999. Pooling the negative and unknown values for the cancer registry flag and chart review verification, the positive predictive value of the cancer registry flag is 93% (90-95%) and the negative predictive value is 97% (94-99%). The false positive rate is 8% and the false negative rate is 3%. The sensitivity is 97% and the specificity is 92%.

Cancer Causes Control (2007) 18:135-142

Table 3
A total of 363 patients with non-Hodgkin's lymphoma coded as having human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) by the cancer registry for San Diego and Orange Counties, 1994-1999: univariate and multivariate odds ratios (OR) and 95% confidence intervals (CI) for incorrect identification of HIV/AIDS status (N = 25) by the registry