Validation of Family History Data in Cancer Family Registries

Background: Although family history information on cancer is used to infer risk of the disease in population-based, case–control, cohort, or family-based studies, little information is available on the accuracy of a proband’s report. In this study, we sought to determine the validity of the reporting of family history of cancer by probands in population-based and clinic-based family registries of breast, ovarian, and colorectal cancers. Methods: To assess the accuracy of probands’ reported family history of cancer in their relatives, we compared the family history from the personal interview of each proband to a reference standard that included pathology reports, self-reports, or death certificates on the relatives. Our study included 1111 families that accounted for 3222 relatives who were verified. To account for within-family correlations in the responses, we used a generalized estimating equation approach. Results: The probability of agreement between proband-reported cancer status in a reference standard by cancer by degree of relationship 83.3% 72.8–93.8) ovarian 89.7% 85.4–94.0) 79.3% (95% CI 70.0–88.6) Conclusions: We found high reliability of probands’ reporting on most cancer sites when they reported on first-degree relatives and moderate reliability for their reporting on second- and third-degree relatives. Overreporting of cancer was rare (2.4%). Race or ethnicity and gender of the proband did not influence the accuracy of reporting. However, degree of relationship to the proband, type of cancer, age at diagnosis of the proband, and source of ascertainment of probands were statistically significant predictors of accuracy of reporting.


Introduction
Family history of cancer in first-degree relatives has been shown to be a risk factor associated with increased risk of developing cancer 1-3 and has been used to identify individuals for genetic and molecular studies. From these studies, estimates on the proportion of hereditary cancer because of susceptibility genes have been made. In addition, families with a history of cancer in multiple generations have been useful in cloning susceptibility genes. 4,5 Prediction models are usually based on explicit family histories of breast and ovarian cancers or summaries of the number of affected first- and second-degree relatives with breast or ovarian cancer. 6-9 These models are used by physicians and genetic counselors interested in helping women to understand their risk of breast and ovarian cancers and the preventive options available to them.
In the present study, we sought to validate the reporting of family history of cancer by cancer-affected probands in population-based and clinic-based family registries of breast, ovarian, and colorectal cancers. The objectives of the study were (1) to systematically evaluate the consistency of cancer-affected proband-reported information on cancer in their first-, second-, and third-degree relatives; (2) to determine the positive and negative predictive values as well as the probabilities of agreement between the proband-reported cancer status in a relative with the reference standard for various cancer sites; and (3) to determine the effect of the characteristics of the proband and the characteristics of the proband's relatives on the probability of agreement between information given by the proband's interview and the reference standard (pathology, self-reporting, and death certificate).

Methods
This validation study was conducted in the course of the creation of the family registries of probands for breast, ovarian, and colorectal cancers at the University of California at Irvine (UCI), in Orange County, California. According to the U.S. Bureau of the Census, roughly 1% of Americans reside in Orange County, California (2.7 million, estimated as of July 1, 1999). Approximately 54% of Orange County residents are white, 2% African American, 30% Hispanic, and 14% Asian.
The cancer family registries used for this study included both population-based and clinic-based probands. 10,11 Population-based proband ascertainment included all breast cancer cases diagnosed in Orange County, California, during the 1-year period beginning March 1, 1994 (UCI IRB no. HS91-137); all ovarian cancer cases diagnosed during the 2-year period beginning March 1, 1994 (UCI IRB no. HS91-137); and a weighted sample of familial colorectal cancer cases diagnosed during the 2-year period beginning March 1, 1994 (UCI IRB no. 93*257). In particular, for colorectal cancer probands, the population sampling was done with a stratification to increase the proportion of patients who were familial or younger than 65 years and, at the same time, to maintain the population-based status. Clinic-based proband ascertainment included a number of families enrolled in the family registries as referrals from Orange County physicians. At least annually, the UCI cancer surveillance program contacts community, clinic, and hospital physicians to let them know about new and ongoing genetic epidemiology studies and to encourage them to refer their high-risk probands to enroll in these studies. The clinic-ascertained group is included in this study because of the importance of generating data that would be helpful in clinical settings. We do recognize, however, that the data from this group are generalizable only to subjects who have a strong family history of cancer and are considered to be at high cancer risk. The characteristics of this group were different from the population-based group, because this group was referred to us because of their cancer familiality. The same protocol was used for interviews and verification of cancer in relatives for clinic-based and population-based probands.
Data on approximately 1200 breast cancer, 300 ovarian cancer, and 1200 colorectal cancer families were available in the family registries. A description of the family registries has been given elsewhere. 10,11 All probands included in this study were cancer affected. In the current study, we included all families in which there was at least one relative, affected or unaffected, with a method of verification available. Our study included 670 breast cancer families, 123 ovarian cancer families, and 318 colorectal cancer families.
Relatives were classified as first-degree, which included parents, siblings, and children of the proband; second-degree, which included grandparents, aunts, uncles, half-siblings, nieces, and nephews; and third-degree, which included first cousins and grandchildren. A proband was considered to be "familial" with respect to a specified cancer if at least one first-degree relative (parent or sibling) in addition to the proband was diagnosed with the same cancer as the proband.
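The classification above can be sketched in a few lines of Python. This is only an illustration of the stated rules: the relationship labels, function names, and data layout are hypothetical and are not taken from the registry software.

```python
# Relationship groupings exactly as stated in the text.
# The label strings themselves are hypothetical.
FIRST_DEGREE = {"parent", "sibling", "child"}
SECOND_DEGREE = {"grandparent", "aunt", "uncle", "half-sibling", "niece", "nephew"}
THIRD_DEGREE = {"first cousin", "grandchild"}


def degree_of_relationship(relationship: str) -> int:
    """Return 1, 2, or 3 for a relationship label, per the study's definitions."""
    rel = relationship.strip().lower()
    if rel in FIRST_DEGREE:
        return 1
    if rel in SECOND_DEGREE:
        return 2
    if rel in THIRD_DEGREE:
        return 3
    raise ValueError(f"unclassified relationship: {relationship!r}")


def is_familial(proband_cancer: str, relatives) -> bool:
    """True if at least one parent or sibling has the same cancer as the proband.

    `relatives` is an iterable of (relationship, cancer_site_or_None) pairs.
    """
    return any(
        rel.strip().lower() in {"parent", "sibling"} and site == proband_cancer
        for rel, site in relatives
    )
```

Note that the "familial" rule as written counts only parents and siblings, so an affected child alone would not make a proband familial in this sketch.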

Data Collection
After initial contact with the proband's physician(s), a description of the study and an invitation to participate were mailed to the proband. This mailing was followed by a telephone interview to obtain family history information (including first-, second-, and third-degree relatives). Interviewers entered the family history data into the Genetics Registry Information System (GRIS) database through a set of computer screens that capture demographic information, tumor information, and protocol status for probands and family members. The system has functions to generate merge files for creating personalized letters and mailing labels. After the telephone interview, a verification report and pedigree were produced from GRIS that showed the family history information reported by the proband during the interview. These reports were mailed to the proband so that he or she could complete data items not known during the interview and verify all information collected. The verification table included dates of birth and death, types of cancer, dates of diagnoses, hospital at diagnosis, and other relevant information. Release-of-information forms also were sent to the proband for approval and signature in order to obtain medical records of those relatives who were deceased. After obtaining permission from the proband, the interviewers contacted the proband's living relatives who had cancer and asked them to sign their own release-of-information forms.
Family history verification is a dynamic process and can continue over a long period. Therefore, in the analysis presented here, we selected July 2000 as a cutoff date. Statistical analysis included all relatives for whom verification was obtained using the methods described in the Verification Methods section. The relatives included in these analyses were those reported by the probands to have had cancer or to have died, as well as unaffected siblings or cousins of the proband. All families for whom we completed at least one level of verification on at least one relative were included.

Verification Methods
Malignancies reported on the relatives of the proband were verified by one of the following methods: (1) obtaining pathology reports, tumor tissue samples, or clinical records, which we will refer to throughout this report as "pathology"; (2) obtaining "self-reports" from affected and nonaffected relatives of probands through a structured questionnaire and personal interviews; or (3) obtaining death certificates on deceased relatives. We obtained an authorization form from the proband or next of kin to obtain release of medical information (pathology reports and tumor tissue samples), to confirm malignancies, or to obtain death certificates. For a sample of living relatives who were reported by the probands to be unaffected, we obtained permission from the probands to contact these relatives to verify that they did not have cancer. For deceased unaffected relatives, we obtained death certificates and verified the probands' reporting. To assess the accuracy of reported family history of cancer in the relatives of probands, we compared the reported family history from the personal interview of the proband with a reference standard that included pathology reports or self-reports from the relative or death certificates on deceased relatives. For relatives with more than one source of verification information of their cancer status, we chose pathology over both the self-report and death certificate. In some cases, self-reports were obtained before death; in these cases, self-reports were chosen over death certificates. We used the date the proband returned the verification table as a reference date. The pathology report, self-report, and death certificate were adjusted according to this date.
Am J Prev Med 2003;24(2)
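The source hierarchy described above (pathology first, then self-report, then death certificate) can be illustrated with a minimal sketch. The dictionary keys and function name are hypothetical, and the sketch does not model the adjustment to the reference date.

```python
from typing import Optional, Tuple

# Priority order stated in the text: pathology is preferred over both other
# sources, and a self-report is preferred over a death certificate.
SOURCE_PRIORITY = ("pathology", "self_report", "death_certificate")


def reference_standard(sources: dict) -> Optional[Tuple[str, bool]]:
    """Pick the cancer status from the highest-priority available source.

    `sources` maps a source name to True (cancer), False (no cancer),
    or None / missing (source not obtained). Returns (source_name, status),
    or None when no verification is available for the relative.
    """
    for name in SOURCE_PRIORITY:
        status = sources.get(name)
        if status is not None:
            return name, status
    return None
```

For example, a relative with a pathology report confirming cancer and a death certificate listing no cancer would be classified as affected, because the pathology report takes precedence.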

Statistical Analysis
We calculated the positive and negative predictive values (PPV and NPV) on the cancer status of relatives by using the family history data reported through a personal interview of the proband and parallel data on the same relatives by using a reference standard as described in the Introduction. We also determined (1) the probability that the proband reported cancer in a relative given that the relative had cancer, as determined by the reference standard (probability of agreement of cancer [PAC] in a relative), and (2) the probability that the proband reported absence of cancer in a relative given that the data from the reference standard indicated that this relative had no cancer (probability of agreement of no cancer [PANC] in a relative). For the above analyses, we used data collected from the proband during the personal interview, and we considered pathology, self-report, and death certificate as the reference standard. To account for the within-family correlations on the responses, we used a generalized estimating equation (GEE) approach with a log link and an exchangeable correlation structure. 12 This approach accounts for the variable number of relatives per proband, allows for the correlation among responses within families, and provides appropriate confidence intervals (CIs) of the estimates. To account for the sampling of our study, we used logistic regression models, 13 in which the predictor variable was the reporting of family history of cancer from the proband and the response was the reference standard. We calculated the results for PPV and NPV as well as the PAC and PANC and their 95% CIs for specific cancers, stratified by the degree of relationship to the proband.
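As a rough illustration of the four measures defined above, the following sketch computes crude PPV, NPV, PAC, and PANC from a 2x2 cross-classification of proband report against reference standard. The function name and the counts are hypothetical, and these are simple proportions: unlike the GEE-based estimates used in the study, they ignore within-family correlation and provide no confidence intervals.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Crude accuracy measures from a 2x2 table.

    tp: proband reported cancer, reference standard shows cancer
    fp: proband reported cancer, reference standard shows no cancer
    fn: proband reported no cancer, reference standard shows cancer
    tn: proband reported no cancer, reference standard shows no cancer
    """
    def ratio(num, den):
        # Undefined when the denominator is empty.
        return num / den if den else None

    return {
        "PPV": ratio(tp, tp + fp),   # P(reference cancer | reported cancer)
        "NPV": ratio(tn, tn + fn),   # P(reference no cancer | reported no cancer)
        "PAC": ratio(tp, tp + fn),   # P(reported cancer | reference cancer)
        "PANC": ratio(tn, tn + fp),  # P(reported no cancer | reference no cancer)
    }


# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=90, fp=10, fn=5, tn=895)
```

PAC and PANC condition on the reference standard, whereas PPV and NPV condition on the proband's report, so the two pairs can differ substantially when a cancer is uncommon among the relatives.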
We investigated the effect of characteristics of the proband and the relatives, such as age at diagnosis, race or ethnicity, gender, and type of cancer on the probability of disagreement. Because the false-negative rate was different from the false-positive rate, we used separate regression models for the false positives and the false negatives. To account for the within-family correlations on the misclassification rate, we used a GEE approach with a log link and an exchangeable correlation structure. For these models, we considered the response variable to be simple agreement between the proband's interview and the reference standard. For each combination of family history of cancer (breast, prostate, colorectal, lung, and all cancers combined), we fitted separate models on the response described for the false negatives. Similarly, we fitted a model of all cancers combined among the false positives. In all models, we included a number of predictors and used the deviate scores to determine the best fitting model.
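For readers unfamiliar with the approach, a GEE with a log link and an exchangeable working correlation is conventionally written as below. This is the textbook formulation, not a specification taken from the study; the notation is introduced here for illustration, with $Y_{ij}$ the binary response for relative $j$ in family $i$ and $\mathbf{x}_{ij}$ the corresponding predictors.

```latex
\begin{align*}
\log \mu_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta},
  \qquad \mu_{ij} = \Pr(Y_{ij} = 1 \mid \mathbf{x}_{ij}), \\
\operatorname{Corr}(Y_{ij}, Y_{ik}) &= \alpha \quad (j \neq k), \\
\sum_{i=1}^{N} \mathbf{D}_i^{\top}\mathbf{V}_i^{-1}\,(\mathbf{Y}_i - \boldsymbol{\mu}_i) &= \mathbf{0},
  \qquad \mathbf{V}_i = \mathbf{A}_i^{1/2}\,\mathbf{R}(\alpha)\,\mathbf{A}_i^{1/2}.
\end{align*}
```

Here $\mathbf{D}_i = \partial\boldsymbol{\mu}_i / \partial\boldsymbol{\beta}^{\top}$, $\mathbf{A}_i$ is the diagonal matrix of variances $\mu_{ij}(1-\mu_{ij})$, and $\mathbf{R}(\alpha)$ has ones on the diagonal and $\alpha$ elsewhere. Under a log link, $\exp(\beta_k)$ is interpreted as a ratio of probabilities rather than an odds ratio.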

Population Under Study
In the present study, we included families of 1111 probands; 670 (60.3%) were breast cancer probands, 123 (11.1%) were ovarian cancer probands, and 318 (28.6%) were colorectal cancer probands. Table 1 shows the characteristics of the probands with respect to race or ethnicity, ascertainment source, gender, age at diagnosis, and proband-reported family histories of breast, ovarian, and colorectal cancers. The majority of probands were non-Hispanic white (1022 [92.0%]); the primary source of ascertainment was population-based (1042 [93.8%]); and the gender was predominantly female (939 [84.5%]). The distribution of age at diagnosis of the probands with breast, ovarian, or colorectal cancer shows that 312 probands (28.0%) were diagnosed at an age younger than 50 years, 545 probands (49.1%) were diagnosed between the ages of 50 and 69 years, and 254 probands (22.9%) were diagnosed at age 70 years or older. The mean age at diagnosis for the probands was 56.6 years (SD = 13.3).

Family Size
The distribution of relatives by degree of relationship to the proband across the three groups of probands is shown in Table 2. On average, there were 7.1 first-degree relatives, 19.4 second-degree relatives, and 14.3 third-degree relatives per family. On average, we obtained verification on 1.8 first-degree relatives, 2.4 second-degree relatives, and 1.6 third-degree relatives per family, with a verification rate of 25.4%, 12.4%, and 10.7%, respectively. An average of 2.0 first-degree relatives, 2.6 second-degree relatives, and 2.2 third-degree relatives per family were diagnosed with cancer. We obtained verification on 1.4 first-degree relatives, 1.7 second-degree relatives, and 1.4 third-degree relatives per family diagnosed with cancer, with a verification rate of 70.4%, 65.1%, and 61.3%, respectively.

Verification Methods
In total we included 3222 relatives (1320 [41%] men and 1902 [59%] women) for whom we had a form of verification; of these, 1692 (52.5%) were first-degree relatives, 1214 (37.7%) were second-degree relatives, and 320 (9.9%) were third-degree relatives. We were able to obtain 474 pathology reports, 777 self-reports, and 2142 death certificates. We had multiple forms of verification from some relatives; specifically, we obtained both pathology reports and death certificates for 85 relatives, both pathology and self-reports for 75 relatives, and both self-reports and death certificates for 7 relatives.
Among the 85 relatives for whom we obtained pathology reports and death certificates, 52 (61.2%) were first-degree, 25 (29.4%) were second-degree, and eight (9.4%) were third-degree relatives of probands. Among the 85 relatives, in 68 (80.0%) pathology reports and death certificates were in complete agreement for each cancer site; in the remaining 17, there was disagreement between the pathology report and the death certificate. Among those 17 relatives, 12 (14.1%) had no cancer reported on the death certificate; for five relatives (5.9%), the metastatic cancer site rather than the primary site was reported on the death certificate. Among the 75 relatives for whom we obtained both a pathology report and self-report, 62 (82.7%) were first-degree, seven (9.3%) were second-degree, and six (8.0%) were third-degree relatives. In this group, 69 (92.0%) had complete agreement between the self-report and pathology for each cancer site.
In Table 3, first-degree relatives are characterized as positive or negative for their own history of a number of different cancer sites, as reported by the proband in the interview, compared with the reference standard. Table 3 also shows estimates of PPV and NPV as well as PAC and PANC with corresponding 95% CI. The PPV and PAC measurements were higher than 75%, with many of them higher than 90% for breast, ovary, prostate, colorectal, pancreas, and lung cancers, as well as lymphoma and leukemia. Cancers of the female pelvic organs, e.g., cervix and endometrium, as well as bladder cancer were among the cancer types with the lowest PPV and PAC. In general, the NPV and PANC were more than 97.1% for all cancer sites. Although there were differences in these estimates across the cancer sites by the type of first-degree relative (e.g., parents vs siblings), as reported by the proband, these differences were not statistically significant. The estimated PPV and PAC on cancer status reported by probands were higher for siblings compared with parents for breast, colorectal, and ovarian cancers, and higher for parents compared with siblings for prostate and lung cancers.
Estimates of PPV and PAC as well as NPV and PANC with corresponding 95% CIs for second-degree relatives are shown in Table 4. In general, the PPV and PAC of data reported by the proband for second-degree relatives were lower compared with those reported for first-degree relatives (Table 3), with notably large differences for cancer sites such as ovary, prostate, colorectum, and pancreas. The PPV and PAC for data reported by the proband on third-degree relatives were lower for all cancer sites, with the exception of cancers of the brain (71.4%), pancreas (71.4%), and female breast (69.8%), and for leukemia (72.7%) (data are not shown).

Predictors of Accuracy of Reported Family History of Cancer
Another objective of our analysis was to determine which of the proband's characteristics affect the disagreement between the family history data given by the proband and that of the reference standard. The results presented in Table 5 are restricted to the false-positive rate for all cancers combined and to the false-negative rate for all cancers combined, female breast cancer, prostate cancer, lung cancer, and colorectal cancer. After we accounted for correlations within families, we found that male probands were more likely to overreport (i.e., report cases that were not true) all cancers combined compared with female probands (p = 0.0021), and probands from clinic-based ascertainment were more likely to overreport all cancers combined compared with population-based probands (p = 0.0136). In addition, probands ascertained by either mode were more likely to overreport cancer in first-degree relatives compared with second-degree relatives (p = 0.0161). No statistically significant associations were observed between the false-positive rate and age at diagnosis, race or ethnicity, or family history of breast, ovarian, or colorectal cancer. Younger probands were more likely to report family history data with a lower false-negative rate (Table 5), particularly for female breast cancer (p = 0.0008), prostate cancer (p = 0.002), and colorectal cancer (p = 0.027). In general, with the exception of breast cancer, clinic-based ascertained probands were more accurate compared with population-based ascertained probands. Furthermore, clinic-ascertained probands reported false-negative rates that were significantly lower than those reported by population-ascertained probands for all cancers combined (p = 0.0217), lung cancer (p = 0.038), and colorectal cancer (p = 0.023). Breast cancer probands were significantly more accurate than colorectal cancer probands in reporting breast cancer in their first-degree relatives.
Finally, reporting by nonwhite probands was consistently more accurate than reporting by white probands for all cancer sites, with the exception of lung cancer; however, these differences were not statistically significant. The relationship of the relative to the proband was found to be the most consistent predictor of accuracy among the false negatives. Second- and third-degree relatives were more likely to be reported inaccurately by the probands for all cancers combined, female breast cancer, lung cancer, prostate cancer, and colorectal cancer (p < 0.001). Gender of the relative and age at diagnosis of the relative were not significant predictors, except in the case of prostate cancer, whereby the proband was more likely to be accurate for a relative diagnosed at an older age (p = 0.010 for 60-69 years and p = 0.023 for 70+ years) compared with a relative diagnosed at a younger age.
Table 4. Estimates of PAC, PANC, PPV, and NPV in second-degree relatives using data from proband interviews and reference standard

Discussion
Although family history information is collected in clinical and research settings and is used to infer risk of the disease in population-based, case-control, cohort, or family-based studies, little information is available on the accuracy of a proband's reported family history. 14-18 In the limited studies of validation of family history of cancer, most reports concentrate on the first-degree relatives of the proband, and most studies use case-control study designs. Inaccurate reporting of the disease status of relatives of the proband can result in biased estimates of familial aggregation and represents a major source of misclassification in genetic and epidemiologic studies. In fact, in case-control studies, nondifferential misclassification of the disease status of the relatives biases estimates of the odds ratio toward the null. 19 Differential misclassification can bias estimates of the odds ratio upward or downward. 20 We report here data indicating that cancer-affected probands report their family history of cancer with high PPV (Tables 3 and 4). In addition, the probability of the proband reporting a cancer in a relative, given that the relative had that cancer (PAC), is more than 95% for breast cancer in first-degree relatives; the probability of a true absence of cancer (PANC) in a relative given that the proband also reported a negative cancer history for that relative is more than 95% for first- and second-degree relatives. However, we noted major differences in the quality of data by cancer site and degree of relationship of the relative to the proband. The validity of data for most types of cancer among first-degree relatives was similar in our study to that reported by Airewele et al. 18 but higher than those reported by Kerber and Slattery. 17 This difference might be explained by the way these studies were conducted compared with the present study.
In both our study and that of Airewele et al., 18 multiple contacts were made with the families to test information reliability, whereas in Kerber and Slattery 17 this was not the case. Similar to other reports, we found poor validity of data reported on individual cancer sites of the female pelvic organs, such as cervix or endometrium, as well as bladder cancer among first-degree relatives.
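The attenuation toward the null under nondifferential misclassification, mentioned earlier in this Discussion, can be illustrated numerically. The sketch below uses entirely hypothetical counts and hypothetical sensitivity and specificity values for the reporting of family history, applied equally to cases and controls; the function names are illustrative only.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio for a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def misclassify(exposed: float, unexposed: float,
                sensitivity: float, specificity: float):
    """Expected observed (exposed, unexposed) counts when true exposure
    (here, a positive family history) is reported imperfectly."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed


# Hypothetical true counts: (family history positive, family history negative).
true_cases = (300, 700)
true_controls = (100, 900)
true_or = odds_ratio(*true_cases, *true_controls)

# Nondifferential: the same sensitivity and specificity in both groups.
sens, spec = 0.80, 0.95
obs_cases = misclassify(*true_cases, sens, spec)
obs_controls = misclassify(*true_controls, sens, spec)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these inputs the observed odds ratio is closer to 1 than the true odds ratio. If sensitivity or specificity differed between cases and controls (differential misclassification), the observed value could move in either direction.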
Reported family history of cancer by probands was significantly more accurate for first-degree relatives than for second- and third-degree relatives. The variability between the different cancer sites was larger for second- and third-degree relatives compared with first-degree relatives. Our study clearly indicates that reporting of family history is more reliable for selected cancers. Some cancers, such as female breast cancer, prostate cancer, and leukemia, were reported accurately for first-, second-, and third-degree relatives. In addition, the PPV was more than 70% for most of the cancers in first-degree relatives and remained more than 70% for many of the cancer sites, such as breast, prostate, colorectal, lung, and bladder, and for leukemia in second-degree relatives.
In our study, similar to that of Kerber and Slattery, 17 younger probands were better able to report familial female breast cancer, familial prostate cancer, and familial colorectal cancer compared with older probands. Male probands were more likely than female probands to overreport all cancers combined. In general, nonwhite probands reported more accurate family history of cancer than white probands. However, the difference in accuracy was not significant. Similarly, we found no statistically significant associations with respect to accuracy of the family history of probands and race or ethnicity of the probands. In addition to age at diagnosis of the proband, the best predictor of reporting accuracy was the ascertainment source of the proband.
Because genetic counseling is done in clinical settings based on reported family history of cancer, obtaining an accurate family history of cancer among relatives is important in predicting cancer risk. Even though the proportion of families who were clinic based in our study was 6.3%, accounting for 305 (9.5%) relatives, we observed statistically significant differences in the validity of the reported data with respect to false negatives and false positives (Table 5). Probands from clinic-based ascertainment sources compared with population-based sources were more likely to overreport (report more cases than were true) cancer in their relatives but were also likely to be more accurate in their reporting.
As noted in Subjects and Methods, the ascertainment of this clinic-based group is an established referral procedure that we generated over the past 10 years, whereby primary care physicians, hematology or oncology specialists, gynecologic oncologists, and surgical oncologists refer their patients who are at high risk of cancer because they may have multiple affected relatives. The differences observed between population-based and clinic-based probands might be due, in part, to the fact that the clinic-based probands were more informed and more motivated about their risk. However, because of the small proportion of clinic-based families in our validation study, we recommend that this study be replicated with a larger series of clinic-based ascertained probands.

Limitations
A limitation of our study was that it did not include unaffected probands, who might be compared in their reporting with affected probands. However, the purpose of this study was to estimate the accuracy of family history reporting by affected probands, and the results are generalizable to this group. Another limitation of our study was that the number of clinic-based ascertained probands was small. However, even with the small sample size, the estimate of accuracy of reporting among the clinic-based probands was higher than the population-based probands. Finally, not all families were included in the study. The analyses were done for all families that had any verification as of July 2000, and families that did not have verification by that date were not included. It is possible that families with a positive family history for cancer tend to respond and complete their verification. We do not believe that this is a source of bias, because our analyses were based on individuals rather than families.
To our knowledge, this is the largest validation study of familial cancers and the first to use population-based cancer family registries. We found high reliability for most cancer sites among first-degree relatives and moderate reliability for second- and third-degree relatives. Overreporting of cancer was rare (2.4%), similar to the 2.9% reported by Airewele et al. 18 Race or ethnicity and gender of the proband did not influence the accuracy of reporting. However, the source of ascertainment of probands was important.