Quality of Life End Points in Cancer Clinical Trials: Review and Recommendations

In this presentation, issues that influenced the development of policies for inclusion of quality of life end points in certain Southwest Oncology Group clinical trials are reviewed. The key policies recommended by us and adopted by the Cancer Control Research Committee of the Southwest Oncology Group are as follows: (a) Begin assessment of quality of life in specific types of phase III protocols; (b) Always measure physical functioning, emotional functioning, symptoms (general and protocol specific), and global quality of life separately; (c) Include measures of social functioning and additional protocol-specific measures if resources permit; and (d) Use patient-based questionnaires with psychometric properties that have been documented in published studies. In this review, we also recommend specific questionnaires. Our recommendations may prove useful for other cancer clinical trials groups and for multi-institution trials of treatment for chronic diseases. [J Natl Cancer Inst 81:485-495, 1989]

Barofsky (4) argued that many clinical trials have paid at least implicit attention to quality of life issues, e.g., the focus of the National Surgical Adjuvant Breast and Bowel Project on the survival time associated with different degrees of surgery. Other trials with breast cancer patients address quality of life issues explicitly. The trial by Sugarbaker et al. (77) and the follow-up study reported by  provide classic examples of how quality of life measurement can inform physicians and improve medical practice. The reporting of unexpected treatment impacts on quality of life variables led to changes in procedures for radiotherapy and for surgical and physical therapy for patients with soft tissue sarcoma. These changes were associated with improved patient functioning.
A number of reviews (1, 13-22) have indicated both the importance of measuring quality of life in medical research and the increase in published reports containing quality of life assessment. Hollandsworth (27) described this increase from the 1975-1979 period reviewed by Najman and Levine (20) to the period 1980-1985. The recent review by Aaronson (22) was not available when our work began. However, his recommendations are consistent with ours, which underscores the agreement that is developing on the assessment of quality of life in clinical trials. Laupacis et al. (23) reviewed a number of statistics for summarizing the benefits and harms of medical treatment for groups of patients. They described the properties of quantifiable measures such as the reciprocal of the absolute risk reduction. An alternative to including a battery of patient-based instruments is to use physician-based data on toxic effects to create a variable, time without symptoms and toxic effects (TWiST), that is amenable to survival analysis techniques (24).
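As a minimal illustration of the two summary statistics just mentioned, the Python sketch below computes the reciprocal of the absolute risk reduction (often called the number needed to treat) and a single patient's TWiST. It assumes that the absolute risk reduction is the control-arm event rate minus the treatment-arm event rate, and that TWiST is overall survival time minus the time spent with toxic effects and the time spent with disease symptoms. The function names and numbers are hypothetical, and the censoring that any real survival analysis of TWiST must handle is ignored.

```python
def number_needed_to_treat(control_event_rate: float, treatment_event_rate: float) -> float:
    """Reciprocal of the absolute risk reduction (ARR = control rate - treatment rate)."""
    arr = control_event_rate - treatment_event_rate
    if arr == 0:
        raise ValueError("ARR is zero, so its reciprocal is undefined")
    return 1.0 / arr


def twist(overall_survival: float, time_with_toxicity: float, time_with_symptoms: float) -> float:
    """Time without symptoms and toxic effects for one uncensored patient (assumed definition)."""
    remaining = overall_survival - time_with_toxicity - time_with_symptoms
    if remaining < 0:
        raise ValueError("toxicity and symptom periods exceed survival time")
    return remaining


if __name__ == "__main__":
    # Hypothetical event rates: 30% in the control arm, 20% in the treatment arm.
    print(round(number_needed_to_treat(0.30, 0.20), 6))  # 10.0
    # Hypothetical patient: 36 months alive, 4 with toxic effects, 6 with symptoms.
    print(twist(36.0, 4.0, 6.0))  # 26.0
```

A negative value from the first function would indicate that the treatment increased, rather than reduced, the event rate.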

Operational Definition of Quality of Life
In addressing quality of life issues in a clinical trials context, it is useful to have both a general definition and an operational definition that guides the measurement of the construct. Most health status and quality of life measures have included the three dimensions of health outlined in the World Health Organization (WHO) definition (25): "Health is not only the absence of infirmity and disease but also a state of physical, mental and social well-being." We advocate that, in clinical trials, measurement of quality of life be operationally defined with respect to health care and the treatment of disease, i.e., how physical, mental, and social well-being are affected by medical intervention. For example, the measurement of physical mobility and ability to perform a job would be appropriate in a cancer clinical trial. Satisfaction with one's job per se should not be measured, because it is affected by a number of factors, some of which operate independently of medical care.
The more general WHO definition can lead investigators to measure quality of life with a single, global instrument that captures the three components of the WHO definition or a similar definition, e.g., the Quality of Life (QL)-Index (26), the Functional Living Index-Cancer (FLIC) (27), or the QL Assessment (28). Use of a global measure allows comparison across a wide variety of trials; the single global measure can be supplemented with disease-specific and treatment-specific items for each trial. The problem is that no single, global instrument is yet accepted by researchers as applicable to measurement of quality of life across many different cancer trials.
Other investigators have described quality of life with a larger number of dimensions. In reports by Aaronson (7) and Aaronson et al. (29,30), it was recommended that 12 components be included in the assessment of quality of life in clinical trials: pain and pain relief, fatigue and malaise, psychological distress, nausea and vomiting, physical functioning, symptoms and toxic effects, body image, sexual functioning, social functioning, memory and concentration, economic disruption, and global quality of life. The most recent version of the approach of the European Organization for Research and Treatment of Cancer (EORTC) (30) contains a core instrument of 36 items measuring functional status; psychological distress; fatigue and malaise, nausea and vomiting, pain, and other physical symptoms; social interaction; and global quality of life. Modules of items specific to various protocols will be associated with the EORTC core instrument. Psychometric properties of this new instrument were presented at the meeting of the EORTC Study Group on Quality of Life in late October 1988 (Aaronson NK: personal communication, 1988). Preliminary results were promising, but the scale for psychological distress must be refined.
Ware (31-33) suggested the measurement of physical and mental health, social and role functioning, and general health perceptions. More recently, Ware and his colleagues (34) developed a 20-item Short-form General Health Survey battery for the Medical Outcomes Study that measures these five constructs plus pain.
Social relationships influence subjective evaluation of quality of life (15,20). Social functioning and social support, however, have been reported to be the most problematic areas for investigators to measure (26,29,31,32). Until this measurement problem is addressed and a credible instrument exists, clinical trials investigators may be well advised to restrict quality of life components to physical and emotional functioning and symptoms. Investigators should be aware, however, that social functioning is a powerful construct for explaining variance in the measurement of quality of life. Investigators whose protocols emphasize quality of life issues and who can devote substantial resources to its measurement may be in a position to experiment with existing brief instruments, to use some of the more complex instruments, or to develop new measures of social functioning for the clinical trials context.
A promising brief instrument for measurement of social functioning is the six-item Social Support Questionnaire (SSQ6) (35). This instrument has had two problems: validation was conducted with college students instead of patients, and there was a ceiling effect regarding social support. (A ceiling effect occurs when questionnaire scores are concentrated at the positive or high end of a set of response choices and the instrument fails to discriminate among patients with high scores.) The Duke University-University of North Carolina (Duke-UNC) Functional Support Questionnaire (36) also looks promising in its early stages of development with patients of physicians in family practice; reliability and validity were documented for eight of the initial 14 items. The Medical Outcomes Study battery (34) has one item measuring social functioning, and the new EORTC core measure has two items in this area (30).
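A ceiling effect of the kind described for the SSQ6 can be screened for by counting how many respondents reach the top of the response range. The Python sketch below assumes a simple proportion-at-maximum statistic and a 15% flagging threshold; that threshold is a commonly used rule of thumb, not a criterion from this review, and the ratings are invented.

```python
from typing import Sequence


def ceiling_proportion(scores: Sequence[float], max_score: float) -> float:
    """Share of respondents whose score sits at the top of the response range."""
    if not scores:
        raise ValueError("no scores supplied")
    at_ceiling = sum(1 for s in scores if s >= max_score)
    return at_ceiling / len(scores)


def has_ceiling_effect(scores: Sequence[float], max_score: float, threshold: float = 0.15) -> bool:
    """Flag a ceiling effect when more than `threshold` of scores are at the maximum.

    The 15% default is an assumed rule of thumb, not a criterion from this review.
    """
    return ceiling_proportion(scores, max_score) > threshold


# Hypothetical social support ratings on a 1-6 scale.
ratings = [6, 6, 5, 6, 4, 6, 6, 3]
print(ceiling_proportion(ratings, 6))  # 0.625
print(has_ceiling_effect(ratings, 6))  # True
```

The same count taken at the bottom of the range would screen for a floor effect.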
We recommend the use of a component-based definition of quality of life with emphasis on the separate measurement of physical functioning, emotional functioning, and symptoms. Adoption of this position has been influenced primarily by the work of Ware and his associates (31,32,37-39), Aaronson (7), and Aaronson and his co-workers (29). Inclusion of a global measure of quality of life is highly recommended if resources permit, because it is important that overall patient distress be measured (40). Najman and Levine (20) suggested that measures of quality of life that are focused on disease or treatment can miss critical aspects of quality of life for a particular patient.

Incorporation of Quality of Life Assessment in Clinical Trials Research
We developed nine policy recommendations to guide the inclusion of quality of life end points in Southwest Oncology Group trials. These strategies are also relevant to the research of other oncologists conducting cancer clinical trials.
(1) Select certain phase III protocols for assessment.
(2) Use patient reporting of quality of life.
(3) Select brief questionnaires, not interviews, to reduce patient and staff burden.
Timing should be the same for all arms of the study.
(9) Introduce special procedures to ensure compliance and quality control of the quality of life data.

Selection of Phase III Protocols
Resources may not permit the assessment of quality of life in every clinical trial initiated by a cooperative group. Certain classes of phase III studies have the greatest potential for meaningful quality of life assessment: Quality of life assessment would be of lower priority in some types of phase III trials; for example, (a) a trial comparing two treatments with similar levels of toxicity but different treatment expectations (e.g., two multidrug regimens for extensive small cell lung cancer) and (b) a trial in which patients in both arms receive a treatment with known negative impact on quality of life (e.g., orchiectomy for advanced prostate cancer) but those in one arm receive an additional drug known to have little or no toxicity (e.g., flutamide).
Two very practical factors have also guided the Southwest Oncology Group's first policy-based venture into systematic phase III multitrial and multi-institutional quality of life assessment. These are (a) choice of a disease for which quality of life impacts are broadly recognized and (b) substantial physician-investigator interest in including quality of life questionnaires in the protocol. These factors have resulted in selection of two prostate cancer trials, one comparing radical prostatectomy plus adjuvant radiotherapy to prostatectomy alone (pathologic stage C disease) and one comparing radical prostatectomy to radiation therapy (clinical stage A and B disease).
A treatment arena with potential for quality of life assessment is experimentation with granulocyte-macrophage colony-stimulating factors (CSF-GM). This therapy mitigates toxic effects experienced by patients receiving cytotoxic agents and permits a higher dose. However, CSF-GM therapy is associated with its own toxic effects. Evaluating the trade-off between the greater potential for cure with a higher dose of the primary drug and the impact of CSF-GM toxicity on patient quality of life requires feedback from patients on the effects of all aspects of treatment.
In general, phase II trials are not considered appropriate for assessment of quality of life, primarily because such trials are not comparative; the biological effectiveness of a particular drug is at issue. However, Jones et al. (41, p. 256) alluded to the relevance of quality of life assessment for phase II trials:
A most important aspect of a phase II study is the quality of the patient's survival. It seems nonsensical to apply a therapy which detracts from the quality of survival while causing objective tumor response. The patient only appreciates the toxicity of the therapy unless he is deriving a significant improvement in function as a result of the treatment. In this respect the evaluation of the quality of survival and subjective improvements is important during these studies, but as yet they [these factors] cannot be used as objective response criteria per se.
In the same volume, Aaronson and co-workers (42) noted that, for prostate cancer trials, consensus on the measurement and reporting of more objective quality of life criteria is most apparent for the four key components of quality of life that we recommend measuring (physical, emotional, and social functioning and symptoms). There is less agreement about the specific measures investigators should use in assessing these components.

Patient Reporting of Quality of Life
Patient-based measures of quality of life should supplement physician judgments of treatment-related toxic effects routinely reported in most trials.
Aaronson and co-workers (42) distinguished between subjective response criteria as judged by observers and quality of life assessments provided by patients. Subjective response criteria are provided by external raters, usually the physician or nurse, for evaluation of performance status and toxic effects. Such criteria are considered subjective in relation to the accepted objective criteria applied to assessment of biological activity in phase II trials and treatment efficacy in phase III trials (41). Quality of life assessments are based on data given by patients in response to questions (questionnaires and interviews) regarding the types of health dimensions described by Ware (31-33) and Aaronson and associates (29,30,42).
Our review is concerned with the addition of patient-based measures to the subjective response criteria currently used in clinical trials. There are several reasons why physician-based performance status measures should not be the sole assessment of patient quality of life: the questionable psychometric properties of the performance status measures, the failure of these measures to correlate with psychosocial measures, and the low correlations between physician and patient assessments. Our emphasis on patient-based measures reflects the importance of the patient's own assessment of quality of life (20).
Subjective response criteria have been used for several purposes in randomized trials: to assess patient quality of life, to determine eligibility for trial participation, to stratify patients within treatment groups, and to determine treatment efficacy (43). The first measure of patient performance status was the Karnofsky performance status (KPS) scale (44). Although the validity of the KPS scale has been documented, its reliability in terms of inter-rater agreement has been controversial; factors such as the use of standardized training and testing procedures (including operationalized definitions for performance levels) and the setting in which ratings are made (e.g., home vs. clinic) are known to affect reliability (14,17,18,42,44-50).
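Inter-rater agreement on a performance status scale is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The Python sketch below is an assumed illustration, not the method used in the studies cited above, and the KPS ratings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal counts, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical KPS ratings of six patients by a clinic physician and a second rater.
physician = [90, 80, 70, 70, 50, 90]
second_rater = [90, 70, 70, 70, 50, 80]
print(round(cohens_kappa(physician, second_rater), 3))  # 0.538
```

Because this unweighted kappa treats a 10-point KPS disagreement the same as a 50-point one, a weighted kappa may suit an ordinal scale better.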
A shorter version of the KPS scale has been used by the Eastern Cooperative Oncology Group (ECOG) (18,42,51) and the Cancer and Leukemia Group B (CALGB); the Southwest Oncology Group uses a similar four-level scale with slightly different descriptors for some of the levels. The WHO has endorsed this briefer scale (3,18,52), which is known as the ECOG, Zubrod, or WHO performance status scale, but some reviewers (18,42) have noted its lack of psychometric documentation. In addition, Cella and Cherin (78, p. 27) have noted that the KPS and the ECOG scales do not "correlate highly with subjective/psychosocial measures of quality of life or extent of distress." Toxicity grading scales are used by physicians to monitor the toxic effects of treatment. Currently, the National Cancer Institute is promoting an effort to standardize these scales for the major cooperative groups. The revised criteria are being phased in by the Southwest Oncology Group, ECOG, CALGB, the North Central Cancer Treatment Group, and other cooperative groups.
Spitzer et al. (26) developed the QL-Index, a brief Apgar-type (neonatology measure) instrument that covers five dimensions of quality of life and can be used by a physician, nurse, or other health professional and by the patient; there are also single-item Uniscale versions for health care providers and patients (26). Reliability and validity were demonstrated for the QL-Index (26,53,54). However, response variation on the item regarding social functioning was minimal; most patients reported little problem with social functioning (54). Very low composite scores (0-3) were rarely obtained (26). Unlike the performance status scales, this scale can more accurately be termed a quality of life measure, because it assesses more than physical functioning, although the total score correlates more substantially with measures of physical functioning than with measures of psychosocial functioning (26,54). The QL-Index also permits comparison of patient and physician ratings of patient quality of life.
Correlations between patient and physician answers to questions in the same instrument or similar instruments are not always high (9,55-58). Nelson and co-workers (56) questioned whether patients and physicians are rating the same thing and to what extent both filter such ratings through their respective orientations and experiences. Najman and Levine (20) noted the importance of considering discrepancies between expectations and performance. The extent to which patients have readjusted expectations after illness and treatment could explain patient-physician differences in ratings if physicians' ratings are based on "the extent to which they [physicians] think the patient ought to be impaired in light of ... age-disease status rather than on the basis of the degree of impairment experienced by the patient" (56, p. 3337).
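One reason patient-physician comparisons need care is that a correlation coefficient measures association, not agreement. The Python sketch below (hypothetical function names and data) shows paired ratings on a 0-10 scale that correlate perfectly even though patients rate themselves 2 points lower than physicians do; reporting a mean difference alongside the correlation exposes such a systematic offset.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired ratings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant rating")
    return sxy / sqrt(sxx * syy)


def mean_difference(patient: Sequence[float], physician: Sequence[float]) -> float:
    """Average patient-minus-physician rating: a systematic offset that r cannot reveal."""
    if len(patient) != len(physician) or not patient:
        raise ValueError("need two equal-length, non-empty rating lists")
    return sum(p - q for p, q in zip(patient, physician)) / len(patient)


patient = [4, 6, 8]
physician = [6, 8, 10]
print(round(pearson_r(patient, physician), 2))  # 1.0
print(mean_difference(patient, physician))      # -2.0
```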

Selection of Brief Questionnaires
Selection of a modest number of brief questionnaires reduces the burden imposed on patients and staff and increases the chances of more uniform compliance over time.
Many single-institution studies have incorporated brief measures of quality of life and have done so on a longitudinal basis. Compliance on the part of both patients and medical staff has, in general, been good. However, Aaronson et al. (59) reported that the most common problem encountered by quality of life researchers in the EORTC was obtaining the cooperation of medical staff at the various sites. Adding multiple institutions to a study complicates the data-monitoring process and increases the potential for problems with noncompliance and missing data.
Clinical trials investigators must always consider the physical status of patients enrolled in the trial and their ability to complete a set of questionnaires. If a quality of life questionnaire is too long and demanding, a very ill patient will not be able to complete it, and the result is a biased sample of patients. It is important to gather data during times in the treatment process that are difficult for the patient; the less demanding the questionnaire requirements, the more likely patients are to provide data during such periods. Ganz and associates (60) reported declining rates of self-administration of a quality of life instrument over 4-week follow-up periods.
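Declining completion rates of the kind Ganz and associates observed can be monitored by computing, at each scheduled assessment, the proportion of expected questionnaires actually received. In this Python sketch the function name, visit labels, and counts are hypothetical, and the denominator is assumed to include only patients still alive and on study, so that deaths are not counted as noncompliance.

```python
from typing import Dict


def compliance_by_assessment(expected: Dict[str, int], received: Dict[str, int]) -> Dict[str, float]:
    """Proportion of expected questionnaires actually returned at each assessment time.

    `expected` should count only patients still alive and on study at that time
    (an assumed convention), so deaths are not mistaken for noncompliance.
    """
    rates = {}
    for visit, n_expected in expected.items():
        if n_expected <= 0:
            raise ValueError(f"no questionnaires expected at {visit!r}")
        rates[visit] = received.get(visit, 0) / n_expected
    return rates


# Hypothetical multi-institution counts.
expected = {"baseline": 100, "week 4": 92, "week 8": 85}
received = {"baseline": 95, "week 4": 74, "week 8": 51}
for visit, rate in compliance_by_assessment(expected, received).items():
    print(f"{visit}: {rate:.0%}")
```

Tabulating these rates by institution as well as by visit would help locate the staff-cooperation problems described above.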
Use of an interview approach to assess quality of life in randomized trials is not practical on a wide scale because of the required time and monetary investments in both training and administration. However, interviewer-administered instruments have been used in cancer trials; for example, in the study by Sugarbaker et al. (11), both the Sickness Impact Profile (SIP) (67) and the Activities of Daily Living (62) instruments were administered. Studies of the psychometric properties of the SIP were conducted primarily with the interviewer-administered version of the instrument, but more recently, the SIP has also been self-administered. In general, an instrument that can be completed by the patient is more practical; a self-administered instrument should require minimal instructions from the clinic staff. Interviewer-administered instruments are appropriate for in-depth examination of quality of life issues and/or trials that have quality of life as a primary objective.

Selection of Instruments With Established Psychometric Properties
A quality of life instrument must demonstrate adequate reliability, validity, and, if data are available, responsiveness to change over time if it is to be included in a clinical trial protocol.
Reliability of measures. The score on a quality of life instrument reflects the person's quality of life plus random error and systematic error (bias). There are two major approaches for evaluating the consistency with which an instrument measures a particular concept: test-retest and internal consistency techniques (63-66). Most researchers prefer that an instrument demonstrate reliability coefficients above .80. Factors affecting reliability and criteria for determining adequacy in different measurement contexts have been documented previously (39,64-66). There is controversy in the literature regarding the relative importance of the reliability criterion in the determination of an instrument's responsiveness to change over time (67).
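Internal consistency is commonly reported as Cronbach's coefficient alpha, the statistic cited for several instruments later in this review. A minimal Python sketch of the standard formula follows, using an invented three-item scale answered by four patients; test-retest reliability would instead be estimated from the correlation between two administrations of the same instrument.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Internal consistency of a multi-item scale.

    responses[i][j] is respondent i's score on item j.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical three-item scale answered by four patients.
data = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]]
alpha = cronbach_alpha(data)
print(round(alpha, 3), alpha > 0.80)  # 0.923 True
```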
Validity of instrument. Determining whether a quality of life instrument measures what it was intended to measure is the focus of validity analyses. There are three main types of validity: content (i.e., common sense or face validity of the items); criterion (comparison to a "gold standard" measure); and construct (behavior of test scores in various contexts) (64-66). Factors affecting prediction of test or questionnaire scores include group differences, correlations among dimensions measured by the test and/or with other variables, and changes associated with therapy over time.
Responsiveness to change. Although responsiveness to change is related to construct validity, it has particular relevance to the structure and purpose of clinical trials research. Is there evidence that the instrument can detect change in quality of life after therapeutic intervention and over the course of treatment? Evidence demonstrating sensitivity to change is not uniformly available. Many of the instruments used with cancer patients have demonstrated change in some treatment contexts and not in others; other instruments have detected change over time but not between treatments (60,68-72). Guyatt et al. (73) described approaches to determine responsiveness of a quality of life instrument.
We recommend selection of established quality of life measures for which psychometric properties have been documented. Development of protocol-specific instruments, which was the modular approach first suggested in reports by Aaronson (7) and Aaronson and his colleagues (29), may not be very realistic for large-scale clinical trials research. In fact, the Aaronson group, as noted, has elected to develop, for use in all trials, one core instrument that can be supplemented by protocol-specific items (30) (Aaronson NK: personal communication, 1988). Trials having quality of life as a primary objective and/or trials in which quality of life is addressed through an in-depth substudy may provide situations appropriate for developing new instruments. Investigators must recognize, however, the large investment of budget and time required to select items, field-test instruments, and demonstrate necessary psychometric properties as described here. A poorly developed instrument will not advance research in this area. On the other hand, it may be necessary for clinical trials investigators to contribute to the scientific development of quality of life measurement if existing measures are not relevant to the needs of a trial.
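Responsiveness to change can be quantified in several ways. The Python sketch below shows two common statistics, offered as an assumption rather than as the specific approach of Guyatt et al. (73): the standardized response mean (mean change divided by the standard deviation of change) and a ratio of a minimal important change to the standard deviation of change among clinically stable patients. All data, the function names, and the minimal important change of 5 points are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation of change (paired data)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    sd = stdev(changes)
    if sd == 0:
        raise ValueError("changes do not vary, so the ratio is undefined")
    return mean(changes) / sd


def responsiveness_ratio(minimal_important_change: float, stable_changes: Sequence[float]) -> float:
    """Minimal important change divided by the SD of change in clinically stable patients."""
    sd = stdev(stable_changes)
    if sd == 0:
        raise ValueError("stable-patient changes do not vary, so the ratio is undefined")
    return minimal_important_change / sd


# Hypothetical 0-100 scores before and after treatment.
before = [50, 60, 55, 40, 70]
after = [60, 62, 65, 48, 75]
print(round(standardized_response_mean(before, after), 2))  # 2.02
print(responsiveness_ratio(5.0, [0, 2, 4]))  # 2.5
```

Larger values of either statistic indicate that treatment-related change stands out more clearly from random fluctuation in scores.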

Use of Categorical Versus Visual or Linear Analogue Scale
The categorical scale is more feasible than the visual or linear analogue scale (VAS/LAS) for most large-scale clinical trials research.
The categorical response format has a limited number of labeled choices defining outcomes in terms of factors such as frequency, amount, and intensity. The VAS/LAS is usually 10 cm long, with descriptive anchors provided at both ends of the line. The respondent places a mark at any point on the line, indicating the degree of each factor. In theory, the VAS/LAS, by providing a greater range of response choices, should be more reliable, valid, and responsive to change over time than categorical scales, but comparisons of the two types of scales have not shown this to be the case (3,19). The VAS/LAS approach is sometimes difficult for patients to understand (3,60), and processing is more labor intensive. In medical research, the VAS/LAS is used frequently as a measure of quality of life. Commonly used instruments are the Uniscale (26), the linear analogue self-assessment (LASA) scale (74,75), the Quality of Life Tool (76), and the LASA/QL Assessment (25).
The FLIC is a 22-item, graded linear analogue scale frequently used to measure the quality of life of cancer patients (2,27). Seven categories are provided on a rating line, but the patient still marks one point on the line. If the respondent marks a point other than one of the seven pre-established points, the response is determined as follows: the midpoint between the two established points that bracket the respondent's mark is found, and the score is the nearest whole integer. In view of the lack of psychometric superiority of the VAS/LAS over categorical measures and the need for time-consuming measurements, this approach is not recommended for large-scale clinical trials research. The FLIC is only slightly less labor intensive, and for this reason, it is not recommended either. Ganz et al. (60) reported problems in administering the FLIC with respect to patient compliance and comprehension of both procedure and items; for example, patients usually circled a number rather than marking the line.
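The scoring rules described for the VAS/LAS and the FLIC can be sketched as follows. The Python code assumes a 100 mm line (the usual 10 cm), marks measured in millimetres from the left anchor, FLIC points numbered 1 to 7 from left to right, and ties exactly at a midpoint rounded upward; these conventions and the function names are assumptions, not published scoring instructions.

```python
import math


def vas_score(mark_mm: float, line_mm: float = 100.0) -> float:
    """Score a visual analogue mark as 0-100 from its distance to the left anchor."""
    if not 0 <= mark_mm <= line_mm:
        raise ValueError("mark lies off the line")
    return 100.0 * mark_mm / line_mm


def flic_item_score(mark_mm: float, line_mm: float = 100.0, n_points: int = 7) -> int:
    """Snap a mark on a graded line to the nearest of `n_points` evenly spaced points.

    Points are numbered 1..n_points from the left anchor (assumed direction).
    A mark between two points gets the lower value if it falls before their
    midpoint and the higher value otherwise; ties go up (assumption).
    """
    if not 0 <= mark_mm <= line_mm:
        raise ValueError("mark lies off the line")
    position = mark_mm * (n_points - 1) / line_mm  # 0 .. n_points - 1
    lower = math.floor(position)
    index = lower + 1 if position - lower >= 0.5 else lower
    return min(index, n_points - 1) + 1


print(vas_score(37.0))        # 37.0
print(flic_item_score(20.0))  # 2: closer to the second point than the third
print(flic_item_score(45.0))  # 4: past the midpoint between points 3 and 4
```

Even in this form, each mark must first be measured by hand, which is the processing cost cited above.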
When categorical scales are used, an appropriate number of response levels must be designated. Fayers and Jones (79) conclude that four or five categories are sufficient to achieve acceptable reliability. More response categories may be useful in enhancing responsiveness of the measure to change over time; Guyatt and associates (73) suggest seven to 10 levels.

Use of Separate Measures of Quality of Life Components
Certain components of quality of life should be measured as end points in cooperative group trials; the three essential components are physical functioning, emotional functioning, and symptoms. We suggest the use of specific instruments to measure these components. These instruments are recommended because they (a) are based on patient reports, (b) are brief, (c) use a categorical response format, and (d) have acceptable psychometric properties. Instruments in each category are divided into those adopted by the Southwest Oncology Group for selected upcoming trials and alternative instruments complying with recommendations 2-7. Psychometric properties are presented for the instruments selected by the Southwest Oncology Group; psychometric properties for the other instruments can be found in the references or in the group's position paper, which is available on request. As new instruments are developed (e.g., the new EORTC core instrument), the appropriateness of selected quality of life measures can be re-evaluated for use in future trials.

• Short-form Health Survey: Medical Outcomes Study (34). This instrument represents the newest published generation of the Rand health status measures. It has 20 items covering six constructs: physical functioning (six items); role functioning (two); social functioning (one); mental health (five); health perceptions (five); and pain (one). Scores can range from zero to 100, with higher scores indicating better health; a total score across scales is not obtained. Stewart and co-workers (34) have presented a preliminary report of the psychometric properties of the Short-form Health Survey. Values for Cronbach's alpha (63) for the four multi-item scales were .81-.88; reliability coefficients for the full-length versions of these scales were .89-.96. All possible scores were obtained for all measures, indicating adequate variability in response. Internal consistency reliability coefficients were not calculated for the one-item measures of pain and social functioning. The scales demonstrated convergent validity (each item score correlated substantially with the total score of its own subscale) and discriminant validity (each item correlated better with its own subscale than with other scales). Therefore, the scales met the multitrait criterion of the multitrait-multimethod approach specified by Campbell and Fiske (77). The scales of this instrument also discriminated between the patient sample and the general population sample, in that a greater percentage of the patient sample scored in the poor health range. Correlations between sociodemographic variables (age, sex, education, income, and race) and the health scales were consistent with the hypotheses and with findings from instruments using longer forms. Additional data supporting the psychometric properties of these scales are being prepared for publication.
(31,32,34,37,38). The Rand Personal Functioning Index has 21 items and is scored from zero to 100; a score of 100 indicates no physical limitations. The Role Limitations Scale has three items and is scored zero for no role limitations and 1 for one or more role limitations.

• EORTC scales for functional status and psychological distress (29,30). The EORTC personal functioning scale has six items, and the role functioning scale has two items. Both scales are scored yes or no, and lower scores represent more dysfunction. The psychological distress scale has five items, which are scored on a five-point scale; higher scores reflect more distress. A new core instrument of 36 items is being developed (30).

• Brief Profile of Mood States (POMS) scale (78). The Brief POMS scale has 11 words from the original POMS instrument (e.g., blue, discouraged, and unhappy). Respondents rate each word on a four-point scale; higher scores reflect more distress.

• Psychosocial Adjustment to Illness Scale-Self-Report (PAIS-SR) (79). The PAIS-SR has 46 items and seven subscales: health care orientation, vocational environment, domestic environment, sexual relationships, extended-family relationships, social environment, and psychological distress. The psychological distress subscale has seven items. Items are rated on a four-point scale; higher scores reflect more distress. There is also a total score that can function as a global measure.

• Nottingham Health Profile, Part I (80,81). Part I of the Nottingham Health Profile has 38 items covering six types of experience with illness: pain, physical mobility, sleep, emotional reactions, energy, and social isolation. Each item is answered yes or no, and weights are applied to items within each dimension so that the maximum score for each dimension is 100; the higher the score, the greater the dysfunction.

• McMaster Health Index Questionnaire (82-84). This questionnaire is similar to the Nottingham Health Profile, but it has three scales (physical, social, and emotional) and 59 items. Although the reliability of the McMaster Health Index Questionnaire is not as high as that of the Nottingham Health Profile, its validity is well established.
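Three scoring conventions mentioned in the list above can be sketched in Python: rescaling a raw scale score to the 0-100 range used by the Short-form Health Survey and the Rand Personal Functioning Index, the zero/one scoring of the Role Limitations Scale, and a Nottingham-style weighted dimension score whose maximum is 100. The linear rescaling formula, the item ranges, the weights, and the function names below are illustrative assumptions; the published instruments supply their own scoring rules and item weights.

```python
from typing import Sequence


def to_0_100(raw: float, lowest: float, highest: float, higher_is_better: bool = True) -> float:
    """Linearly rescale a raw scale score so that 100 is the best possible health (assumed)."""
    if highest <= lowest or not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    score = 100.0 * (raw - lowest) / (highest - lowest)
    return score if higher_is_better else 100.0 - score


def role_limitation_score(limitations: Sequence[bool]) -> int:
    """0 = no role limitations, 1 = one or more (the dichotomy described above)."""
    return 1 if any(limitations) else 0


def weighted_dimension_score(yes_answers: Sequence[bool], weights: Sequence[float]) -> float:
    """Sum the weights of items answered 'yes'; weights total 100 within a dimension."""
    if len(yes_answers) != len(weights):
        raise ValueError("one weight per item is required")
    if abs(sum(weights) - 100.0) > 1e-6:
        raise ValueError("weights within a dimension must sum to 100")
    return sum(w for answered_yes, w in zip(yes_answers, weights) if answered_yes)


# Hypothetical: six items scored 1-3 give a raw range of 6-18.
print(to_0_100(15, 6, 18))                                          # 75.0
print(role_limitation_score([False, True, False]))                  # 1
print(weighted_dimension_score([True, False, True], [50, 30, 20]))  # 70
```

Note the opposite directions: the rescaled Rand scores rise with better health, whereas the weighted Nottingham-style score rises with dysfunction.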

Measures for General and Treatment-Specific Symptoms
In most trials, treatment-specific symptom items must be developed by the investigators. The following instruments measure more general symptoms.

Southwest Oncology Group Instrument
• Symptom Distress Scale (85-90). The Symptom Distress Scale includes 13 items rated on a five-point scale; higher scores reflect more distress. This scale measures general symptoms and was designed to be used with all cancer sites. The scale covers 11 areas such as nausea, loss of appetite, insomnia, and pain. Reliability coefficients (coefficient alpha) of J8-.89 were reported (85-89). Construct validity was demonstrated for the scale in that symptom distress was negatively correlated with a global measure of quality of life (90). McCorkle et al. (87,88) found the instrument to be sensitive to changes over time and between treatment groups. In the group receiving office care only, patients with lung cancer reported an increase in symptoms 6 weeks before a similar increase occurred for patients in two groups receiving home-nursing treatment.
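Summated rating scales such as the Symptom Distress Scale are scored by adding item responses after checking that each lies in the permitted range. The Python sketch below assumes items coded 1-5, so totals run from 13 (least distress) to 65 (most distress); the function name and responses are hypothetical.

```python
from typing import Sequence


def summated_score(items: Sequence[int], n_items: int, low: int, high: int) -> int:
    """Total of a summated rating scale; here, higher totals mean more distress."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} item scores, got {len(items)}")
    if any(not low <= score <= high for score in items):
        raise ValueError(f"item scores must lie between {low} and {high}")
    return sum(items)


# Symptom Distress Scale: 13 items on a five-point scale, assumed coded 1-5.
responses = [1, 2, 1, 3, 1, 1, 2, 4, 1, 1, 2, 1, 1]
print(summated_score(responses, n_items=13, low=1, high=5))  # 21
```

The same hypothetical function could score the 11-word Brief POMS with four-point items, given that instrument's actual item coding.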

Alternative Instruments
• Breast Cancer Chemotherapy Questionnaire (BCQ) (72). The BCQ is a 30-item quality of life measure specific to the experience of women undergoing adjuvant chemotherapy for breast cancer. Seven dimensions have been identified, including psychological distress and social interaction. Questions are rated on a seven-point scale; a total score is determined, and higher scores reflect better quality of life.

Global Measures of Quality of Life or Assessment of Patient Health Perceptions
Global measures have many uses in a trial. A physician-rated global measure can substitute for patient measures when patients are too ill to complete forms. Global measures may also tap aspects of a patient's treatment experience not assessed by the more specific measures.

Southwest Oncology Group Instrument
• LASA Uniscale (28,91). The single-item Uniscale is worded as follows: "Please score how you feel your life has been affected by the state of your health (any disease or treatment) during the last week." This Uniscale, which will be included in the Southwest Oncology Group quality of life assessment packet, was taken from the 31-item LASA developed by Selby and colleagues (28). The test-retest (7 days) reliability coefficient for the Uniscale was .72; the Uniscale correlated significantly (P < .001) with a physician global rating and the SIP (r > .70) and with the KPS (r > .6) (28).
The correlation between the LASA Uniscale and KPS physician ratings was .9 (91). For use in Southwest Oncology Group trials, the Uniscale has been changed from a linear analogue scale to a categorical scale with five response categories ranging from extremely unpleasant to normal (no change).
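A test-retest reliability coefficient such as the .72 reported for the Uniscale relates the scores from two administrations given to the same patients. The following minimal sketch assumes a Pearson product-moment correlation; the text does not name the statistic. It also codes the five Uniscale categories as integers 1 to 5, which is a hypothetical coding.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores, e.g. day 0 vs. day 7."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no correlation")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical Uniscale codes (1 = extremely unpleasant ... 5 = normal)
    # for the same six patients, one week apart.
    first = [1, 2, 3, 3, 4, 5]
    second = [2, 2, 3, 4, 4, 5]
    print(round(pearson_r(first, second), 2))
```

A correlation measures agreement in rank and spread, not absolute agreement. A uniform shift of every patient's score by one category would still give r = 1.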

Alternative Instruments
• Spitzer QL-Index (patient-administered version) (26). The QL-Index includes the following five items rated on a three-point scale: activity, daily living, health, support of family and friends, and outlook; higher scores reflect better functioning. The QL-Index can be completed by the patient or the physician.
• EORTC Well-being/Satisfaction Scale (29). This scale has four items rated on a four-point scale; higher scores represent greater well-being.

Inclusion of Measures of Social Functioning and Other Protocol-Specific Variables if Resources Permit
Recommending brief measures of social functioning that achieve sufficient variability in response levels is problematic. Two promising instruments are the six-item SSQ6 (35) and the Duke-UNC Functional Support Questionnaire (36). Examples of other protocol-specific quality of life instruments are the Trail-Making Test B (92), which measures expected treatment impacts on cognitive functioning, and the PAIS-SR sexual relationships subscale (79), which measures expected treatment impacts on sexual functioning.

Administration of Quality of Life Instruments
Fayers and Jones (19) suggest that the ideal data collection schedule should involve quality of life measurement before, during, and at the end of treatment. They allude to problems in achieving this level of precision, but they do report that patients were willing to complete a five-question diary card daily in several Medical Research Council studies. Daily administration is impractical in most trial situations. Aaronson (1) reported that, in one EORTC trial, there was early noncompliance with a schedule of data collection on days 1, 10, and 22 of each chemotherapy cycle.
Because the impact of the treatment on quality of life is at issue, it is useful to measure quality of life while the patient is being treated. Some trials involve short-term, intensive therapy followed by a long-term, less intensive regimen. The decision of when quality of life should be measured must be dictated by the nature of disease progression at a particular site (2), the timing of known toxic effects of the treatment, and other features of the protocol's design. In one study, comparison of two treatments across three treatment periods (mean overall scores across six courses of therapy) showed no clear difference in the quality of life of patients receiving the two treatments, even though one treatment produced more severe toxic effects. Differences emerged more clearly when treatment cycles were compared with rest periods and when successive cycles were compared; i.e., the quality of life of patients receiving one of the regimens appeared to be diminished to a greater degree (69).
Two prostate cancer trials of the Southwest Oncology Group show that the timing of measurement must be tailored to the quality of life issues at stake. One trial compares adjuvant radiotherapy with no adjuvant therapy after radical prostatectomy. The other trial randomizes patients to radical prostatectomy or radiotherapy. In both trials, quality of life will be assessed before treatment begins, after 6 months, and yearly for 5 years. In addition, patients in the first trial will be assessed 6 weeks after initiation of treatment so that the impact of the toxic effects of radiotherapy can be evaluated; this impact is expected to be most severe at the completion of the 6-week cycle of therapy. In the second trial, assessment of protocol-specific symptoms at the completion of surgery is not feasible; for example, incontinence cannot be assessed because a catheter is required for several weeks. It is, therefore, not possible to compare the severity of such symptoms at the end of surgery and at the end of radiotherapy. Regardless of the measurement times chosen, it is critical that patients in both arms of a study be assessed at the same times.
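The measurement plan for the two prostate cancer trials can be written as a single schedule per trial that every arm shares. Sharing one schedule enforces the requirement that both arms be assessed at the same times. The following is a minimal sketch. It approximates 6 months as 26 weeks and a year as 52 weeks, and it assumes the yearly assessments are counted from the start of treatment. Both choices are assumptions made only for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Assessment = Tuple[str, int]  # (label, weeks after start of treatment)


def trial_schedule(include_week_6: bool) -> List[Assessment]:
    """Build the assessment schedule for one trial.

    Assumptions: 6 months ~ 26 weeks, 1 year ~ 52 weeks, years counted
    from the start of treatment.
    """
    schedule: List[Assessment] = [("baseline, before treatment", 0)]
    if include_week_6:
        # Timed to the end of the 6-week radiotherapy cycle, when toxic
        # effects are expected to be most severe.
        schedule.append(("6 weeks", 6))
    schedule.append(("6 months", 26))
    schedule.extend((f"year {year}", 52 * year) for year in range(1, 6))
    return schedule


def arm_schedules(arms: List[str], include_week_6: bool) -> Dict[str, List[Assessment]]:
    """Give every arm an identical copy of the trial's schedule."""
    shared = trial_schedule(include_week_6)
    return {arm: list(shared) for arm in arms}


if __name__ == "__main__":
    adjuvant_trial = arm_schedules(
        ["adjuvant radiotherapy", "no adjuvant therapy"], include_week_6=True
    )
    for label, week in adjuvant_trial["no adjuvant therapy"]:
        print(f"week {week:>3}: {label}")
```

The 6-week assessment is a property of the trial, not of the radiotherapy arm. Patients randomized to no adjuvant therapy are also assessed at week 6, so the arms stay comparable.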
A related question deals with the time frame imposed on each item. If quality of life will be measured only a few times, it is important that patients understand whether they are responding with respect to toxic effects during cycles of therapy, rest periods, or both. A short time frame such as the past week can be criticized as yielding a snapshot or biased sample relative to a longer time frame such as a month. Perceptions about quality of life during a more difficult period may be missed if the time frame is too short. A short time frame should be chosen when the objective is measurement of the impact of recent treatment.
In the prostate cancer trial described, the assessment at 6 weeks is designed to measure the impact of radiotherapy at its conclusion, when toxic effects are expected to be the worst. Therefore, the patient should be asked to respond with respect to the past week, not the past month. It was reported (93) that the relationship between response patterns dependent on situations (state) and enduring response patterns (trait) became stronger as the time frame for items increased from 1 day to 1 month. In the prostate cancer trial, we want to detect change due to recent treatment effects and do not want to detect a trait-like response such as the tendency to complain. However, there is good reason for us to consider a 1-month time frame for periods in which there will be less change in treatment impact; this approach could still detect group differences over time. A longer time frame is appropriate for the prestudy assessment and for the yearly measurements after treatment. The adjustment of the time frame to the expected stability of the response in clinical trials represents new territory for the developers of quality of life measures (Donaldson G, Ware JE Jr.: personal communication, November 1988). The key is to balance the choice of time frame so that investigators can detect differences both between treatments and over time while minimizing short-term fluctuations that do not represent real change but do add noise to the data. Practical considerations also apply. It is difficult enough for data managers to maintain patient follow-up for quality of life assessment, and requiring a different instrument battery for only one of a series of assessments is problematic. In the Southwest Oncology Group prostate cancer trial, a time frame of 1 week will therefore be used for all assessments.

Introduction of Procedures To Ensure Compliance and Quality Control
Special procedures should be introduced to ensure compliance and quality control of the quality of life data. Reports by Aaronson (1) and Aaronson et al. (30,94) suggested that a key person at each trial site should be responsible for quality of life data collection. A protocol quality of life study coordinator will be designated in Southwest Oncology Group trials, preferably a nurse investigator at the same institution as the therapeutic protocol study coordinator. Identification of a study coordinator for all institutions participating in the trial will also help to minimize problems with compliance and missing data. There is a greater need for "advance work" when quality of life assessment is initiated as a trial end point. Extra effort must also be expended during follow-up periods; clinic staff should be encouraged to communicate to patients the importance of the longitudinal data collection and to institute procedures that enhance follow-up data collection. Finally, the use of multiple-item scales rather than one-item scales allows replacement of missing values with estimates based on other scale items (34).
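The advantage of multiple-item scales for handling missing data can be made concrete with a minimal sketch of one common approach. Each missing item is replaced with the mean of the items the patient did answer, and the scale is then totaled. The person-mean rule and the 50% completion threshold are assumptions chosen for illustration. The text does not specify which estimation method reference 34 uses.

```python
from typing import Optional, Sequence


def prorated_scale_score(
    items: Sequence[Optional[float]], min_fraction: float = 0.5
) -> Optional[float]:
    """Total a multi-item scale, filling missing items (None) with the mean
    of the items the patient answered.

    The 50% completion threshold is an assumed convention, not taken from the
    source; below it the whole scale is reported as missing (None).
    """
    if not items:
        raise ValueError("scale has no items")
    answered = [v for v in items if v is not None]
    if not answered or len(answered) < min_fraction * len(items):
        return None
    person_mean = sum(answered) / len(answered)
    return sum(person_mean if v is None else v for v in items)


if __name__ == "__main__":
    print(prorated_scale_score([2, None, 4, 3]))  # 12.0: the gap is filled with 3.0
    print(prorated_scale_score([None]))           # None: a one-item scale cannot be filled
```

The second call shows the contrast drawn in the text: a single-item measure has no other items to borrow from, so a missed response is simply lost.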
Data collection pilot tests can also be conducted, with special emphasis on the interface of the data collection schedule, normal institution procedures, and the availability of support staff. Increased use of flow sheets, study calendars, and patient tracking cards, and identification or flagging of patient medical records, can help promote compliance in both patient completion of the questionnaires and clinic staff cooperation in the data collection effort. Because quality of life measures are often new to clinical trials investigators, additional training of data collectors is necessary, particularly through detailed, standardized instructions for data collection procedures. This could require more site visits to evaluate an institution's procedures for collecting quality of life information. Other procedures include (a) involvement of managers responsible for quality of life data collection in regular meetings where they can learn more about the importance of measuring quality of life and (b) distribution to site staff of a newsletter emphasizing the different nature of the quality of life instruments but also including topics not related to research (DePauw S: personal communication, 1988).

Discussion
Our suggestions for incorporation of quality of life measures are untested in Southwest Oncology Group trials, but they represent a feasible approach in view of existing procedures, relationships with member institutions, and available resources. In this approach, quality of life assessment will be addressed through companion protocols, because this is consistent with the policy for other cancer control and ancillary studies. The decision to add quality of life measures to therapeutic protocols must be based on more experience with the assessment packages across several protocols. Use of at least one common quality of life instrument in all protocols selected for quality of life measurement would allow comparison of treatments across protocols; at the very least, this should be done within a disease group (e.g., protocols for genitourinary cancer). A cooperative group's quality of life assessment policies need to be re-evaluated annually because of the changing state of the art in quality of life assessment. As experience with measurement of quality of life variables increases, procedures for data collection can be improved. In addition, new instruments will be developed that are more appropriate for multi-institution clinical trials research. These instruments can be incorporated into the battery and used across all selected protocols. Selection of instruments becomes an exercise in trade-offs. Long questionnaires are a burden to both patients and staff. However, since psychometric properties can be compromised in brief questionnaires, investigators must carefully review this information. We have tried to include instruments that reduce the burden but do not compromise psychometric properties. Timing of administration also represents a trade-off. Repeated administration allows more opportunities for observation of the effect of the treatment on the patient; however, with increasing impairment of patient function over time, missing data can become a serious problem. Each cooperative group will have to develop procedures for ensuring accurate and complete data collection, i.e., compliance on the part of both patients and data collection staff. The designation of one individual at each institution to be responsible for quality of life data collection and the designation of a quality of life assessment coordinator for the protocol should alleviate many of the problems associated with the introduction of a new type of data collection. Such procedures must be integrated with the "ways" of the cooperating institutions. However, our experience indicates that several of our suggestions have general applicability.
Communication among researchers associated with different cooperative groups can also help in selection of the most appropriate quality of life instruments for different types of protocols. A long-term goal of a cooperative group's policy on quality of life assessment could be to extend both informal and formal assessment of quality of life to more types of clinical trials protocols.
The inclusion of formal quality of life end points in clinical trials research has a number of benefits that outweigh the resulting increase in data collection complexity. Contrary to expectations in one study (70), patients with metastatic breast cancer who received intermittent chemotherapy did not demonstrate better quality of life than those who received a continuous regimen, and the continuous regimen produced better results on the traditional end points of response to treatment and time to disease progression. The inclusion of quality of life measures allows physicians to be more confident that a particular therapy is better not only with respect to cure but also with respect to patient tolerance.
The study by Sugarbaker et al. (77) showed the usefulness of quality of life data in adjusting treatment approaches and individualizing patient regimens. The degree to which such adjustments can be made during a clinical trial is problematic, but adjustments are currently made to some extent through the toxicity grading scales. Changing treatment protocols after completion of a trial, as occurred after the study by Sugarbaker and co-workers, is certainly feasible.
Finally, quality of life measurements can inform physicians about the need for supportive care for patients, e.g., handling of the stoma after surgery or problems with body image (75). In one study, psychological morbidity in breast cancer patients undergoing mastectomy was detected by a standardized interview; in the absence of such interviews, physicians generally did not report psychological disability (9). Reporting of the interview data led to the addition of a nurse who specialized in providing support (informational and emotional) for patients and in monitoring postsurgery adjustment. Psychiatric morbidity was significantly reduced after 12 months for women receiving this counseling (9).
Even when adequate quality of life instruments do not exist and/or resources do not permit formal patient-based assessments, quality of life issues can still be addressed. For example, a quality of life impact statement (Barofsky I: personal communication, 1988) could be required in the protocol proposal. The investigator would be asked to address expected impacts of the proposed treatment on the patient's physical and emotional functioning and the types and degrees of symptoms the patient could experience. The emphasis would be on physician use of routine data on toxic effects in decisions to alter treatment. The main objective of the impact statement approach would be to improve a protocol in relation to patient quality of life before the protocol is implemented. However, the procedure could also alert researchers to the need for ongoing patient reporting of treatment-related experiences affecting quality of life.

Conclusions
One reason for the increased interest in quality of life assessment in clinical trials is that investigators want to make improvements in such areas as patient support services during treatment. The second reason is that investigators see value in supplementing tumor response and survival data with information on patient perception of treatment impact.
As quality of life measures have been used increasingly in clinical trials, more data have been published documenting both negative and positive impacts associated with various treatment approaches. These data suggest the benefits of a more systematic approach to measurement of quality of life in clinical trials research. We recommend the use of quality of life end points in selected types of cancer treatment trials and assessment of the following components of quality of life: physical functioning, emotional functioning, symptoms associated with the disease and treatment, and the patient's perception of global quality of life. We have suggested instruments for measuring each component, with the provision that the instruments selected should be continually reviewed with regard to their performance in clinical trials and the development of more appropriate measures.

(4) Select quality of life instruments with established psychometric properties.
(5) Use categorical rather than visual analogue scales.
(6) Always use separate measures of physical functioning, emotional functioning, symptoms (general and protocol specific), and global quality of life.
(7) Include measures of social functioning and other protocol-specific variables if resources permit.
(8) Administer quality of life instruments a minimum of three times, with timing dependent on disease and treatment parameters, typically (a) at baseline, (b) during therapy at a time when maximal assessment of side effects is possible, and (c) at the end of treatment.

scales for symptoms of lung cancer and toxic effects of treatment (29,30). The scale for symptoms has 10 items rated on a four-point scale; higher scores reflect more difficulty with symptoms. The scale for toxic effects has six items. The new EORTC core instrument has 14 items for symptoms; it covers pain, nausea and vomiting, fatigue and malaise, and general symptoms.