The Role of Response-Shift in Studies Assessing Quality of Life Outcomes Among Cancer Patients: A Systematic Review

Objective: Response shift has been cited as an important measurement consideration when assessing patient-reported quality of life (QoL) outcomes over time among patients with severe chronic conditions. Here we report the results of a systematic review of response shift in studies assessing QoL among cancer patients. Methods: A systematic review was conducted using MEDLINE, EMBASE, and PsycINFO, along with a manual search of the cited references of the articles selected. A quality review was performed using STROBE criteria and reported according to PRISMA guidelines. Results: A systematic review of 1,487 records published between 1887 and December 2018 revealed 104 potentially eligible studies, and 35 studies met inclusion criteria for content and quality. The most common cancer patient populations investigated in these studies were breast (18 studies), lung (14 studies), prostate (eight studies), and colorectal (eight studies). Response shift was identified in 34 of the 35 studies reviewed. Effect sizes were reported in 17 studies assessing QoL outcomes among cancer patients: 12 had negligible to small effect sizes, four reported medium effect sizes, which were related to physical functioning, global QoL, pain, and social (role) functioning, and one reported a large effect size (fatigue). The most prevalent method for assessing response shift was the then-test, which is prone to recall bias, followed by the pre-test and post-test method. Given the heterogeneity among the characteristics of the samples and designs reviewed, as well as the overall small to negligible effect sizes reported, conclusions that internal cognitive shifts in perceived QoL account for the changes observed in cancer patients' QoL outcomes should be interpreted with caution. Conclusion: Further work is needed in this area of research. 
Future studies should control for patient characteristics and for the time elapsed between diagnosis and baseline assessment, and should evaluate their contribution to the presence of response shift. Assessment schedules should include both short and longer intervals to evaluate whether the presence of response shift holds over time. Possible avenues of inquiry for future investigation are discussed.


INTRODUCTION
Measuring change in patient-reported quality of life (QoL) outcomes is a pressing need in clinical practice. Response shift refers to measurement of patient-reported outcomes that reflect better outcomes over time not because the patient is doing better but because the patient has adapted psychologically to their new life circumstances (e.g., urinary incontinence) in order to better cope with them (1,2). This particular "shift" in an individual's response is considered to involve a reprioritization of values (e.g., physical function is valued less than cognitive function, whereas prior to diagnosis the priority may have been reversed), a recalibration (e.g., "I will survive this, even if the quality of my life will significantly change"), or a reconceptualization (e.g., significantly changing standards for interpreting meaning; what constitutes "good" now differs from a recently held belief) (2). In 1999, Albrecht and Devlieger used the term "disability paradox" to describe the notion that people with disabilities report experiencing a much better QoL than expected, and this concept has become a key component of response shift (3). Some cancer patients experience large amounts of pain or side effects due to their condition or to treatments such as surgery, chemotherapy, or radiation therapy (2,4). The distress associated with the diagnosis often forces patients to engage in cognitive reframing of their circumstances to ease the psychological pain they are experiencing (4,5). This process includes a reprioritization of previously held values, internal standards, and expectations in order to help the individual cope with high levels of pain (2,6). Taking these changes into account when assessing QoL among cancer patients along the diagnosis-to-survival continuum, however, is both important and challenging. Measurement of patient-reported outcomes assumes relatively good within-individual stability and consistency in ratings (6). 
This assumption translates to feedback for health professionals with respect to how treatments and interventions affect patients. If large error variations exist between patients' responses due not to external circumstances but rather to changes in internal standards and reconceptualization, then these patient-reported outcomes lose the predictive value attributed to them. A meta-analysis reported in 2006 showed statistically significant response shift in most of the studies identified (7). However, the effect sizes associated with response shift were small, with the largest reported for fatigue and global health-related quality of life (7). Patient-reported outcomes are particularly important in cancer research aimed at identifying treatment side effects. These outcomes help to inform patients and clinicians in the treatment decision-making process at the start of the cancer journey, as well as in the development of standards of patient care and of interventions aimed at improving patients' QoL. Thus, the cancer population is a particularly clinically relevant subgroup to examine with regard to the presence or absence of response shift.
Response shift has been commonly measured in three ways. Using the pre-test/post-test method, patients complete a baseline assessment (pre-test) and then complete an identical assessment after a period of time (post-test) (7-9). In oncology research, the post-test is usually administered after the cancer treatment (2). The pre-test/post-test design is easy to administer to patients but requires large samples for analysis and is difficult to interpret. Changes from pre- to post-test could be representative of a response shift, QoL changes due to treatment, or both. The then-test method is the second most commonly used method for assessing response shift and consists of adding one extra step to the pre/post-test, administered at the same time as the post-test. During this additional (then) test, the patient is asked to rate their QoL outcomes retrospectively, thinking of the pre-test time, but using their current value judgments and perceptions (9). Response shift is calculated as the difference between the then- and pre-tests, while true changes in QoL are calculated as the difference between the post- and then-tests (1,9). The then-test is easy to analyse and interpret; however, it is susceptible to recall bias and is more burdensome due to the addition of one extra (then) test (9). Finally, in the anchor/ideal scale design, patients are asked to state their ideal response to a question or to provide an upper and lower limit (i.e., anchors) of a specific domain at both the pre-test and post-test (9). Changes between the pre-test and post-test of either the ideal or anchors indicate a recalibration response shift (1,7,9). This design type can be easily analyzed and interpreted, but it is susceptible to ceiling effects and does not properly measure reconceptualization and reprioritization (3,9,10).
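The then-test arithmetic described above can be illustrated with a minimal sketch. The function name, the sign convention (later score minus earlier score), and the example scores are hypothetical and are not taken from any of the reviewed studies; the sketch only shows that the observed pre-to-post change splits into a response shift component (then minus pre) and an adjusted change component (post minus then).

```python
from statistics import mean


def then_test_decomposition(pre, then, post):
    """Decompose observed QoL change using the then-test design.

    pre, then, post: per-patient scores from the pre-test, the
    retrospective then-test, and the post-test (same patient order).
    """
    if not (len(pre) == len(then) == len(post)) or not pre:
        raise ValueError("pre, then, and post must be non-empty and equal in length")
    # Response shift: then-test minus pre-test.
    response_shift = [t - p for t, p in zip(then, pre)]
    # Adjusted ("true") change: post-test minus then-test.
    adjusted_change = [q - t for q, t in zip(post, then)]
    # Observed change: post-test minus pre-test.
    observed_change = [q - p for q, p in zip(post, pre)]
    return {
        "response_shift": mean(response_shift),
        "adjusted_change": mean(adjusted_change),
        "observed_change": mean(observed_change),
    }


# Hypothetical scores on a 0-100 QoL scale for three patients.
pre = [70, 65, 80]
then = [60, 55, 75]
post = [68, 66, 78]
result = then_test_decomposition(pre, then, post)
```

In this made-up example the observed change is close to zero even though the adjusted change is positive, which is the pattern the then-test is designed to expose.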
One of the major goals of assessing quality of life changes over time is to discern to what extent the changes reported represent the effect of the clinical intervention/treatment and to what extent they reflect confounds and measurement error (factors that are not accounted for but that exert influence on the outcomes, including response shift). It is usually assumed that patients' internal states are more or less stable over time (regression to mean), and thus that patient-reported outcomes are meaningful predictors of patient outcomes (2). If, for any number of reasons, the person's perception of the construct under evaluation changes over time, then comparison of two or more longitudinal assessments during the cancer journey (e.g., diagnosis, during treatment, post-treatment) may be distorted and lead to the development of unnecessary interventions. If changes in internal states affect patient-reported outcomes by means of response shift, then these changes should be accounted for in evaluations of patient-reported outcomes to fine-tune the measurement process and arrive at accurate assessments that lead to reliable patient interventions (1,2,9,11-13). If response shift is a significant predictor of QoL outcomes, its effect size will have important implications for assessing the effect of cancer treatments on patient-reported QoL, as results may reflect a response shift, a treatment effect, or a complex combination of both (7,10,13). Clarifying these contributions to QoL measurement may help explain paradoxical findings in the literature and provide further insight into the discrepancies between clinical measures of health and patients' own evaluations of their health. Additionally, knowledge of response shift and its measurement would lead to design adjustments for the sensitive assessment of longitudinal QoL data, ultimately leading to improved interventions that positively impact patients' lives (2,9,14).
To our knowledge, only one review and one meta-analysis on the evaluation of response shift have previously been conducted, and neither evaluated cancer populations as a whole (7,12). The 2006 meta-analysis examined the presence of response shift in studies assessing all forms of chronic conditions (7), while the 2011 review examined the presence of response shift exclusively in prostate cancer studies (12). This is the first systematic review of response shift that focuses exclusively on cancer studies across cancer types. The aims of this study are to review the evidence of response shift in studies assessing the QoL of cancer patients by examining the methods used to assess response shift, the QoL domains assessed and found to be prone to response shift, the length of time between assessments, and the types of patient characteristics and external factors that may have contributed to the emergence of a response shift in these studies.

METHODS
A systematic search of English-language literature evaluating the presence of response shift in cancer patient populations where quality of life outcomes were assessed was performed using MEDLINE (1946-April 2017), EMBASE (1974-April 2017), and PsycINFO (1887-April 2017), and a total of 1,365 possible articles were obtained. A manual search of the cited references of the selected articles did not result in additional articles. A second search of articles from April 2017-December 2018 was performed in December 2018 using the exact same databases and search terms, resulting in an additional 122 possible articles for a total of 1,487 records identified through database searching. Appendix 1 lists the search strategy performed on MEDLINE as an example of the literature search performed in each database. The search words used to obtain these articles included neoplasms (exploded), cancer*, carcinoma*, malignan*, tumor*, neoplas*, adeno*, matasta* (terms combined using an OR statement), followed by response adj (shift* or change*), recalibrat*, reprioritiz* or reprioritis*, reconceptualiz* or reconceptualis* (terms combined using an OR statement). Articles of interest included quantitative studies (observational studies, cohort studies, case-control studies, cross-sectional studies) that directly assessed patients of any gender, age, or cancer type on response shift and QoL. Articles without primary data (commentaries, letters, reviews, editorials, and methods papers) and dissertations were excluded. Articles that assessed the impact of an intervention on QoL were also excluded.
Information was extracted primarily from the "Results," "Discussion," and "Methods" sections with some input from the "Background" section. Extracted information included study characteristics, type of method used to assess response shift, participant characteristics and whether they were assessed in the evaluation of the response shift effects, type and localization of cancer, severity of cancer, time between diagnosis and treatment, time elapsed between assessments, methods and results pertaining to response shift, types of QoL outcomes and an indication as to whether a response shift effect was observed, and the authors' interpretation of results and conclusions. Internal validity was evaluated by examining the study design (blinding, statistical tests, reliability, participant recruitment, study limitations, validity, and biases), and external validity was based on whether or not the sample was representative of the entire population. Effect sizes were evaluated using Cohen's criteria for effect size magnitude, based on differences in means as reported in the studies reviewed. Effect sizes of d < 0.5 were considered small, those between 0.5 and 0.8 moderate, and those > 0.8 large.
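The effect size thresholds stated above can be expressed as a short sketch. The function names are hypothetical, and the pooled-standard-deviation form of Cohen's d shown here is one common variant chosen for illustration; the reviewed studies may have computed d differently (e.g., for paired data). The classification treats values of exactly 0.5 and 0.8 as moderate, which is an assumption about how the boundary cases were handled.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two sets of scores using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


def classify_effect_size(d):
    """Label an effect size using the thresholds stated in this review."""
    magnitude = abs(d)
    if magnitude < 0.5:
        return "small"
    if magnitude <= 0.8:
        return "moderate"
    return "large"


# Hypothetical pre-test and then-test scores.
d = cohens_d([1, 2, 3], [2, 3, 4])
label = classify_effect_size(d)
```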

RESULTS
After removal of duplicates, 999 articles remained. The electronic records were collected in a Research Information System (RIS) data file. Titles and abstracts of the electronic search results were screened by two authors (JB, LM) to identify the relevant studies. Articles that described observations of cancer patients, and discussed response shift, recalibration, reprioritization, and reconceptualization were then further assessed. One hundred and four articles were selected for full-text review. Further screening of the potentially eligible articles through full-text examination resulted in the exclusion of 69 articles and the selection of only 35 of the remaining articles for final inclusion.
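The record counts reported in the search and screening steps can be cross-checked with simple arithmetic. The sketch below uses a hypothetical helper; the duplicate and title/abstract exclusion counts are derived here by subtraction and are not stated in the text.

```python
def screening_flow(first_search, second_search, after_dedup, full_text, excluded_full_text):
    """Derive the counts at each screening stage from the reported figures."""
    identified = first_search + second_search
    return {
        "identified": identified,
        # Derived by subtraction, not reported directly.
        "duplicates_removed": identified - after_dedup,
        "excluded_at_title_abstract": after_dedup - full_text,
        "included": full_text - excluded_full_text,
    }


# Figures as reported: 1,365 + 122 records, 999 after de-duplication,
# 104 full-text reviews, 69 full-text exclusions.
flow = screening_flow(1365, 122, 999, 104, 69)
```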
Characteristics for data extraction included study characteristics, sample characteristics, demographics, response shift predictors, QoL outcomes, age at diagnosis, cancer type and localization of cancer, treatment type, and time elapsed between treatment and diagnosis. There were no limitations to the population size, age, or gender.
Two authors (JB and LM) independently evaluated the relevance and quality of the articles in the search and extracted data using data abstraction forms. The STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria for quality assessment and the Newcastle-Ottawa Quality Assessment Scale were applied to evaluate each article on study quality and on external and internal validity (15). Agreement between the two raters was very high (Cohen's kappa = 0.86). Results are reported according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (16). A PRISMA flowchart, shown in Figure 1, was created to show the number of articles at each stage of data acquisition and the number of articles that were excluded at each stage.
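For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa is computed for two raters. The function name and the example ratings are hypothetical and do not reproduce the review's data or its kappa of 0.86.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and equal in length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions for four articles.
ratings_a = ["include", "include", "exclude", "exclude"]
ratings_b = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(ratings_a, ratings_b)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple proportion of matching decisions.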

DISCUSSION
This study reviewed the presence and magnitude of response shift in studies assessing cancer patients' QoL over time. Evaluating the presence and magnitude of response shift in QoL measurement among cancer patients is important because it provides a measure of the extent to which the "true" effects of the cancer diagnosis and treatment can be masked by changes in the internal standard of measurement (otherwise assumed to be negligible). Undertaking a review for cancer, separate from that of other chronic conditions, is important because QoL among cancer patients is known to be poorer than that of patients with non-cancer chronic conditions, some of which may predispose individuals to cancer (49). Error in QoL measurement attributable to response shift could lead to failure to detect treatment toxicity and side effects. This matters because when toxicity and side effects are identified and acted upon through post-treatment interventions, their harmful effects on patients' quality of life can be reduced in both the short and long term (2,7). This review shows that response shift was present in 34 of the 35 studies assessed, although overall the magnitude of the response shift found was negligible to small at best. The studies reviewed here showed large heterogeneity in the types of cancer assessed, patient characteristics, and study designs. Among the 35 studies identified, patients diagnosed with breast (18 studies), lung (14 studies), prostate (eight studies), and colorectal cancer (eight studies) were the most commonly assessed populations.
All studies, with the exception of one (with a very small sample size) that comprised children, included older participants (mostly 50-65 years old). Age, sex, time elapsed since diagnosis, and external factors were assessed in few studies (4, 10, 4, and 7, respectively) and, on average, about half the time or less they were not found to contribute statistically significantly to the observed response shift effect (2, 6, 1, and 5, respectively). The most common method used to assess and find a response shift effect was the "then-test" (19/21 studies found a small albeit significant effect). Most studies that observed a statistically significant response shift effect had a time between assessments that varied between 3 and 6 months. However, there was significant heterogeneity in the baseline selected among the studies (Tables 1, 3): some selected a post-diagnosis, pre-treatment baseline, others selected a post-treatment baseline, and yet others arbitrarily chose a period of time that elapsed after treatment without controlling in their analyses for the period of time that elapsed between diagnosis and baseline. All except one study (31) reported the presence of a statistically significant response shift in one or more QoL dimensions. In this one study, patients' QoL remained stable after surgery, and this stability could not be attributed to response shift because none was observed (31). About half (16 studies) of the 34 studies that found a statistically significant response shift and reported effect sizes for their results revealed negligible effect sizes, indicating that response shift, while detectable, has overall a negligible influence on quality of life outcomes, whether measured through validated and reliable questionnaires or through self-reported answers to questions assessing QoL outcomes (14, 18, 19, 21, 22, 24, 25, 28, 35-37, 39, 40, 43, 44, 46). 
Among the different QoL subscales, a moderate effect size response shift was evident in four of these 16 studies, which otherwise detected negligible to small effect sizes, in the assessment of pain (24), physical limitations (46), global QoL (21,22,24), and social role functioning (46). Only one study reported a large effect size response shift, for the assessment of fatigue (17). These results suggest that response shift may be a phenomenon occurring particularly in the measurement of physical aspects of functioning and possibly global QoL, although before any definitive conclusions are drawn, these results need to be replicated with larger sample sizes among homogeneous samples of cancer patients.
There are many possible reasons why most of the effect sizes reported in the studies we reviewed are small to negligible. One possibility is the heterogeneity in the length of time between QoL assessments. Considering that response shift may take a short time to manifest or may become insignificant over time, it may be important to consider these variations when evaluating response shift. Indeed, of the six studies with moderate effect sizes and the one study with a large effect size, all [except Broberger et al. (22)] assessed QoL at diagnosis or hospital admission and had a second assessment right after treatment, suggesting that internal standards "shifted" within a narrow window of time from diagnosis to immediately post-treatment. Since this "shift," when identified in other studies, was present but negligible in size, it is possible that the "shift" is strongest when assessed within a short time after diagnosis, preferably right after treatment. Future studies should assess whether response shift may be an artifact of the baseline chosen along the cancer journey timeline and of the length of time elapsed between diagnosis and post-treatment assessment, with longer periods of time leading to a weaker effect. Baseline or pre-test assessments were given at different times across the 35 studies reviewed here. For example, baseline or pre-test assessments were administered at the time of diagnosis, 2 weeks after diagnosis, at hospitalization, pre-surgery, right at the start of treatment, post-surgery, at discharge, or at an unspecified time. The timing of the post-assessment also varied considerably, as each study selected different times of post-assessment. The largest group of studies (16 studies) administered the QoL assessment 3 months after baseline, whereas nine studies measured QoL 6 months after baseline. 
Most studies that observed a statistically significant response shift effect administered the assessment 3 months after baseline, with fewer observing an effect that was present at 6 months. Only four of the 35 studies assessed the time elapsed between diagnosis and first/baseline testing and controlled for it in their analyses (28,29,33,47). Future studies should consider controlling for this important factor in their analyses, as this information has considerable relevance for quality of life outcomes due to the expected consequences of specific treatments and for efforts to identify the expected rehabilitative needs of cancer survivors (50).
Furthermore, studies varied not only in the length of time elapsed between longitudinal QoL assessments (e.g., 1 week, 3 months, 6 months) but also in the number of time points assessed (19 studies measured QoL at multiple times, whereas 16 studies measured QoL only once after the baseline assessment). Future studies should consider the inclusion of multiple time assessments to allow for the examination of the presence and strength of response shift over time.
A second reason why effect sizes, among the few studies reporting them, may have been negligible may be related to the heterogeneity of the samples, the small number of participants in the samples examined, and the different methodologies adopted for testing for response shift. Few of the studies reviewed controlled for patient characteristics in their examination of response shift in heterogeneous samples (e.g., patients with various forms of cancer, of different ages, and at different stages of cancer). None of the studies reviewed here evaluated the possible contribution of age at cancer diagnosis or race to response shift. Among the patient characteristics that were evaluated in the papers we identified and reviewed, treatment type appeared to be the most influential contributor to response shift among cancer patients (found significant in eight of the 10 studies that evaluated it). None of the reviewed studies examined the contribution of perceived social support to response shift. Given that the relationship between social support and QoL is well-established (51-54), where social support is associated with improved QoL and is shown to influence patients' level of perceived distress related to their cancer diagnosis, which in turn may alter their evaluation of their outcomes, future studies should consider controlling for its contribution to the presence or absence of response shift in patient-reported outcomes (12,51-54). 
Other important factors such as cancer type (found to be a statistically significant contributor in one of the three studies that reported it) and stage (a significant contributor in one of the five studies that examined it), the presence of comorbidities (found significant in two of the three studies that assessed it), occupation (evaluated in two studies), and marital status (found to be statistically significant in two of the three studies that evaluated it), which have been documented to be associated with QoL outcomes among various cancer types, should be considered as possible confounds and included in future studies evaluating response shift, given their considerable relevance to QoL outcomes among cancer patients and to efforts to identify modifiable and non-modifiable life factors in better survivorship (50). Lastly, we note the lack of standardization in the measurement and reporting of response shift in the studies reviewed here. Study designs in the 35 studies we reviewed predominantly included the "then" test (21 of the 35 studies) and the "pre-test and post-test" methodology (12 studies), with four other less commonly adopted methodologies. Currently, there is still much debate surrounding the appropriate methodology for measuring response shift, which statistical tests to use to analyze the data, which instruments accurately capture QoL, and what information should be recorded by researchers (4,8,31,55-57). A standard method for collecting and reporting response shift data will help the scientific community to determine fairly whether the phenomenon of response shift exists or whether it is simply a methodological artifact.
While the presence of response shift in internal QoL standards among cancer patients may reduce the actual effect size of the QoL changes observed in longitudinal studies from one time point to another, the present review found small to negligible evidence to support its influence. An ideal methodology for assessing response shift in QoL measurement would include a time point assessment before diagnosis and compare it to post-diagnosis and post-treatment responses. Interestingly, one such assessment by Broberger et al., which was performed 2-4 months before lung cancer diagnosis, found no decisive support for the hypothesis that a change in internal standards occurred in this group of patients (22). The explanation for the lack of response shift may be that patients had already adapted to their symptoms, at least to some extent, prior to their diagnosis, or that the "shift" some studies observe may be part of the normal life fluctuations some people experience rather than a consistent and stable phenomenon that is event bound (e.g., tied to a cancer diagnosis). Therefore, "response shift" may be capturing people's natural psychological adaptations to life circumstances, which most generally eventually succumb to what we know as "regression to the mean" (58). This concept may be described as the process whereby internal states fluctuate, going up or down depending on the life events an individual has to face from one point in time to another, but eventually regress toward whatever constitutes the "average" response based on the internal states that generally define this individual (58).
Detecting unbiased or "real" cancer treatment effects is crucial not only to help fine-tune interventions, including the length of time over which they are administered and their intensity and dosage, and to inform patient education and empowerment programs in order to reduce negative side effects and improve patients' quality of life, but also to identify extreme psychological adaptations (whether positive, i.e., resilience, or negative, i.e., extreme vulnerability) to treatments that often challenge people's sense of identity. Since attrition in longitudinal studies may lead to the loss of severely ill patients from the original sample, attrition, as opposed to response shift, may explain why QoL outcomes are relatively high in cancer patient groups, reflecting the better health scores of the remaining group members. Future studies using large sample sizes and better designed methodologies may contribute to a deeper understanding of whether response shift is one of several factors influencing QoL assessments in light of changing life circumstances.
Given that the patient samples in most studies reviewed here were heterogeneous, with wide varieties of treatments, lengths of time between diagnosis and QoL assessments, treatment schedules, and cancer-specific and demographic characteristics that were often not accounted for in the analyses, response shift studies should be considered hypothesis generating until more studies are conducted that account for these limitations. The knowledge that a decrease in QoL outcomes post-treatment may be underestimated by a small amount should also be seen in light of the ongoing discussion on the issue of clinically relevant changes (59).
It must be noted that this review is not without its limitations. First of all, the studies included in this systematic review were identified via electronic searches of three databases (MEDLINE, EMBASE, and PsycINFO) plus a manual review of the reference section of selected papers. It is possible that relevant articles pertaining to this review could have been missed using the aforementioned methods. Moreover, more than half of the studies in this review involved the then-test method, which is known to be susceptible to recall bias. For example, Litwin and McGuigan examined recall bias in men treated for prostate cancer and found inaccuracies in the recall of pre-treatment outcomes (56). Korfage et al. also argue that the use of general health QoL measures (e.g., SF-36, EQ-5D, EORTC-C30) may not be ideal for accurately measuring patient-reported health because the generic measures may not include questions on important disease-specific side effects (60). For example, sexual, urinary, and/or bowel dysfunctions experienced by prostate cancer survivors post-treatment are not well captured by generic general health QoL measures, although specific measures of these conditions (e.g., IPSS, the International Prostate Symptom Score) are successful at capturing poor QoL in these domains in this population (55-57,60). Donohoe also hypothesizes that high levels of social support may lead to better adaptation to the cancer diagnosis and its side effects, which would present itself as a response shift. These external factors are often not taken into consideration because many of the general health QoL measures do not have questions assessing them (12). Therefore, response shift studies that use generic measures may reflect the measures used rather than accurate changes in perceived outcomes. Thus, given the large number of then-test studies in this review, the results should be interpreted with caution (8).
In addition, aggregating the studies we reviewed to compute a pooled effect size was not possible given the heterogeneity of study designs and measures included in this review. Lastly, this review on response shift focused solely on cancer patient samples, which limits its generalizability to other chronically ill patient populations. Thus, future studies are needed to replicate these effects with larger sample sizes while controlling for possible confounding sample characteristics before these results can be considered generalizable. At the time of this review, the clinical significance of response shift for QoL outcome measurements was still being elucidated, with inconsistent findings stemming from individual studies and no definite conclusion reported by either a previous meta-analysis (7) or the current review.