Psychometric properties of the German Version of the Psychological Consequences of Screening Questionnaire (PCQ)

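This study aimed to translate the negative and positive items of the Psychological Consequences Questionnaire (PCQ) into German, to adapt this version to the context of screening for cirrhosis of the liver, and to test its psychometric properties. The PCQ items were translated into German using forward-backward translation. We tested acceptability, internal consistency, scale structure and convergent validity using a sample of 443 patients who were screened for cirrhosis of the liver. Cronbach's alpha was 0.9 and higher for the negative, the positive and the total PCQ scales. Confirmatory factor analysis suggested that the negative and the positive PCQ should be computed separately. Convergent validity was assessed against the short form of the State-Trait Anxiety Inventory (STAI-Y-6).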


Background
Medical screening is increasingly used across medical disciplines to detect the onset of diseases and prevent severe progression. Screenings are therefore commonly applied in populations without acute symptoms. In Germany, a screening procedure called "Gesundheits-Check-Up" (health check-up), which aims to detect risk factors for certain diseases, is available for statutory health insurance members aged 35 and older and includes a blood and urine test [1]. A newly introduced screening procedure for the early detection of cirrhosis or fibrosis of the liver has been tested since January 2017 in two German federal states (Rhineland-Palatinate and Saarland) in the context of the ongoing SEAL programme (SEAL: Structured early detection of asymptomatic cirrhosis of the liver in Rhineland-Palatinate and Saarland) [2]. This programme integrates the screening into the "Gesundheits-Check-Up" routine, which can be repeated every three years.
Besides the benefits of early detection of diseases, such as early treatment and potential prevention, negative effects should also be taken into consideration when evaluating the role of screenings [1,3]. In a review, Harris [3] discusses potential harms that cross multiple conditions, e.g. nonadherence, overdiagnosis and targeted screening. Landstra et al. [4] contribute to this discussion by outlining that unrealistic optimism caused by negative screening results can lead to lower anxiety and thus to a reduction in health-promoting behaviours. A dilemma that occurs when introducing broad screening programmes is that they tend to have low predictive power, resulting in a high rate of false-positive results. Studies have shown that false-positive screening results have substantial negative psychosocial consequences [5][7][8][9]. However, false-positive results are not the only source of such an impact. Intensive surveillance during the screening process itself can produce unfavourable side effects on psychological well-being and health-related quality of life due to the confrontation with a potential threat [6]. In order to identify potential burden associated with the screening, we conducted a cross-sectional survey of screened participants alongside the SEAL study.
Brodersen et al. [5] emphasize the need for patient-reported outcome measures (PRO) with high content validity in order to systematically investigate psychosocial consequences of screening. To the best of our knowledge, no instrument exists so far to measure the psychological impact of screening for liver diseases, particularly not in the German language. Therefore, we chose to adapt a questionnaire that was initially developed in the context of breast cancer screening. The Psychological Consequences Questionnaire (PCQ) was created in 1992 in Australia and aims at measuring the positive and negative effects of breast cancer screening on emotional, physical and social functioning [7]. To date, the PCQ has been broadly used to measure short-term screening impact in the context of mammography screening [6][7][8][9][10][11] and has also been adapted to other types of cancer such as colorectal [12], anal [13] and skin cancer [14]. Cross-cultural adaptations produced translated versions of the PCQ in Dutch [6] and Danish [15], indicating a high usability of the instrument. To the best of our knowledge, no German version of the PCQ exists so far. We also found no attempt in the literature to adapt the PCQ beyond cancer diseases. However, we assume that liver cirrhosis is comparable to cancer in terms of how life-threatening it is perceived to be by the population and thus consider the adaptation of the PCQ suitable in this context. Consistent with the translation into Danish [15], we adapted not only the negative but also the positive items of the PCQ. The original PCQ comprises three dimensions (emotional, physical and social), each measured by negative and positive items. Emotional consequences are measured by five negative items and five positive items. Four negative and three positive items measure the physical aspects, and three negative and two positive items capture the social dimension (see Table 1). All items were rated on a 4-point scale ranging from 0 to 3.
The exact scale labels differed slightly between negative and positive items and were 0 "not at all", 1 "rarely" or "a little bit", 2 "some of the time" or "quite a bit" and 3 "quite a lot of the time" or "a great deal". Item ratings can be summed to a score for each dimension after recoding the positive items so that they express the level of dysfunction. In line with suggestions in the literature, we also tested an overall PCQ score as a single concept of adverse psychological consequences [6,8].
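The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the scoring syntax used in the study; the names `recode_positive` and `subscale_score` are hypothetical, and the assignment of items to dimensions has to be taken from Table 1.

```python
from typing import Sequence

MAX_RATING = 3  # items are rated on a 4-point scale from 0 to 3


def recode_positive(rating: int) -> int:
    """Reverse a positive item so that higher values mean more dysfunction."""
    return MAX_RATING - rating


def subscale_score(negative: Sequence[int], positive: Sequence[int]) -> int:
    """Sum the negative items and the recoded positive items of one dimension."""
    return sum(negative) + sum(recode_positive(r) for r in positive)
```

Passing an empty sequence for one of the two arguments yields the purely negative or purely positive score of a dimension.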
The purpose of this study is threefold: (1) to translate the PCQ into German, (2) to adapt it to the context of liver diseases, and (3) to assess acceptability, internal consistency, scale structure and validity of the questionnaire in a screening population.

Methods
German version of the PCQ
In a first step, we slightly rephrased the items to fit the context of liver screening. We then applied forward and backward translation procedures for all PCQ items [16]. First, two German native speakers with fluent command of English and a social science background independently translated the PCQ into German and reached consensus on their versions. Second, an English native speaker with fluent command of German and a bilingual professional translator translated the German text back into English. After minor revisions, the team reached consensus on a German version, which was further discussed in a group of German native-speaking social scientists (n = 5) with expertise in survey design.

Study population and data collection
The ongoing SEAL study is a prospective study that aims at evaluating a newly introduced medical screening method for early diagnosis of cirrhosis or fibrosis of the liver. Since May 2018, patients who visit collaborating clinics or doctors' offices in Rhineland-Palatinate or Saarland for a check-up have been screened for liver cirrhosis and fibrosis. The screening process follows a multistep design in which each step depends on the test result of the previous one (step 1: blood sample test and risk score, step 2: enhanced laboratory diagnostics and ultrasound, step 3: liver biopsy and enhanced diagnostics in a specialized clinic). Inclusion criteria for study participation were a minimum age of 35 and no known previous cirrhosis of the liver [2].
In August 2019, we contacted all patients who had been included in the study up to that point. Due to ethical considerations, it was not possible to obtain information on the phase of the screening process that each patient had reached. This means that we could not gather information on whether patients with positive screening results had already moved to step 2 or 3 or whether they were still waiting for an appointment.
Therefore, we could not control for false-positive results and potential effects caused by them. Since screening began in May 2018, we assume that the majority of the patients had already received information on the test results. However, previous qualitative interviews revealed that patients generally receive no information at all in case of negative test results (results that showed no pathological findings). A gross sample of 5,935 patients received a postal mailing including a self-administered questionnaire, patient information and an informed consent form. With a return rate of 9% among those who were negatively screened and 12% among those who were positively screened, we received 499 and 21 completed questionnaires, respectively. In some cases, signed informed consent for our survey was missing. After a subsequent attempt to obtain the missing consent documents, we excluded those cases that were not eligible for evaluation (n = 34). We ended up with a net sample of 487 patient questionnaires (n = 19 positively screened and n = 468 negatively screened). For this analysis, we included only cases with at least an 80% response rate to the PCQ items. This means we excluded cases that had more than three missing items on the negative PCQ scales and more than two missing responses on the positive PCQ scales. To increase comparability, we followed the approach of Rijnsburger et al. and imputed median scores per item in eligible questionnaires [6]. Furthermore, we treated non-unique answers as missing values. For the psychometric analysis, we arrived at an analysis sample of 443 cases.
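The case selection and imputation steps can be illustrated with a short Python sketch. It assumes that missing and non-unique answers are coded as `None` and that "median scores per item" refers to the item median across eligible cases; both are assumptions for illustration, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence

Case = Sequence[Optional[float]]  # one questionnaire; None marks a missing answer


def is_eligible(case: Case, max_missing: int) -> bool:
    """Keep a case only if it has at most max_missing unanswered items."""
    return sum(answer is None for answer in case) <= max_missing


def impute_item_medians(cases: Sequence[Case]) -> List[List[float]]:
    """Replace each missing answer by the median of that item across cases.

    Raises statistics.StatisticsError if an item has no observed answer.
    """
    n_items = len(cases[0])
    medians = [
        median([case[j] for case in cases if case[j] is not None])
        for j in range(n_items)
    ]
    return [
        [medians[j] if answer is None else answer for j, answer in enumerate(case)]
        for case in cases
    ]
```

With the thresholds stated above, the negative items would be filtered with `max_missing=3` and the positive items with `max_missing=2` before imputation.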

Statistical Analyses
For cross-cultural comparability, the analyses follow the suggestions of Rijnsburger et al. [6]. However, we slightly extended our psychometric test procedures according to Schupp et al. [17] to reach a better understanding of the psychometric properties of the PCQ. The analyses were conducted using IBM SPSS Statistics 26.

Distribution Properties
To give insights into the distribution properties, we computed skewness and kurtosis. Non-response rates and double cross rates (non-unique answers) are reported for each item to assess the acceptability of the scale. Additionally, items with high skewness or kurtosis as well as items that show ceiling or floor effects were identified. We followed the classification for ceiling and floor effects suggested by McHorney & Tarlov, Varni et al. and Lin et al., which considers a percentage of 0 to 15% as a small, 16 to 30% as a moderate and more than 30% as a substantial floor or ceiling effect [18][19][20]. Furthermore, we report the 25th, 50th and 75th percentiles of the subscales and the total PCQ.
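The classification of floor and ceiling effects can be expressed as a small Python sketch. The function names are hypothetical, and treating every percentage above 15 as at least moderate (so that values between 15 and 16 are not left unclassified) is an assumption made here for illustration.

```python
from typing import Sequence


def category_share(responses: Sequence[int], category: int) -> float:
    """Percentage of valid responses that fall into the given category."""
    return 100.0 * sum(r == category for r in responses) / len(responses)


def classify_effect(percent: float) -> str:
    """Classify a floor or ceiling percentage as small, moderate or substantial."""
    if percent > 30:
        return "substantial"
    if percent > 15:  # assumption: the 15-16% gap counts as moderate
        return "moderate"
    return "small"
```

For an item rated from 0 to 3, the floor effect would be `classify_effect(category_share(responses, 0))` and the ceiling effect `classify_effect(category_share(responses, 3))`.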

Internal consistency
We computed Cronbach's alpha for the PCQ scales to evaluate internal consistency. We also assessed the item-total correlation as well as the mean inter-item correlation for each subscale. Furthermore, we tested whether Cronbach's alpha supports computing an overall PCQ score that includes both negative and (recoded) positive items.
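For reference, Cronbach's alpha for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following Python sketch shows this standard formula; the study itself used SPSS, and the function name and data layout (one list of ratings per item, all of equal length) are assumptions for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from item columns (one sequence of ratings per item)."""
    k = len(items)
    # Total score of each respondent across all items of the scale.
    totals = [sum(ratings) for ratings in zip(*items)]
    summed_item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - summed_item_variance / variance(totals))
```

The sketch requires complete data, at least two items, and a total score that varies between respondents.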

Scale structure
We conducted a confirmatory factor analysis (CFA) using IBM SPSS AMOS 24 software to confirm the item-factor relationship. Since the PCQ is assumed to cover three dimensions, a three-factor solution was tested based on Maximum Likelihood estimation. Model fit was evaluated using the Chi² goodness-of-fit test, the Comparative Fit Index (CFI) [21], the Tucker-Lewis Index (TLI) [22], the root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR). A non-significant Chi² test indicates that the assumed model fits the data well. We consider CFI and TLI values above 0.90 as an indication of a good fit. RMSEA values below 0.1 are considered as moderate and values below 0.05 as good fit [19].
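To make the fit indices more concrete, the sketch below computes RMSEA, CFI and TLI from the Chi² statistics of the tested model and of the baseline (independence) model, using their common textbook definitions. It is not the AMOS implementation; in particular, the use of N - 1 in the RMSEA denominator is an assumption (some programs use N), and all function and parameter names are hypothetical.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (0 when chi2 <= df)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative Fit Index relative to the baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis Index; it is not bounded and can exceed 1."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)
```

All three indices reward a model Chi² that is small relative to its degrees of freedom, which is why they are less dependent on sample size than the Chi² test itself.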

Convergent validity
To evaluate convergent validity, we correlated the PCQ subscales with the Spielberger State-Trait Anxiety Inventory (STAI short version Y-6) [23]. This scale is designed to measure current emotional status and is assumed to correlate highly (Pearson's r greater than 0.5) with all three subscales of the PCQ. We further hypothesize that the highest correlation will be found with the emotional subscale of the PCQ. If this hypothesis cannot be rejected, we conclude acceptable convergent validity of the German version of the PCQ. Additionally, we assume that the STAI-Y-6 scale correlates more strongly with the negative PCQ subscales than with the (recoded) positive PCQ subscales. This assumption is based on the rationale that the absence of a positive effect (e.g. not experiencing greater well-being) should not be considered equal to the occurrence of a negative effect. This circumstance led Brodersen & Thorsen to rephrase the positive items so that they allow changes in both directions [15].
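The two hypotheses can be checked mechanically once the correlations are available. The following Python sketch computes Pearson's r and evaluates both conditions; the function names and the dictionary keyed by subscale name (with an "emotional" entry) are hypothetical and serve only as an illustration.

```python
from math import sqrt
from typing import Mapping, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def check_hypotheses(
    r_by_subscale: Mapping[str, float], threshold: float = 0.5
) -> Tuple[bool, bool]:
    """Return (all subscales exceed the threshold, emotional subscale is highest)."""
    all_high = all(r > threshold for r in r_by_subscale.values())
    highest = max(r_by_subscale, key=lambda name: r_by_subscale[name])
    return all_high, highest == "emotional"
```

Each entry of `r_by_subscale` would be obtained by calling `pearson_r` on a PCQ subscale score and the STAI-Y-6 score of the same respondents.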

Results
Acceptability
We observed low item non-response for all PCQ items in our sample. Following the approach explained above, we had to exclude 15 cases for the negative PCQ items and 66 cases for the positive PCQ items since they showed less than 80% of overall scale response. This pattern illustrates that the general non-response rate is higher for the positive PCQ items (ranging from 3.3% to 8.8%) than for the negative items (ranging from 2.3% to 3.9%) (see Table 1). We could not identify any non-response pattern that is associated with a specific subscale. Non-unique answers were generally rare and were not observed more than three times per item (highest non-unique answer rate of 0.6% in item "feeling more able to do things which I normally did before").

Distribution properties
For all negative PCQ items we found mean values ranging between the categories "0 Not at all" and "1 Rarely", indicating that the impact of screening is generally low. This pattern also explains a substantial floor effect for all negative items, with more than 50% of responses lying in the lowest category. In contrast, we found small ceiling effects, with the highest percentage of values in the item "I felt worried about my future". As a consequence, the data are highly skewed. Z-standardised skewness exceeds the cut-off value of +1.95 for every negative PCQ item. Regarding kurtosis, items 2, 4, 6, 9, 10, 11 and 12 lie beyond the range of -1.95 to +1.95, indicating a departure from the normal distribution.
The positive PCQ items generally show higher mean values ranging from 0.88 to 2.01, indicating a tendency towards higher agreement with positive effects of the screening. This pattern is in line with fewer floor effects for the positive items in comparison to the negative items: substantial floor effects were found for items 13, 14, 15, 21 and 22, moderate effects for items 18, 19 and 20, and small floor effects for items 16 and 17. Substantial ceiling effects were only found for item 16; apart from that, the ceiling effects for the positive PCQ scales can be considered small to moderate in general. Regarding skewness and kurtosis, the positive items show values closer to zero than the negative items. However, most of them exceed the cut-off values of ±1.96, indicating a departure from the normal distribution. It should be noted that all positive PCQ items show negative signs for kurtosis.
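A z-standardised skewness is the skewness estimate divided by its standard error. The Python sketch below uses the common bias-adjusted skewness estimator and the corresponding standard error formula; whether these match the SPSS output used in the study exactly is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Bias-adjusted sample skewness (requires at least three values)."""
    n = len(values)
    m, s = mean(values), stdev(values)  # stdev uses the n - 1 denominator
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def skewness_standard_error(n: int) -> float:
    """Standard error of the skewness estimate; depends only on sample size."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def z_skewness(values: Sequence[float]) -> float:
    """Skewness divided by its standard error."""
    return sample_skewness(values) / skewness_standard_error(len(values))
```

Because the standard error shrinks with growing sample size, even moderate skewness yields a large z-value in a sample of several hundred cases.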

Internal consistency
For this section and the subsequent analyses, we applied missing value imputation as stated above. Furthermore, we recoded the positive items to compute the subscale indices so that higher scale values indicate less positive impact. In general, we found acceptable Cronbach's alpha values higher than 0.7, except for the overall social subscale (α = 0.58) (see Table 2). For the physical and the social subscale, higher Cronbach's alpha values are found if the positive and the negative subscale are treated separately. For the emotional dimension we found a small decrease for the overall scale compared to the separate scales, which, however, still shows high reliability (α = 0.84). This pattern is also reflected in the mean inter-item correlation. In general, the overall subscales show lower values than the separately treated subscales. Regarding the total PCQ scales (positive, negative, total), we found excellent Cronbach's alpha values of 0.9 and higher, indicating good consistency. For the negative subscales and the total negative PCQ scale, we found a strong deviation between the scale mean and the median, indicating that the majority of our sample is not strongly affected in general, but that a few extreme values bias the mean.

Scale structure
Table 3 shows the model fit statistics for the three confirmatory factor analyses we computed. The first row represents the results for the negative PCQ model (all three negative subscales included), the second row for the positive PCQ model (all three positive subscales included) and the third row for the total PCQ model (all subscales included). For each CFA model, we made modifications that were suggested by the modification indices and were theoretically plausible. In general, no model showed a non-significant Chi² test. However, this test is considered too sensitive for large sample sizes [24], and modification at least reduced the Chi² value by 50% and 66%, respectively.
In all three models, we could improve the model fit, expressed in an increase of the TLI and CFI as well as a decrease in RMSEA. While the modified CFA model for the negative PCQ subscales (TLI = 0.97, CFI = 0.98, RMSEA = 0.07) and the modified CFA model for the positive PCQ subscales (TLI = 0.95, CFI = 0.97, RMSEA = 0.08) show quite good model fit, the total PCQ model performs poorly even after modification. A closer look at the modification indices led to the assumption that some items are not selective enough and correlate with multiple concepts. Modification indices suggested a correlation between error terms of items 10 and 11 (physical subscale) with item 9 (social subscale) and the emotional subscale itself. Item 10 refers to difficulties doing things at home and item 11 refers to difficulties meeting work or other commitments. Item 9 refers to withdrawing from close people, reflecting a dimension of social isolation. What all of these items have in common is that they are typical symptoms of depression. We consider this a prime example of the bio-psycho-social model [25], which indicates the difficulty of separating those three dimensions from each other.

Convergent validity
As shown in Table 4, we have to partially reject our hypotheses. Only the emotional subscale shows a strong correlation (r = 0.53) with the STAI-Y-6. The physical and social overall subscales show moderate positive correlations with the STAI-Y-6 ranging from 0.40 to 0.42. However, the total PCQ scale fits the hypothesis, with a strong correlation of 0.52. As we assumed, all positive subscales correlate less strongly with the STAI-Y-6 than the negative subscales do. Since we also found evidence for the assumption that the emotional subscale is more strongly related to the STAI-Y-6 than the other subscales are, we can overall assume acceptable convergent validity of the PCQ.

Discussion
Overall, our study results indicate a successful adaptation of the German PCQ with good performance in terms of acceptability, internal consistency, scale structure and convergent validity. The non-response rates and non-unique answer rates were negligible. Our results were comparable to other adaptations of the PCQ [6,15] and demonstrate that the usefulness of the PCQ is not limited to the setting of cancer diseases.
However, we found substantial floor effects, especially for the negative PCQ items. Floor effects pose a potential risk to the accuracy of a scale since one has to assume that some variation is collapsed into the lowest category. In general, we would suggest further differentiating the response categories, but this would reduce the comparability of measurement across studies and countries. Since the lowest category is "not at all", one can assume that the PCQ produces floor effects if the population simply does not experience the dysfunction in question. Here, it seems that the overall psychosocial impact of the screening is quite low.
Reliability analyses suggest not summing up positive and negative items within the physical, social and emotional subscales. Instead, the PCQ should be separated into a negative and a positive scale, as intended by Cockburn et al. and confirmed by Ong et al. [7,8].
Because of the cut-off value of 80%, which we chose as the criterion for excluding incomplete scales from the analysis, we had to exclude 29 cases only because of the positive subscales and four cases only because of the negative items. In line with a generally higher non-response rate for the positive items, we assume that fatigue effects might have occurred here. The positive items were placed as a block after the negative items in our questionnaire at the end of the page. Since our study could not apply randomization of items, we recommend further testing of the German version of the PCQ, including tests for order effects.
Another limitation of our study design was that we could not contact the patients within a specific period after their screening, so that some patients had longer periods between the screening experience and responding to the questionnaire than others. An individual contacting procedure (e.g. 4 weeks after each test) would improve the comparability of the measurement, but was not realistic to implement in this study. We recommend further testing the German PCQ in a controlled setting, which could also offer the possibility to gain insights into retest reliability and the sensitivity to change over time. This could shed light on the course of potential burden after a screening experience.
The generally low response rate in our study limits the representativeness of our results. After the delivery of the questionnaire, we received some phone calls from patients who were not aware of their inclusion in this study. Cognitive impairment and language problems were also named as reasons for non-response to our questionnaires.
Due to the weak performance of the modified CFA model for the total PCQ and because of the shared variance of items 9, 10, 11 and the emotional subscale, a pattern that occurs only in the total PCQ model, we recommend being careful in using the total PCQ score and instead computing the negative and the positive PCQ separately.
Since the negative PCQ subscale and the total PCQ correlated at least moderately with the STAI-Y-6, we consider the convergent validity as acceptable. However, more insights into other aspects of validity would help to further evaluate the performance of the PCQ. In other validation studies, the Impact of Event Scale (IES) and the Hospital Anxiety and Depression Scale (HADS) were used for validation. In our project, the IES seemed not to be adequate since most of our participants were screened negatively. Thus the considered event "liver cirrhosis" is not as present as it is in the context of the breast cancer screening study of Rijnsburger et al. [6]. The HADS was also not considered to be useful in our context since it contains a longer list of items than the short form of the STAI. However, for the future implementation of the PCQ we recommend also considering the HADS or IES to enhance international comparability of the German version of the PCQ.

Conclusions
Overall, our study results indicate a successful adaptation of the German PCQ with good performance in terms of acceptability, internal consistency, scale structure, and convergent validity. We could demonstrate that the German version of the PCQ is a useful and well-performing measure of both negative and positive screening consequences, even in a non-cancer setting.