Psychometric properties of the German version of the Psychological Consequences of Screening Questionnaire (PCQ) for liver diseases

Background This study aimed to translate the negative and positive items of the Psychological Consequences Questionnaire (PCQ) into German, to adapt this version to the context of screening for cirrhosis and fibrosis of the liver, and to test its psychometric properties. Materials and methods The three subscales (physical, emotional, and social) were translated into German using a forward-backward translation method. Furthermore, we adapted the wording to the context of liver diseases. In sum, the PCQ comprises twelve negative items and ten positive items. We tested acceptability, distribution properties, internal consistency, scale structure, and convergent validity using an analysis sample of 443 patients who were screened for cirrhosis or fibrosis of the liver. Results We found low non-response and non-unique answer rates on the PCQ items in general. However, positive items had higher non-response rates. All items showed strong floor effects. McDonald’s Omega was high for both the negative (ω = 0.95) and the positive PCQ scale (ω = 0.90), as well as for the total PCQ scale (ω = 0.86). Confirmatory factor analysis could reproduce the three dimensions that the PCQ intends to measure. However, it suggests not summing up a total PCQ score and instead treating the negative and positive scales separately, each with a higher order overall construct. Convergent validity with the short form of the Spielberger State-Trait Anxiety Inventory (STAI-Y-6) was acceptable. Conclusion Overall, our study results indicate a successful adaptation of the German PCQ with good performance in terms of acceptability, internal consistency, scale structure, and convergent validity. Floor effects limit the content validity of the PCQ, which needs to be addressed in future research. However, the German version of the PCQ is a useful measure of both negative and positive screening consequences, even in a non-cancer setting.


Introduction
Medical screening is increasingly used across medical disciplines to detect the onset of diseases and prevent severe progression. Screenings are therefore commonly used in populations without acute symptoms. In Germany, a screening procedure called "Gesundheits-Check-Up" (health check-up), which aims to detect risk factors for certain diseases, is available to statutory health insurance members aged 35 and older and includes a blood and urine test (Institute for Quality and Efficiency in Health Care, 2022). A newly introduced screening procedure using a calculated ratio for the early detection of cirrhosis or fibrosis of the liver was tested in the context of the SEAL program (SEAL: Structured early detection of asymptomatic cirrhosis of the liver in Rhineland-Palatinate and Saarland) in two German federal states (Rhineland-Palatinate and Saarland) from January 2017 to October 2021 (Nagel et al., 2019; Labenz et al., 2022). This program integrates the screening into the "Gesundheits-Check-Up" routine, which can be repeated every three years. The results of screening depend on cut-off values that have to be pre-defined. For the SEAL program, a cut-off value of 0.5 was chosen for the Aspartate aminotransferase to Platelet Ratio (APRI). Thus, a positive screening rate of 3.5 to 4.0% is expected for the SEAL cohort with a false-positive screening rate of 70-80% (Unalp-Arida and Ruhl, 2017; Yip and Wong, 2017).
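To make the "calculated ratio" concrete, the following minimal sketch computes the APRI in its commonly published form, (AST / upper limit of normal AST) × 100 / platelet count in 10^9/L, and compares it with the 0.5 cut-off named above. The function names, the example laboratory values, and the choice to treat values strictly above the cut-off as positive are illustrative assumptions; the text does not specify how the SEAL program handled boundary values or which upper limit of normal was used.

```python
def apri(ast_u_per_l, ast_uln_u_per_l, platelets_10e9_per_l):
    """Return the AST to Platelet Ratio.

    APRI = (AST / upper limit of normal AST) * 100 / platelet count (10^9/L).
    """
    if ast_uln_u_per_l <= 0 or platelets_10e9_per_l <= 0:
        raise ValueError("upper limit of normal and platelet count must be positive")
    return (ast_u_per_l / ast_uln_u_per_l) * 100.0 / platelets_10e9_per_l


def screens_positive(score, cutoff=0.5):
    """Assumption: a score strictly above the cut-off counts as a positive screen."""
    return score > cutoff


# Hypothetical laboratory values, for illustration only.
score = apri(ast_u_per_l=80, ast_uln_u_per_l=40, platelets_10e9_per_l=200)
print(score, screens_positive(score))
```

A low cut-off such as 0.5 favors sensitivity, which is consistent with the high expected share of false-positive results mentioned above.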
Besides the benefits of early detection of diseases, such as early treatment and potential prevention, negative effects should also be taken into consideration when evaluating the role of screenings (Harris, 2011; Institute for Quality and Efficiency in Health Care, 2022). In a review by Harris (2011), potential harms that cross multiple conditions, e.g., nonadherence, overdiagnosis and targeted screening are discussed. Landstra et al. (2013) contribute to this discussion by outlining that unrealistic optimism caused by negative screening results can lead to lower anxiety and thus to a reduction in health-promoting behaviors. A dilemma that occurs when introducing broad screening programs is that they tend to have low predictive power, resulting in a high rate of false-positive results. Studies have shown that false-positive screening results have substantial negative psychosocial consequences (Cockburn et al., 1992; Ong et al., 1997; Brett et al., 1998; Brodersen et al., 2007). However, not only false-positive results can have that impact. Intensive surveillance during the screening process itself can produce unfavorable side effects on psychological well-being and health-related quality of life due to the confrontation with a potential threat (Rijnsburger et al., 2006).
Abbreviations: PCQ, psychological consequences questionnaire; SEAL, structured early detection of asymptomatic cirrhosis of the liver in Rhineland-Palatinate and Saarland; APRI, aspartate aminotransferase to platelet ratio; PRO, patient-reported outcome; CFA, confirmatory factor analysis; WLSMV, weighted least square mean and variance adjusted; DWLS, diagonally weighted least squares; CFI, comparative fit index; TLI, Tucker-Lewis index; RMSEA, root mean square error of approximation; SRMR, standardized root mean square residual; STAI-Y-6, Spielberger state-trait anxiety inventory (Short Form); IES, Impact of event scale; HADS, Hospital anxiety and depression scale.
In order to identify potential burden associated with the screening, we conducted a cross-sectional survey alongside the SEAL study with screened participants. Brodersen et al. (2007) emphasize the need for patient-reported outcome measures (PRO) with high content validity in order to systematically investigate psychosocial consequences of screening. To the best of our knowledge, no instrument exists so far to measure the psychological impact of screening for liver diseases, and in particular none in the German language. Therefore, we chose to adapt a questionnaire that was initially developed in the context of breast cancer screening. The Psychological Consequences Questionnaire (PCQ) was created in 1992 in Australia and aims at measuring the positive and negative effects of breast cancer screening on emotional, physical and social functioning (Cockburn et al., 1992). To date, the PCQ has been widely used to measure short-term screening impact in the context of mammography screening (Cockburn et al., 1992; Ong et al., 1997; Brett et al., 1998; Lowe et al., 1999; Bowland et al., 2003; Rijnsburger et al., 2006) and has also been adapted to other types of cancer such as colorectal (Denters et al., 2013), anal (Tinmouth et al., 2011) and skin cancer (Risica et al., 2018). Cross-cultural adaptations produced translated versions of the PCQ in Dutch (Rijnsburger et al., 2006), Swedish (Olsson et al., 1999) and Danish (Brodersen and Thorsen, 2003), indicating a high usability of the instrument. We found no attempt in the literature to adapt the PCQ beyond cancer diseases. However, we assume that liver cirrhosis is comparable to cancer in terms of how life-threatening it is perceived to be by the population and thus consider the adaptation of the PCQ suitable in this context.
Consistent with the translation into Danish (Brodersen and Thorsen, 2003), we adapted not only the negative but also the positive items of the PCQ, but refrained from adding new items, as suggested by the authors in a later published work (Brodersen and Thorsen, 2008). The original PCQ covers three domains of normal functioning: physical, emotional and social functioning, each measured by negative and positive items. Emotional consequences are measured by five negative items (I3, I4, I5, I6, I12) and five positive items (I13, I14, I18, I19, I22). Four negative (I1, I2, I10, I11) and three positive items (I16, I17, I21) measure the physical aspects, and three negative (I7, I8, I9) and two positive items (I15, I20) capture the social dimension (see Table 1). All items were rated on a 4-point scale ranging from 0 to 3. The exact scale labels differed slightly between negative and positive items and were 0 "not at all", 1 "rarely" and "a little bit", 2 "some of the time" and "quite a bit" and 3 "quite a lot of the time" and "a great deal." After recoding the positive items so that they express the level of dysfunction, the item ratings can be summed to scores for each dimension.
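The scoring rule described above can be sketched as follows. The item-to-dimension mapping is taken from the text; the function name, the input format (a dictionary from item number to rating), and the recoding as 3 minus the rating are illustrative assumptions about how the recoding could be implemented on a 0 to 3 scale.

```python
# Item numbers per dimension, as listed in the text.
NEGATIVE_ITEMS = {
    "emotional": [3, 4, 5, 6, 12],
    "physical": [1, 2, 10, 11],
    "social": [7, 8, 9],
}
POSITIVE_ITEMS = {
    "emotional": [13, 14, 18, 19, 22],
    "physical": [16, 17, 21],
    "social": [15, 20],
}
MAX_RATING = 3  # items are rated from 0 to 3


def score_pcq(responses):
    """Sum item ratings per dimension.

    `responses` maps item number (1..22) to a rating in 0..3.
    Positive items are recoded (3 - rating) so that higher values
    express more dysfunction, i.e. less positive impact.
    """
    for item, rating in responses.items():
        if rating not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: rating must be 0, 1, 2 or 3")
    scores = {}
    for dimension in NEGATIVE_ITEMS:
        negative = sum(responses[i] for i in NEGATIVE_ITEMS[dimension])
        positive_recoded = sum(
            MAX_RATING - responses[i] for i in POSITIVE_ITEMS[dimension]
        )
        scores[dimension] = {
            "negative": negative,
            "positive_recoded": positive_recoded,
        }
    return scores


# Hypothetical respondent who answers "not at all" (0) on every item.
print(score_pcq({item: 0 for item in range(1, 23)}))
```

The sketch keeps negative and recoded positive sums apart, so a user can decide whether combining them is appropriate for a given analysis.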
Reports on the underlying factor structure of the PCQ are inconsistent. Olsson et al. (1999) and Rijnsburger et al. (2006) applied principal component analysis and identified three separate factors, one for each of the three dimensions. However, Rijnsburger et al. found that item-own scale correlations of the PCQ subscales were only slightly higher than item-other scale correlations, indicating a certain overlap of the factors (Rijnsburger et al., 2006). Ong et al. (1997) also applied a factor analysis to examine the scale structure. Due to high cross correlations, they concluded that a one-factor solution is more suitable for the PCQ. Those inconsistent findings motivated Cooper and Aucote (2009) to apply a confirmatory factor analysis (CFA) to identify the optimal factor structure. Their study found support for a one-factor solution.
So far, the majority of studies, including the most recent validation study of Cooper & Aucote, only focused on the negative PCQ scales (Cooper and Aucote, 2009). In our study, we aim to fill this gap by including also the positive subscale of the PCQ in the psychometric evaluation.
The aims of this study are as follows:
1. to translate the PCQ into German
2. to adapt it to the context of liver diseases
3. to assess acceptability, internal consistency, scale structure and convergent validity of the questionnaire in a screening population
In doing so, we aimed to produce an instrument that measures psychosocial consequences of screening in a liver disease context with acceptable psychometric properties.

Materials and methods
German version of the Psychological Consequences of Screening Questionnaire (PCQ)
In a first step, we slightly rephrased the items to fit the context of liver screening. We then applied forward and backward translation procedures for all PCQ items. This method has been shown to be equivalent to the dual panel translation method for quality of life outcome instruments (Lee et al., 2019). First, two German native speakers with fluent command of English and a social science background independently translated the PCQ into German and reached consensus on their versions. Second, an English native speaker with fluent command of German and a bilingual professional translator translated the German text back into English. After minor revisions, the team agreed on a German version, which was further discussed in a group of German native speaking social scientists (n = 5) with expertise in survey design. This expert assessment is strongly recommended to ensure high equivalence of the instrument (Lee et al., 2019).

Study population and data collection
The SEAL program is a prospective study that aimed at evaluating a newly introduced medical screening method for early diagnosis of cirrhosis or fibrosis of the liver. From January 2018 to February 2021, patients who visited collaborating clinics or doctor's offices in Rhineland-Palatinate or Saarland for a check-up were screened for liver cirrhosis and fibrosis. The screening process itself follows a multistep design in which each step depends on the test result of the previous one (step 1: blood sample test and risk score; step 2: enhanced laboratory diagnostics and ultrasound; step 3: liver biopsy and enhanced diagnostics in a specialized clinic). Inclusion criteria for study participation were a minimum age of 35 and no known previous cirrhosis of the liver (Nagel et al., 2019, 2020). In August 2019, we contacted all patients who had been included in the study up to that point. Due to ethical considerations, it was not possible to obtain information on which phase of the screening process the patients were in. This means that we could not determine whether patients with positive screening results had already moved to step 2 or 3 or whether they were still waiting for an appointment. Therefore, we could not control for false-positive results and potential effects caused by them. Since screening began in January 2018, it is plausible to assume that the majority of the patients had already received information on the test results. However, previous qualitative interviews revealed that patients generally receive no information at all in case of negative test results (results that showed no pathological findings). A gross sample of 5,935 patients received a postal mailing including a self-administered questionnaire, patient information and an informed consent form. With a return rate of 9% among those who were negatively screened and 12% among those who were positively screened, we received 499 (negatively screened) and 21 (positively screened) completed questionnaires.
In some cases, signed informed consent for our survey was missing. After a subsequent acquisition of missing consent documents, we excluded those cases that were not legitimate for evaluation (n = 34). We ended up with a net sample of 487 patient questionnaires (n = 19 positively screened and n = 468 negatively screened). For this analysis, we included only cases with at least an 80% response rate to the PCQ items. This means we excluded cases that had more than three missing items on the negative PCQ scales and more than two missing responses on the positive PCQ scales. To increase comparability, we followed the approach of Rijnsburger et al. (2006)

Statistical analyses
For cross-cultural comparability, the analyses mainly followed the procedures of Rijnsburger et al. (2006). However, we slightly extended our psychometric test strategy according to Schupp et al. (2018) and Hayes and Coutts (2020) to reach a better understanding of the psychometric properties of the PCQ. Furthermore, we applied CFA using weighted least square mean and variance adjusted (WLSMV) estimators treating the four response categories as categorical. This method is more suitable in case of violation of normality assumption, which is the case here (Li, 2016). The analyses were conducted in IBM SPSS Statistics 26 and R using the lavaan package (Rosseel, 2012).

Distribution properties
To give insights into the distribution properties, we computed skewness and kurtosis. Non-response rates and double cross rates (non-unique answers) for each item were shown to understand acceptability of the scale. Additionally, items with high skewness or kurtosis as well as items that show ceiling or floor effects were identified. We followed the classification for ceiling and floor effects as suggested by MacHorney & Tarlov, Varni et al. and Lin et al. which suggests a percentage of 0 to 15% as small, 16 to 30% as moderate and more than 30% as substantial floor or ceiling effect (MacHorney and Tarlov, 1995;Varni et al., 1999;Lin et al., 2013). Furthermore, we showed the 25th, 50th and 75th percentiles of the subscales and the total PCQ.
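The floor and ceiling classification described above can be sketched as follows. The thresholds (up to 15% small, 16 to 30% moderate, more than 30% substantial) come from the text; the function names, the treatment of missing answers as `None`, and the handling of non-integer percentages between the stated bands are illustrative assumptions.

```python
def classify_effect(percentage):
    """Classify a floor or ceiling percentage using the bands from the text.

    Assumption: values between the stated integer bands (e.g. 15.5%)
    are assigned to the next higher band boundary, i.e. <= 15 is small,
    <= 30 is moderate, anything above 30 is substantial.
    """
    if percentage <= 15:
        return "small"
    if percentage <= 30:
        return "moderate"
    return "substantial"


def floor_ceiling(ratings, minimum=0, maximum=3):
    """Return the floor and ceiling percentages and labels for one item.

    `ratings` is a list of item ratings; None marks a missing answer
    and is ignored when computing percentages.
    """
    valid = [r for r in ratings if r is not None]
    if not valid:
        raise ValueError("no valid ratings")
    floor_pct = 100.0 * sum(r == minimum for r in valid) / len(valid)
    ceiling_pct = 100.0 * sum(r == maximum for r in valid) / len(valid)
    return {
        "floor_pct": floor_pct,
        "floor": classify_effect(floor_pct),
        "ceiling_pct": ceiling_pct,
        "ceiling": classify_effect(ceiling_pct),
    }


# Hypothetical ratings for a single item, for illustration only.
print(floor_ceiling([0, 0, 0, 1, 2, 3, None]))
```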

Internal consistency
We computed McDonald's Omega for the PCQ scales to evaluate internal consistency (Hayes and Coutts, 2020). Here, we deviate from the work of other earlier PCQ adaptations, where Cronbach's Alpha was used. However, since there is a strong criticism about the use of alpha and since a congeneric model shows more realistic results, we decided to follow another approach (Dunn et al., 2014). We also assessed item-total correlation as well as mean-inter-item correlation for each subscale. Furthermore, we tested whether McDonald's Omega allows computing an overall PCQ including both negative and (recoded) positive items.
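As a minimal illustration of the coefficient, the sketch below computes McDonald's Omega for a single congeneric factor from factor loadings, using the standard formula ω = (Σλ)² / ((Σλ)² + Σθ), where λ are the loadings and θ the error variances. It is not the procedure used in the study, which relied on SPSS and R; the function name and the example loadings are hypothetical, and the default error variances of 1 − λ² assume standardized loadings.

```python
def mcdonalds_omega(loadings, error_variances=None):
    """McDonald's Omega for a one-factor (congeneric) model.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)

    Assumption: if no error variances are given, the loadings are taken
    to be standardized, so each error variance is 1 - loading^2.
    """
    if error_variances is None:
        error_variances = [1.0 - loading ** 2 for loading in loadings]
    if len(error_variances) != len(loadings):
        raise ValueError("need one error variance per loading")
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))


# Hypothetical standardized loadings, for illustration only.
print(round(mcdonalds_omega([0.8, 0.8, 0.8, 0.8]), 3))
```

Unlike Cronbach's Alpha, this formula does not require equal loadings across items, which is the property of the congeneric model referred to above.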

Scale structure
To confirm the prior findings in the literature, we tested both a three-factor solution and an overall PCQ score as a single concept of adverse psychosocial consequences. Furthermore, we applied this testing rationale to both the negative and the positive items. Since the latter received little attention in the past, we also tested whether considering the negative and positive PCQ items together is meaningful.
We conducted confirmatory factor analyses (CFA) to confirm the item-factor relationship. Due to the controversial debate about the three- or one-factor structure of the PCQ, we applied a four-step approach. First, we estimated a three-factor solution not allowing covariance between the factors (strict model). Second, we modeled one-factor solutions. Third, we allowed covariances between the three dimensions and modeled a higher order factor model with three latent factors (physical, emotional, social) on level 1 and an overall PCQ factor on level 2. Fourth, we checked modification indices for plausibility and estimated a modified higher order model.
In detail, the WLSMV estimator was used to compute robust standard errors and a mean- and variance-adjusted test statistic. Model fit was evaluated using the Chi² goodness of fit test, robust Comparative Fit Index (CFI) (Hu and Bentler, 1999), robust Tucker-Lewis Index (TLI) (Tucker and Lewis, 1973), robust root mean square error of approximation (RMSEA) and robust standardized root mean square residual (SRMR) to determine local fit. A Chi²/df ratio below 3 indicates that the assumed model fits the data acceptably (Moosbrugger and Kelava, 2020). We consider CFI and TLI values above 0.90 as an indication of a good fit (Moosbrugger and Kelava, 2020). RMSEA values below 0.1 are considered as moderate and values below 0.05 as good fit (Lin et al., 2013). SRMR values below 0.10 were considered as acceptable fit (Moosbrugger and Kelava, 2020).
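The cut-off values listed above can be collected in a small helper that checks a set of fit statistics against them. The thresholds are taken from the text; the function name, the output labels, and the example fit values are hypothetical and do not correspond to any model reported in this study.

```python
def evaluate_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Compare CFA fit statistics with the cut-offs named in the text."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if rmsea < 0.05:
        rmsea_label = "good"
    elif rmsea < 0.10:
        rmsea_label = "moderate"
    else:
        rmsea_label = "poor"  # assumption: label for values of 0.10 or more
    return {
        "chi2_df_ratio": chi2 / df,
        "chi2_df_acceptable": chi2 / df < 3,  # ratio below 3
        "cfi_good": cfi > 0.90,               # above 0.90
        "tli_good": tli > 0.90,               # above 0.90
        "rmsea": rmsea_label,                 # below 0.05 good, below 0.10 moderate
        "srmr_acceptable": srmr < 0.10,       # below 0.10
    }


# Hypothetical fit statistics, for illustration only.
print(evaluate_fit(chi2=120.0, df=50, cfi=0.95, tli=0.94, rmsea=0.06, srmr=0.08))
```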

Convergent validity
To evaluate convergent validity, we correlated the PCQ subscales with the short form of the Spielberger State-Trait Anxiety Inventory (STAI-Y-6) (Marteau and Bekker, 1992). This scale is designed to measure current emotional status and is assumed to correlate highly (Pearson's r greater than 0.5) with all three subscales of the PCQ. We further hypothesize that the highest correlation is found with the emotional subscale of the PCQ. If this hypothesis cannot be rejected, we conclude acceptable convergent validity of the German version of the PCQ. Additionally, we assume that the STAI-Y-6 scale correlates more strongly with the negative PCQ subscales than with the (recoded) positive PCQ subscales. This assumption is based on the rationale that the absence of a positive effect (e.g., not experiencing greater well-being) should not be considered as equal to the occurrence of a negative effect. This circumstance led Brodersen and Thorsen to rephrase the positive items so that they allow changes in both directions (Brodersen and Thorsen, 2003, 2008).
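The two hypotheses stated above (every subscale correlates with the STAI-Y-6 at r greater than 0.5, and the emotional subscale correlates highest) can be checked with a short sketch like the following. The function names and all numeric values are hypothetical and serve only to show the logic; they are not results of this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def check_convergent_hypotheses(correlations, threshold=0.5):
    """Check both hypotheses for a dict of subscale -> r with the STAI-Y-6."""
    return {
        "all_strong": all(r > threshold for r in correlations.values()),
        "emotional_highest": max(correlations, key=correlations.get) == "emotional",
    }


# Hypothetical correlations, for illustration only.
print(check_convergent_hypotheses({"emotional": 0.6, "physical": 0.3, "social": 0.2}))
```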

Acceptability
We noticed low item-non-response for all PCQ items in our sample. Following the approach explained above, we had to exclude 15 cases for the negative PCQ items and 66 cases for the positive PCQ items since they show less than 80% of overall scale response. This pattern illustrates that the general non-response rate is higher for the positive PCQ items (ranging from 3.3 to 8.8%) than for the negative items (ranging from 2.3 to 3.9%) (see Table 1). We could not identify any non-response pattern that is associated with a specific subscale. Non-unique answers were generally low and were not observed more than three times per item (highest non-unique answer rate of 0.6% in item "feeling more able to do things which I normally did before").

Distribution properties
For all negative PCQ items, we found mean values ranging between the categories "0 Not at all" and "1 Rarely", indicating that the impact of screening is generally low. This pattern also explains a substantial floor effect for all negative items, with more than 50% of responses lying in the lowest category. In contrast, we found small ceiling effects, with the highest percentage of values in the item "I felt worried about my future". As a consequence, the data are highly skewed. Z-standardized skewness exceeds the cut-off value of +1.95 for every negative PCQ item. Regarding kurtosis, items 2, 4, 6, 9, 10, 11, and 12 lie beyond the range of −1.95 to +1.95, indicating a deviation from the normal distribution.
The positive PCQ items generally show higher mean values ranging from 0.88 to 2.01, indicating a tendency toward higher agreement with positive effects of the screening. This pattern is in line with fewer floor effects for the positive items in comparison to the negative items: substantial floor effects were found for items 13, 14, 15, 21, and 22, moderate effects for items 18, 19 and 20, and small floor effects for items 16 and 17. Substantial ceiling effects were only found for item 16. Apart from that, the ceiling effects for the positive PCQ scales can be considered small to moderate in general. Regarding skewness and kurtosis, the positive items show values closer to zero than the negative items. However, most of them exceed the cut-off values of ±1.96, indicating a deviation from the normal distribution. It should be noted that all positive PCQ items show negative signs for kurtosis.

Internal consistency
For this section and the subsequent analyses, we applied missing value imputation as stated above. Furthermore, we recoded the positive items to compute the subscale indices so that higher scale values indicate less positive impact. In general, we found acceptable McDonald's Omega values higher than 0.7, except for the overall physical subscale, the overall social subscale and the social positive subscale (see Table 2) (Moosbrugger and Kelava, 2020). For the physical subscale, higher McDonald's Omega values were found when the positive and negative subscales were treated separately. The social subscale shows major problems, since the coefficient could not be computed due to zero to negative covariances and the social positive subscale has only two items. For the emotional dimension, we found a good Omega for the overall scale (0.80), with a better Omega for the negative scale (0.94) and a slightly worse Omega for the positive emotional scale (0.79). This pattern is also reflected in the mean inter-item correlation. In general, the overall subscales show lower values than the separately treated subscales. Regarding the total PCQ scales (positive, negative, total), we found excellent McDonald's Omega values of 0.86 and higher, indicating good consistency. For the negative subscales and the total negative PCQ scale, we found a strong deviation between the scale mean and the median, indicating that the majority of our sample is not strongly affected in general, but that a few extreme values bias the mean.

Scale structure
Table 3 shows the model fit statistics of our confirmatory factor analyses. First, we calculated strict 3-factor solutions not allowing covariances between the three dimensions. All model fit statistics show bad values for all three models, indicating a misspecification of each of the models. Second, we estimated one-factor solutions for each subscale and the overall scale. In general, the fit statistics are better for each model; however, the overall scale model has to be considered misspecified. For the negative and the positive subscales, we reached a strong decrease of Chi², which still has a significant p-value, indicating non-optimal fit. Though TLI and CFI reached the pre-defined cut-off values for both the negative and the positive model, RMSEA and SRMR did not meet the cut-off values for the positive subscale model. We estimated the unstandardized covariance between the assumed three factors for all three setups. For the negative PCQ dimensions, we found strong covariance between the physical and the emotional dimension (0.68), between the physical and social dimension (0.64) and even stronger covariance between the emotional and social dimension (0.73). For the positive items, the covariances between the latent constructs were smaller between the emotional and physical domain (0.35) and between the emotional and social domain (0.32). The physical and social domain showed high covariance (0.76). For the overall model, we also found moderate to high covariances between the physical and emotional scale (0.55), between the physical and the social scale (0.56) and between the emotional and social scale (0.63). Altogether, these findings support the use of a higher-order CFA allowing covariances between the three factors.
Thus, we modeled higher order CFA in a third step with three assumed latent constructs on level 1 and one overall construct on level 2. The model fit statistics improved slightly for both the positive and the negative subscales, indicating that the higher order CFA solution fits better than the one-factor solution. For the overall scale model, the performance remains weak and insufficient.
In a last step, we modified our models based on suggestions by the modification indices (see Supplementary Material). For the negative model, a covariance of the error terms of items 10 and 11 (physical subscale) was recommended. We consider this plausible, since both items refer to difficulties maintaining daily life activities, while the other two items of this construct refer to nutritional and sleeping impairments. Further modification indices suggest cross-loadings of item 12 with the physical and the social scale. However, since we did not find a clear rationale for this, we stopped modification here and ended up with a well performing modified model (see Figure 1). For the positive subscale, we found a higher number of modification suggestions. The highest reduction in Chi² was suggested by implementing a covariance between the error terms of items 13 and 14. Here the phrasing of the items might play a role. Both items were formulated beginning with "I feel" in German. This is also true for item 22; however, this item refers to an overall expression, while items 13 and 14 are more specific. After including this covariance, the covariance of items 16 and 17 was suggested. Since those two items also refer to daily life activities (as in the negative items 10 and 11), we included this covariance as well. After modification, we ended up with a sufficiently performing model (see Figure 2).
Table notes: a As modification indices suggested that model fit would be improved if correlated error terms were included, we added the one error term correlation that could improve the model the most (e10 and e11). b As modification indices suggested that model fit would be improved if correlated error terms were included, we added the two error term correlations that could improve the model the most (e13 and e14, e16 and e17). c As many modification indices suggested a broad improvement of the model, we decided not to further modify a model that is apparently misspecified.
For the overall CFA model, we decided not to implement further modifications. The model seems to be strongly misspecified, so that many modifications would be necessary to reach acceptable model fit. However, this would pose the risk of overfitting a model and constructing a statistical artifact that might be far away from reality.
Since we could reach a better model fit for both the negative and the positive models when implementing a higher order CFA, we consider this a prime example of the biopsychosocial model (Schotte et al., 2006), which indicates the difficulty of separating the three dimensions (physical, social and emotional) from each other.

Convergent validity
As shown in Table 4, we have to partially reject our hypotheses. Only the emotional subscale shows a strong correlation (r = 0.53) with the STAI-Y-6. The physical and social overall subscales show moderate positive correlations with the STAI-Y-6 ranging from 0.40 to 0.42. However, the total PCQ scale fits the hypothesis, with a strong correlation of 0.52. As we assumed, all positive subscales correlate less strongly with the STAI-Y-6 than the negative subscales do. Since we also found evidence for the assumption that the emotional subscale is more strongly related to the STAI-Y-6 than the other subscales are, we can overall assume acceptable convergent validity of the PCQ.

Discussion
Overall, our study results report a successful adaptation of the German PCQ with good performance in terms of acceptability, internal consistency, scale structure and convergent validity. The non-response rates and non-unique answer rates were negligible. Our results were comparable to other adaptations of the PCQ (Brodersen and Thorsen, 2003;Brodersen, 2006;Rijnsburger et al., 2006) and demonstrate that the PCQ is not only useful for the setting of cancer diseases.
However, as in other validation studies, we found substantial floor effects, especially for the negative PCQ items (Rijnsburger et al., 2006). Floor effects pose a potential risk in terms of accuracy of a scale since one has to assume that some kind of variation is drawn together in the lowest category. In general, we would suggest further differentiating the response categories, but this would also have the consequence that homogeneity of measurement across studies and countries would suffer. Since the lowest category is "not at all", one can assume that the PCQ produces floor effects if the population simply does not experience the respective dysfunction. Here, it seems that the overall psychosocial impact of the screening is quite low. Another limitation of the PCQ is its content validity. Brodersen and Thorsen (2008) found in a similar study translating and adapting the PCQ into Danish that the original items do not cover all psychosocial aspects of screening. This is especially the case for negative consequences of abnormal screening. Therefore, an enhanced version of the PCQ containing 33 items was suggested. However, they added questions covering areas that are exclusively relevant to the context of breast cancer screening. For future research, we recommend following the approach of Brodersen and Thorsen (2008) and using focus groups to check for potential uncovered fields of the PCQ in the context of liver diseases. Using this method could also reveal whether diagnosis of early cirrhosis is equivalent to cancer screening regarding the patient-reported outcome measures, which was an assumption we needed to make for this study.
Figure 1. Modified higher order CFA model with standardized estimates (negative subscale).
Figure 2. Modified higher order CFA model with standardized estimates (positive subscale).

Frontiers in Psychology
Reliability analyses suggest not summing up the physical, social and emotional subscales across positive and negative items. Moreover, the separation into a negative and a positive subscale should be applied as intended by Cockburn et al. (1992) and confirmed by Ong et al. (1997). This was also well demonstrated in our CFA, since the overall model was extremely misspecified and the separate treatment of positive and negative items reached clearly better fit. Regarding the discussion whether a three-factor or a one-factor solution is more favorable, our structural equation modeling approach suggested a compromise by treating the scales as higher order factors. Our analyses showed that a three-factor solution not allowing covariances between the three factors (which were found to be strong in our study, as well as in earlier psychometric tests of the PCQ; Cooper and Aucote, 2009) leads to a weakly performing CFA. If necessary, the one-factor solution is a better choice; however, we recommend modeling the higher order factor structure as presented in this work.
Because of the cut-off value of 80%, which we chose as a criterion for too incomplete scales for analysis, we had to exclude 29 cases only because of the positive subscales and four cases only because of the negative items. In line with a generally higher non-response rate of the positive items, we assume that here fatigue effects might have occurred. The positive items were placed as a block after the negative items in our questionnaire at the end of the page. Since our study could not apply randomization of items, we recommend further testing of the German version of the PCQ including tests for order effects.
Another limitation of our study design was that we could not contact the patients at a fixed interval after their screening, so that the time between the screening experience and responding to the questionnaire was longer for some patients than for others. An individual contacting procedure (e.g., 4 weeks after each test) would improve the comparability of the measurement but was not realistic to implement in this study. We recommend further testing of the German PCQ in a controlled setting, which could also offer insights into retest reliability and sensitivity to change over time. This could shed light on the course of potential burden after a screening experience.
The generally low response rate in our study limits the representativeness of our results. After the questionnaire was delivered, we received some phone calls from patients who were not aware of their inclusion in this study. Cognitive impairment and language problems were also named as reasons for non-response to our questionnaires.
Due to the weak performance of the modified CFA model for the total PCQ and because of the shared variance between items 9, 10, and 11 and the emotional subscale, a pattern that occurs only in the total PCQ model, we recommend not using the total PCQ score and instead computing the negative and the positive PCQ separately.
Since the negative PCQ subscale and the total PCQ correlated at least moderately with the STAI-Y-6, we consider the convergent validity acceptable. However, more insight into other aspects of validity would help to further evaluate the performance of the PCQ. In other validation studies, the Impact of Event Scale (IES) and the Hospital Anxiety and Depression Scale (HADS) were used for validation. In our project, the IES did not seem adequate since most of our participants screened negative. Thus, the considered event "liver cirrhosis" is not as salient as it is in the breast cancer screening study of Rijnsburger et al. (2006). The HADS was also not considered useful in our context since it contains a longer list of items than the short form of the STAI. However, for future investigation of the PCQ, we recommend also considering the HADS or IES to enhance the international comparability of the German version of the PCQ.
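A convergent validity check of this kind amounts to correlating PCQ scale scores with scores on the comparison instrument. The sketch below computes a Pearson correlation with pairwise deletion of missing scores. The choice of Pearson's r is an assumption, since this section does not state which coefficient was used; the function name `pearson_r` and the example scores are likewise invented for illustration.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson_r(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation between two score lists, using pairwise deletion.

    Assumption: Pearson's r is used here for illustration only; the
    coefficient applied in the study is not stated in this section.
    """
    # Keep only respondents with a valid score on both instruments.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Invented scores: negative PCQ sum scores and anxiety scores for five respondents.
pcq_negative = [0, 2, 5, None, 9]
anxiety = [6, 8, 11, 12, 15]
print(f"r = {pearson_r(pcq_negative, anxiety):.2f}")
```

With strongly skewed scores, as floor effects imply, a rank-based coefficient may be a reasonable alternative to the one sketched here.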

Conclusion
Overall, our study results indicate a successful adaptation of the German PCQ with good performance in terms of acceptability, internal consistency, scale structure, and convergent validity. We could demonstrate that the German version of the PCQ is a useful and well-performing measure of both negative and positive screening consequences, even in a non-cancer setting. However, future studies need to address the content validity of the PCQ in the context of liver screening.

Data availability statement
The raw data supporting the conclusions of this article can be requested from the corresponding author.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committees of Rhineland-Palatinate and Saarland, Germany. The patients/participants provided their written informed consent to participate in this study.

Author contributions
UF wrote the draft of the manuscript and was responsible for survey design, data collection and analysis. AM and EF-G reviewed the manuscript and revised it critically. EF-G was project leader and mentored the development of the questionnaire used in the project. UF and EF-G translated the questionnaire into German and reconciled their versions. All authors contributed to the article and approved the submitted version.

Funding
The SEAL program and the sub-study that provides the data basis for this manuscript were fully funded by the Innovation Fund of the Federal Joint Committee of Germany, provided by the Deutsches Zentrum für Luft- und Raumfahrt (Funding ID: 01NVF16026). The University Library of the University of Freiburg supports this publication through its Open Access Publication Fund.