This is a repository copy of Comparison of the Factor Structure of the Patient Health Questionnaire for somatic symptoms (PHQ-15) in Germany, the Netherlands, and China: A Transcultural Structural Equation Modeling (SEM) Study

Reuse: This article is distributed under the terms of the Creative Commons Attribution (CC BY) licence. This licence allows you to distribute, remix, tweak, and build upon the work, even commercially, as long as you credit the authors for the original work. More information and the full terms of the licence here: https://creativecommons.org/licenses/

a bifactorial model for categorical data; however, the model can only be recommended for use of the general factor. Application of the orthogonal subscales in non-European samples is not corroborated by the results. The differences cannot be ascribed to differences in health care settings or to differences in concomitant depression or anxiety; instead, cultural factors involving concepts of disease may play a role here, just as they may play a role in the translation of the questionnaire. Further research is needed to explore this, and replication studies are needed regarding the factorial structure of the PHQ-15 in China.
Keywords: somatic symptoms, patient health questionnaire-15, factor structure, structural equation modeling (SEM), transcultural

BACKGROUND

Somatic Symptoms, Emotional Distress, and Disability
In everyday life, somatic symptoms are common causes of outpatient medical visits (1,2). Patients with multiple distressing somatic symptoms present themselves in a variety of health care settings, such as primary, secondary, and tertiary patient centers (3)(4)(5)(6). Poor self-rated health of patients was found to be associated with multiple somatic symptoms in Europe and in China (7)(8)(9)(10). The number of somatic symptoms correlates well with impaired function and medical help-seeking behavior even after controlling for mental disorders (5). The number of somatic symptoms differs only slightly between patients with somatic symptoms explained by a medical disease and patients with unexplained somatic symptoms (11,12). This indicates that the suffering of a patient with medically unexplained somatic symptoms should be taken seriously. In Western countries, a high number of somatic symptoms is associated with higher psychological distress, more functional impairment, higher disability, more health care utilization, and a reduced quality of life (13)(14)(15). Given the large burden of somatic symptoms, especially unexplained symptoms that are currently not taken sufficiently seriously, the assessment of somatic symptoms and the analysis of their different facets is a useful and necessary part of every medical diagnostic process.

The Patient Health Questionnaire (PHQ)-15
The Patient Health Questionnaire (PHQ)-15 (16) is a short, practical self-rating instrument for the screening of somatic symptoms. The PHQ-15 has also been suggested by the DSM-5 Workgroup on Somatic Symptom Disorders (SSD) as a measurement tool of somatic symptom severity for the classification of SSD (17,18). The PHQ-15 was initially validated in primary care and a general hospital setting in the USA (16). Although the sum score is usually used as a simple measure of symptom load, several studies (16)(19)(20)(21) suggest that the somatic symptoms in the PHQ-15 can be grouped into four clusters: cardiopulmonary, gastrointestinal, pain, and fatigue/general symptoms.
Witthöft et al. (22) assessed the underlying structure of the PHQ-15 in two datasets of college students (Germany, N = 1,520; Switzerland, N = 3,053). They conducted a confirmatory factor analysis assuming a bifactor model (1 general and 4 orthogonal specific symptom factors; gastrointestinal, fatigue, cardiopulmonary, and pain symptoms). Correlations with general and lower-order latent factors of depression, general somatic symptom distress, and health anxiety were found. These factors explain up to nearly 70% of the variance in the general somatic symptom factor. However, it remains to be shown if this structure also applies to other samples. The original validation of the PHQ-15 was in general hospital patients and primary care patients (16), but not in patients in specialty mental health institutions, so it would be useful to explore the factor structure in other health care settings such as the specialty mental health care setting.
Also, it is unclear whether the factor structure of the PHQ-15 questionnaire is comparable between samples from Europe and samples from other continents and cultures such as Asia, in particular China, since the cultural background shapes the interpretation of somatic symptoms and thereby influences an individual's illness perception and illness behavior (23,24). In the literature, the influence of culture on experiencing somatic as well as psychosomatic symptoms has been explored. Evidence exists that cultural and personal explanatory models can contribute to the symptomatology of medically unexplained symptoms (25). Efforts have been made to make epidemiological research into this domain possible (26) and the need for appropriate instruments to assess physical symptoms in the context of distress in Asian cultures has been expressed (27).
Historically, there has been a popular belief that Asians manifest a lower prevalence of mood and anxiety disorders than their Western counterparts because they are more prone to experiencing and manifesting distress via somatic pathways (28)(29)(30). Among Chinese patients receiving psychiatric services, somatic symptoms such as pain, insomnia, and fatigue have been associated with depressive and anxiety disorders (31).
The PHQ-15 has been translated into many other languages and has been examined in samples from many other countries, e.g., Saudi Arabia, Spain, and Korea (32)(33)(34). This offers the potential for comparisons between different ethnic, cultural or geographical groups. However, so far, no studies have explored the factor structure in the context of different health care settings and West European vs. Chinese culture.
Another point of discussion concerns somatization. Somatization is defined as the tendency to experience and to express physical distress and symptoms that cannot be explained by pathological findings, to attribute them to a medical condition, and to seek medical care for them (35). It has also been described as a term related to our conceptualisation of disease, that is, to the tendency of doctors to classify symptoms either as attributable to physically explainable medical conditions or as medically unexplained (36). It has been suggested that physical symptoms occurring in the context of somatization may also co-occur with depression or anxiety (37). Hence, somatization as assessed by the PHQ-15 might be related to anxiety and depression as assessed by the GAD-7 and PHQ-9. It was hypothesized that this might be the case in China, as somatization has been assumed to be more common in China as a manifestation of underlying anxiety and depression (38). Hence, the association between scores on the PHQ-15 and scores on the GAD-7 and PHQ-9 should be explored in these different cultures.

Rationale and Aims of the Study
In this study, we compare patients with SSD, including all kinds of psychosomatic illnesses, in convenience samples from three countries: Germany, the Netherlands, and China. This enabled us to compare three health care settings: the general hospital inpatient psychosomatic setting (Germany); the specialty mental health outpatient setting providing treatment for patients with somatic symptom and related disorders (SSRD) with high complexity levels in terms of diagnostic problems, treatment issues, and social challenges (8) (the Netherlands); and the general hospital setting (China). It also allowed us to compare two cultures, namely the Western European and the Chinese culture.
The objective of the present study was to assess and to clarify the factor structure of the PHQ-15 within and between different countries.
We aimed to answer the following research questions: (1) Can the proposed bifactor model of Witthöft et al. (22) be replicated in three different study populations? (2) If not, are there more appropriate models within the three study populations? (3) Are there indications for cultural differences? (4) Is there an association between PHQ-9 and GAD-7 scores and the Witthöft factorial structure of the PHQ-15? More specifically, is there an association between PHQ-9 and GAD-7 scores and the potential general factor and/or the specific factors of the PHQ-15?

Study Design
This is a cross-sectional, secondary analysis of data collected within several research projects or routine clinical care in the three countries. In Germany, the data were collected as part of routine clinical care. In China, the data were collected as part of a cross-sectional study (39) in the general hospital setting. In the Netherlands, the data were collected as part of routine clinical care in the Clinical Centre of Excellence for Body, Mind, and Health (8)(40)(41)(42).

PHQ-15
The PHQ-15 is a self-administered somatic symptoms subscale derived from the full PHQ (14,43). The PHQ-15 was originally validated by Spitzer et al. (16). It includes 15 prevalent somatic symptoms that represent over 90% of the symptoms observed in primary care (16). The patients are asked to rate the severity of their symptoms during the previous 4 weeks on a 3-point scale as either 0 ("not bothered at all"), 1 ("bothered a little") or 2 ("bothered a lot"). The questionnaire demonstrated good internal consistency [Cronbach's alpha = 0.80; (16)] and high relevance to symptoms. It is available in multiple languages.
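The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' scoring procedure: the function name `phq15_sum_score` and the example respondent are hypothetical, and the sketch assumes complete data, since the handling of missing items is not described here.

```python
from typing import Sequence

N_ITEMS = 15               # the PHQ-15 has 15 items
VALID_RATINGS = (0, 1, 2)  # not bothered at all / bothered a little / bothered a lot


def phq15_sum_score(ratings: Sequence[int]) -> int:
    """Return the PHQ-15 sum score (0 to 30) for one respondent.

    Hypothetical helper: assumes all 15 items were answered.
    """
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(ratings)}")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("each rating must be 0, 1, or 2")
    return sum(ratings)


# Hypothetical respondent, not taken from the study data.
example = [2, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 2, 2, 1]
print(phq15_sum_score(example))  # prints 12
```

With 15 items rated 0 to 2, the sum score ranges from 0 to 30.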

Germany
The German translation of the PHQ-15 has been shown to have sound psychometric properties (44), and reference data from the general population are available (45). The PHQ-15, as well as the PHQ-9 and the GAD-7, were collected within the routine psychometric assessment of outpatients in the Department of Psychosomatic Medicine. The data analyzed here were collected between January 2005 and February 2009. If patients were assessed multiple times, we included the first occasion only. Clinical diagnoses were made by trained physicians and psychologists. Along with questionnaires and clinical diagnoses, we collected sociodemographic data.

China
The validity and reliability of the Chinese version of the PHQ-15 (46) were examined in the general population of Hong Kong. The Hong Kong version of the PHQ-15 exhibited satisfactory internal consistency (Cronbach's alpha = 0.79) and stable 1-month test-retest reliability. Somatic symptom severity was positively associated with functional impairment and health service use. In Mainland China, the validity and reliability of the PHQ-15 were tested in the outpatient clinics of general hospitals in Shanghai (47). Cronbach's alpha was 0.73, and the test-retest reliability coefficient was 0.75 (10). There were moderate positive correlations between the PHQ-15 score and anxiety and depression values.

Translation of the PHQ-15
Three native Chinese speakers who resided in Germany updated the Chinese version of the PHQ-15, taking findings from recent research into account. The translation was based on the former Chinese version, which was translated from English to Mandarin (47). The differences from this former version include changes in item 1 and item 8: We changed "stomach pain" (胃痛或肚痛) in item 1 to "stomach and abdominal pain" (胃痛), in accordance with the suggestions of Lee et al. (46). In item 8, "fainting spells" was replaced with "brief fainting" [短 倒 (Cantonese), 短时间晕倒].
The data analyzed here were taken from a cross-sectional survey under routine clinical conditions. Participants in this study were inpatients recruited from different departments (e.g., oncology, cardiology, respiratory medicine, rehabilitation, geriatrics and gerontology, general practice, pain management, thyroid and breast surgery, rheumatology, and hepatic surgery). The validation of the PHQ-15 is a component of a larger project investigating the prevalence and recognition of inpatients with emotional distress and their treatment needs at a general hospital.

Ethics Statement
All patients were informed at intake that the Patient Related Outcome Monitoring (PROM) data pertaining to their treatment could be used on an anonymous basis for research, and were provided the opportunity to object to that during intake or anytime during treatment. PROM data of patients objecting to this were not used in this study. The study was approved by the ethics committees of the Shanghai Dong Fang hospital (XZ), the University Medical Centre Freiburg (KF), and GGz Breburg (CFC).

The Netherlands
In the Netherlands, the PHQ-15 was applied in several research projects in primary care (48,49), occupational health care (50), and the general hospital setting (51). In primary care, the PHQ-15 showed limited validity (52): it tended to yield false positives for somatoform disorder because it does not distinguish unexplained from explained physical symptoms. In occupational health care, validity was also limited in terms of both sensitivity and specificity (53). It is currently regularly applied in the intake procedure of the Clinical Centre of Excellence for Body, Mind, and Health (Dutch abbreviation: CLGG), which is a specialty mental health institution for patients with somatic symptom and related disorders. Data from this patient group are included in this comparative study. Before intake, patients at CLGG receive questionnaires using Routine Outcome Monitoring. Data for this study were collected during 2010 and 2016. A total of 465 patients filled in the PHQ-15, GAD-7, and PHQ-9.

Depression Scale PHQ-9
This instrument assesses each of the nine DSM-IV depression criteria on a scale of "0" (not at all) to "3" (nearly every day) (13). The PHQ-9 demonstrated acceptable psychometric properties for the screening of patients with late-life depression. In the Netherlands, the PHQ-9 was validated in the Depression Initiative, a national study aimed at improving mental health care for people with depressive symptoms. It proved to be a feasible and valid instrument to screen for depressive disorder, to assess severity and to assess change in scores over time in the primary care, occupational health care and general hospital outpatient setting. It has also been proven to be feasible and valid for use in electronic monitoring systems in the context of blended mental health care in the occupational health setting. The PHQ-9 is a widely used and valid instrument in the Netherlands as well as in Germany (54)(55)(56).
In Chinese, within a primary care setting, this questionnaire showed a sensitivity of 0.86 and a specificity of 0.77 for depression screening (57).

Anxiety Scale (GAD-7)
A seven-item anxiety scale (GAD-7) was used to assess the severity of generalized anxiety (58). In a German validation study (59), it has been shown to be a reliable and valid measure of anxiety in the general population. In a study in the Dutch primary care setting, it showed limited validity as a screener and better validity as a case-finding instrument for general practitioners (60), and it proved to be feasible in the application of collaborative care in the primary care setting. In a Chinese general hospital population, this instrument showed good reliability and good criterion, construct, factorial, and procedural validity (23).

Statistical Analyses
Using IBM SPSS 24.0 (61) and Mplus 7.4 (62), all three samples were analyzed using confirmatory factor analyses (CFA) and structural equation modeling (SEM). As the items of all questionnaires were ordered categorical, the analyses were conducted with the robust weighted least squares estimator with a mean- and variance-adjusted test statistic (WLSMV). Because the chi-square test is sensitive to potentially irrelevant deviations from the implied model structure in large samples, we used established fit measures to evaluate the models. For the absolute fit, we chose the root mean square error of approximation (RMSEA). The comparative fit index (CFI) and the Tucker-Lewis index (TLI) are reported as incremental fit indices. Following the advice of Hu and Bentler (63), RMSEA values lower than 0.06 and CFI/TLI values higher than 0.95 are considered indicators of good model fit. All coefficients (i.e., factor loadings and path weights) are reported as standardized coefficients.
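The decision rule stated above (RMSEA below 0.06, CFI and TLI above 0.95) can be written down as a small check. The sketch below is only an illustration of that rule: the class `FitIndices`, the function `is_good_fit`, and the two sets of example values are hypothetical and are not part of the SPSS or Mplus analyses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    """Fit indices of one estimated model (hypothetical container)."""
    rmsea: float
    cfi: float
    tli: float


def is_good_fit(fit: FitIndices,
                rmsea_max: float = 0.06,
                incremental_min: float = 0.95) -> bool:
    """True when RMSEA is below and both CFI and TLI are above the cutoffs."""
    return (fit.rmsea < rmsea_max
            and fit.cfi > incremental_min
            and fit.tli > incremental_min)


# Hypothetical values for illustration only, not results of this study.
print(is_good_fit(FitIndices(rmsea=0.040, cfi=0.970, tli=0.960)))  # True
print(is_good_fit(FitIndices(rmsea=0.080, cfi=0.930, tli=0.910)))  # False
```

The cutoffs are kept as parameters because they are conventions rather than strict thresholds, so a reader could apply a different convention without changing the logic.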
Several CFA models were tested. In a first step, the bifactor model from Witthöft et al. (22) was estimated within a multigroup design. Because model fit was only acceptable, alternative models were sought in a second step. In this second step, the bifactor model [see

Description of the Sample
Age and gender in the three samples are shown in Table 1.
Age and gender are comparable in the datasets from Germany and the Netherlands, but the participants from China are significantly older, with ages ranging up to over 90 years, and include significantly more men. This may be related to the fact that in both Germany and the Netherlands, but not in China, separate health services exist for patients older than 70 years.
The distributions of the PHQ-15 items within the three samples (Germany N = 2,517, China N = 1,329, Netherlands N = 456) are presented in Table 2.
Some items (e.g., PHQ8 [fainting spells] and PHQ11 [pain or problems during sexual intercourse]) show floor effects, so a metric scale cannot be assumed. Hence, a second model was explored in which the four symptom-specific factors were no longer assumed to be orthogonal, that is, correlations between them were allowed to differ from zero. This second model was again explored over the three groups taken together.
The chi-square test of model fit was 641.186 (df = 46, p < 0.0001). The RMSEA was 0.055 (90% CI = 0.051, 0.059). With a CFI of 0.980 and a TLI of 0.965, this model had a better fit, but the Mplus results showed that the latent variable covariance matrix was not positive definite (for the Chinese sample, not for the whole model). Hence, a satisfactory fit for the three samples taken together could not be established.
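To illustrate what a covariance matrix that is not positive definite means, the following sketch tests a symmetric matrix by attempting a Cholesky factorization, which succeeds only for positive definite matrices. It is a generic illustration and not the check performed by Mplus; the function name and both example matrices are hypothetical and unrelated to the study's estimates.

```python
import math
from typing import List

Matrix = List[List[float]]


def is_positive_definite(m: Matrix, tol: float = 1e-12) -> bool:
    """Check a symmetric matrix by attempting a Cholesky factorization."""
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - partial
                if pivot <= tol:
                    return False  # non-positive pivot: not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (m[i][j] - partial) / lower[j][j]
    return True


# Hypothetical factor correlation matrices (illustration only).
admissible = [[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]]
# Two factors both correlate 0.9 with a third yet -0.9 with each other,
# a pattern that no set of real variables can produce.
inadmissible = [[1.0, 0.9, 0.9],
                [0.9, 1.0, -0.9],
                [0.9, -0.9, 1.0]]

print(is_positive_definite(admissible))    # True
print(is_positive_definite(inadmissible))  # False
```

A matrix of the second kind implies mutually inconsistent relations among the latent factors, which is why a solution with this property is usually not interpreted.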

CFA Model for the Separate Datasets
We split the dataset into three datasets, for Germany, China, and the Netherlands, and searched for a specific model for each country and for the particular sample within that country.

Germany
Starting with the German dataset, we tried a model comparable to Witthöft et al.

The Netherlands
For the dataset from the Netherlands, exploratory testing using the modification indices showed that only a small change for item 6 (chest pain) was needed (see Tables 3a,b). In the original model, this item belongs to the pain-related symptoms factor. In our model, this item fits better with the cardiopulmonary-related symptoms factor. The chi-square test of fit for the final Dutch model was 53.624 (df = 46, p = 0.2051). The RMSEA was 0.019 (90% CI = 0.000, 0.038; probability of RMSEA ≤ 0.05 = 0.999), the CFI was 0.998, and the TLI was 0.997. The model is shown in Figure 2.

China
The structure from Witthöft et al. (22) could not be replicated in the dataset from China. We split the cardiopulmonary-related symptoms factor into two factors. These factors were correlated but estimate different latent variables. In the model for this dataset, the pain-related symptoms factor includes items PHQ2 (back pain), PHQ3 (pain arm legs), and PHQ6 (chest pain), and the gastrointestinal-related symptoms factor includes items PHQ1 (stomach pain), PHQ5 (headaches), PHQ12 (constipation), and PHQ13 (nausea, gas). The first cardiopulmonary-related symptoms factor only includes items PHQ7 (dizziness) and PHQ10 (short breath), while the second cardiopulmonary-related symptoms factor is measured by PHQ6 (chest pain), PHQ8 (fainting spells), and PHQ11 (pain sexual intercourse) (see Tables 4a,b). The fatigue-related symptoms factor includes items PHQ14 and PHQ15. The chi-square test of fit for the final Chinese model was 187.779 (df = 41, p < 0.0001). The RMSEA was 0.052 (90% CI = 0.045, 0.060; probability of RMSEA ≤ 0.05 = 0.325). The CFI of 0.977 and the TLI of 0.957 indicate a good fit of the model, which is shown in Figure 3.
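As a plausibility check on the reported fit statistics, the RMSEA point estimate can be recomputed from the chi-square value, the degrees of freedom, and the sample size. The sketch below uses the common textbook formula sqrt(max(χ² − df, 0) / (df · (N − 1))); this is an assumption, since the exact formula Mplus applies with the WLSMV estimator may differ. It further assumes that the pooled model used all 4,302 participants and that the single-country models used the sample sizes given in the description of the sample.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate (assumed formula, see lead-in)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# (chi-square, df, sample size) as reported in the text.
models = {
    "pooled, correlated factors": (641.186, 46, 2517 + 1329 + 456),
    "Netherlands": (53.624, 46, 456),
    "China": (187.779, 41, 1329),
}

for name, (chi2, df, n) in models.items():
    print(f"{name}: RMSEA = {rmsea(chi2, df, n):.3f}")
```

Under these assumptions the sketch yields 0.055, 0.019, and 0.052, which agree with the reported values to three decimals.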
Part of the results from the different models, namely the thresholds between the points on the scale of the PHQ-15 items, can be interpreted and compared between the different countries. The results in Table 5 reflect the descriptive in Table 1 where the   Table 5.

PHQ-9 and GAD-7
We then tested the predictive value of the PHQ-9 and the GAD-7 questionnaires within these models. Within the German and the Chinese samples, the PHQ-9 predicted most PHQ-15 factors well, while the GAD-7 predicted the general factor and the fatigue factor well. In the sample from the Netherlands, only the general factor was well predicted by the PHQ-9 and the GAD-7. This is shown in Table 6.

DISCUSSION
This study examines the factor structure of the PHQ-15 in a cross-country and cross-cultural setting involving China, with a total of 4,302 participants. It shows that the factor structure found by Witthöft et al. (22), consisting of a general symptom distress factor and four orthogonal symptom-specific factors (pain-, gastrointestinal-, cardiopulmonary-, and fatigue-related symptoms), does not fit the full sample consisting of all three study populations. An analysis per sample showed similarities with the Witthöft model in the German and the Dutch populations. In the German sample, the model of Witthöft could be replicated if low but essential correlations between the factors were allowed. We found a comparable but different model for the dataset from the Netherlands. The Dutch sample showed an excellent fit with the original model if item 6 (chest pain) was moved from the pain-related symptoms factor to the cardiopulmonary-related symptoms factor.
However, the Witthöft model could not be replicated in the Chinese sample. There were differences in pain, in the sense that the cardiovascular item chest pain was mostly reported in the context of general pain symptoms. Also, items like fainting, chest pain and problems with sexual intercourse were related, contrary to the European samples. Hence the Chinese sample seems to have different properties compared to the European samples. This may be due to epidemiological differences, such as that cardiovascular symptoms were reported and experienced in the context of sexual symptoms. But it might also be associated with cultural differences in body and illness perception, or in scripting, as described by Kleinman (64); a script concerning a specific clustering of symptoms may be shared by patients and Chinese    (67,68) and thus lead to different scripts for distress (68). Thus, cultural scripts (66) shape the ways in which people attend and react to particular experiences marked as important in some way. In some cases, pathological loops can form, where attention to a particular symptom can accentuate its severity and give rise to related symptoms' (65).
Another explanation may be fear of stigma; if sexual intercourse is experienced as problematic, this might be reported in the context of symptoms suggesting cardiovascular disease, and not as a standalone problem. A connection between shame and stigma has been reported for schizophrenia (69). The role of stigma in the recognition of mental disorders in China has been explored (70). The literature indicates higher rates of shyness and other normal interpersonal concerns in Chinese cultural contexts (71). Such shame may also play a role in sexual problems if they occur in the context of an unrecognized mental disorder, and thus in the reporting of physical symptoms in SSD.
A possible association between factor structure and health care setting was considered because somatic symptoms may be perceived as providing more effective access to health care resources (38); however, this does not seem to be corroborated by the results. The German and Dutch samples seem to be congruent although they come from different health care settings, namely a general hospital setting and a specialized mental health care setting for SSD. The Chinese sample also comes from a general hospital setting, but it shows a different factorial structure from the European samples. This different factorial structure in the Chinese sample does not seem to be explained by different associations with depression and anxiety either, as in both the German and the Chinese samples the PHQ-9 predicted most PHQ-15 factors well, while the GAD-7 predicted the general factor and the fatigue factor well. In contrast, in the sample from the Netherlands the PHQ-9 and the GAD-7 were predictive only for the PHQ-15 general factor.
The findings of this study suggest that the different factorial structure of the PHQ-15 in China is not explained by differences in concomitant depression or anxiety. Cultural factors such as stigma, scripting, or different concepts of disease (36) may play a role in this. Further research is needed to explore this, and replication studies are needed regarding the factorial structure of the PHQ-15 in China. Nevertheless, with this knowledge the PHQ-15 can be applied in China, taking this small difference in somatic symptom clusters for China into account. Future studies should also explore to what extent the different factorial structures might bias data analysis when the PHQ-15 is used as a unidimensional measure. Furthermore, they should investigate whether any confounding factors can be identified that drive the differences between cultures.

Limitations of the study
It can be considered a limitation of the study that health care settings and cultural differences overlap; hence, their effects cannot be separated sufficiently to enable firm conclusions in this matter.

CONCLUSION
The PHQ-15 is a valid questionnaire that can discern somatization from anxiety and depression within different cultural contexts such as Europe and China. The questionnaire can be fitted to a bifactorial model; however, we can only recommend the use of the general factor. Application of the orthogonal subscales in non-European samples is not corroborated by the results. Further research is needed to explore explanations and to replicate the findings of this study.

AUTHOR CONTRIBUTIONS
Acquisition of the study was by KF. CvdF-C, KF, RL, and LdV designed the study. CvdF-C, LdV, LZ, YL, ZD, RS, SN, FF provided the data. RL, FF, CvdF-C, LdV, and KF designed the analyses and RL performed them. CvdF-C, LdV, and RS contributed to the interpretation of the data. RL, LdV, and CvdF-C wrote the manuscript and LZ, YL, ZD, RS, SN, and FF contributed to the manuscript for the Chinese and German samples. All authors critically revised and approved the final version of the article.

FUNDING
Data collection in China was supported by the Grant GZ 1155 of the Sino-German Research Centre. Data analyses and writing of the manuscript were supported by Grant GZ 690 awarded by the Centre for Sino-German Research Promotion in Beijing to Kurt Fritzsche. Furthermore, this study was supported by GGz Breburg and Tilburg University, Tilburg, the Netherlands. The article processing charge was funded by the German Research Foundation (DFG) and the University of Freiburg in the funding programme Open Access Publishing.