Can We Apply WHOQOL-AGE to Asian Population? Verifying Its Factor Structure and Psychometric Properties in a Convenience Sample From Taiwan

Objectives: To translate a recently developed quality of life instrument for the geriatric population (WHOQOL-AGE) into Chinese and validate it. Method: Using a cross-sectional observational design, the WHOQOL-AGE was administered to older people through interviews. Confirmatory factor analysis (CFA) was used to examine the factor structure, and multigroup CFA was used to examine measurement invariance. Results: Through convenience sampling, 522 older adults (mean age = 73.42) participated in the study. Among them, 194 were males, 213 had an educational level at primary school or below, 398 were residing in the community, and 307 were aged 70 years or above. A bifactor structure (items Q1–Q8 embedded in Factor 1; items Q9–Q13 embedded in Factor 2; and all items embedded in an additional construct of QoL) was confirmed by the CFA in both the entire sample (χ2 = 25.4; df = 51; p = 0.999) and the subgroup aged 70 years or above (χ2 = 25.28; df = 51; p = 1.000). Multigroup CFA results supported the measurement invariance of the WHOQOL-AGE across genders, educational levels, living settings, and age groups. The WHOQOL-AGE also showed good known-group validity. Conclusions: Promising psychometric properties of the WHOQOL-AGE were found in our convenience sample of older Taiwanese. The supported measurement invariance indicates that older people of different genders, educational levels, and living settings interpret the WHOQOL-AGE similarly. However, our results should be interpreted with caution because of the limited representativeness of the sample.


INTRODUCTION
Due to modern medicine and improved technology, life expectancy has been extended worldwide (1). How to maintain quality of life (QoL), especially among those aged over 65 years, has become a global concern (2). The World Health Organization (WHO) proposed the concept of QoL and developed corresponding measures in the 1990s (3). Specifically, QoL instruments are based on patient-reported outcomes (PROs), a primary outcome proposed by the U.S. Food and Drug Administration (4) to help healthcare professionals make clinical decisions using clients' feelings (5). With such unique characteristics, the number of QoL instruments has been growing in the literature, including those used in the elderly. Indeed, several generic QoL instruments have been verified as useful for assessing QoL among the older population (2,6,7).
Although generic QoL instruments have the advantage of comparing people with different conditions (8), some criticisms have been made. Taking the elderly population as an example, generic QoL instruments may not capture the most important aspects of QoL for the elderly (e.g., older people usually have worse hearing and vision than other adults, and this aspect is not included in the generic QoL instruments). Therefore, some researchers have developed more appropriate QoL instruments for the older population. For example, the Elderly Quality of Life Index (EQOLI) was developed in Brazil for longitudinal assessments of QoL change and potential impacts from behavior, intervention, and treatment (9,10). The Quality of Life Scale for Elderly (QOLS-E) was developed in Japan using a sample of institutionalized people (11). The WHOQOL-OLD was developed as a supplementary module of the WHOQOL-BREF (12,13).
However, a time concern has been raised for both the EQOLI and WHOQOL-OLD. Specifically, the EQOLI has 43 items, and the WHOQOL-OLD should be used together with the 26-item WHOQOL-BREF. Therefore, completing the EQOLI or WHOQOL-OLD plus WHOQOL-BREF may be a substantial burden for older people (14). Moreover, the QOLS-E was found to have somewhat low internal consistency (11). As a result, there is a need to develop a QoL instrument for older people with satisfactory psychometric properties.
In order to overcome the aforementioned problems, Caballero et al. (15) developed a stand-alone QoL instrument that assesses older people's QoL: WHOQOL-AGE. The 13-item WHOQOL-AGE was designed according to the characteristics of aging populations and was developed by integrating previous WHOQOL instruments, including the EUROHIS-QOL and the WHOQOL-OLD short form (12). The WHOQOL-AGE shows promising psychometric properties in nationally representative samples from Finland, Poland, and Spain (15). Moreover, older people from these three European countries somewhat interpreted the WHOQOL-AGE similarly (16). With good psychometric properties and a short administration time, the WHOQOL-AGE seems to be a feasible and appropriate instrument for use in large-scale population studies and busy clinical settings. Unfortunately, current literature shows that only two European studies (15,16) have investigated the psychometric properties of the WHOQOL-AGE. Apart from the languages spoken in Finland, Poland, and Spain, the WHOQOL-AGE is not available in any other languages. Given that comparing QoL between people living in different countries requires the use of the same instrument, translating the WHOQOL-AGE into other languages is needed. Moreover, psychometric properties highly rely on the tested population; therefore, a sound instrument should be tested using different psychometric methods across different populations (8). If the requirements can be fulfilled, the psychometric evidence of the WHOQOL-AGE can be accumulated.
In addition to the lack of versions in other languages, the factor structure of the WHOQOL-AGE has not yet been confirmed, and the measurement invariance of the WHOQOL-AGE has never been examined across genders, educational levels, settings, and age groups. Although Santos et al. (16) applied confirmatory factor analysis (CFA) to examine whether the WHOQOL-AGE fits a two-factor model or a bifactor model better (for detailed information on the bifactor model, see the Statistical Analysis section below), a model with cross-loading on item Q1 (How would you rate your quality of life?) found by Caballero et al. (15) was not compared. Santos et al. (16) found that the bifactor model fits better than the two-factor model and further demonstrated the partial invariance of the WHOQOL-AGE across countries in the bifactor model. However, it is unclear whether the WHOQOL-AGE has invariant factor structures across other important grouping factors. Specifically, gender, education, living context, and age usually impact an individual's cognition and, thus, may result in different interpretations of the same item. Interpreting the items differently between groups may result in measurement bias between groups. Take education and item Q1 (How would you rate your quality of life?) as an example. Those who graduated from elementary school may think about basic needs (e.g., eating and living) when rating this item; those who graduated from university may think about self-actualization (e.g., being respected). However, without measurement invariance testing, we have no evidence to indicate whether the WHOQOL-AGE suffers from measurement non-invariance across certain factors. Therefore, it is important to examine the measurement invariance of the WHOQOL-AGE across genders (male vs. female), educational levels (primary school or below vs. junior high school or above), living contexts or settings (community vs. institution, such as a nursing home), and age groups (below 70 vs. 70 years or above) after confirming its factor structure.
In order to fill the gap in the literature regarding the assessment of QoL among the elderly, the present study had the following aims. First, we aimed to translate the WHOQOL-AGE for an East Asian sample (i.e., Taiwanese). Second, we aimed to verify the factor structure of the WHOQOL-AGE among the Taiwanese elderly. Third, after confirming the factor structure of the WHOQOL-AGE, we examined measurement invariance to understand whether elderly people of different genders, educational levels, living settings, and ages interpret the WHOQOL-AGE differently.

Participants and Procedure
The main survey was conducted between October 2016 and March 2017. Participants were elderly persons aged 50 years and over who consented to be interviewed. Those who could not communicate (e.g., those with poor cognitive function or severe hearing impairment or who could not understand spoken Mandarin or Taiwanese) were excluded. Convenience sampling, mainly from the southern region of Taiwan (79%), was conducted to collect data from two groups: relatively healthy participants living in communities, and patients or residents living in hospitals or long-term care (LTC) facilities with mild-to-severe dependency. All the participants were interviewed by experienced interviewers who fully understood the study aims. Interviews were conducted in the participants' homes or a quiet space nearby (for those living in the community) or in the institution (for those living in an institution). Each interview took about 10 min on average. Before collecting data, written informed consent was obtained from each participant. The study (A-ER-104-384) was approved by the Human Research Ethics Committee of National Cheng Kung University Hospital, and there were no conflicts of interest between the authors and the goals of this study. In total, 522 valid questionnaires were collected, including 398 (76.2%) from the communities and 124 (23.8%) from the institutions.

Instruments: WHOQOL-AGE and Barthel Index
The WHOQOL-AGE contains 13 items and was developed in several European countries with nationally representative samples (15,16). All the items are rated using a five-point Likert scale, the same scale that is used in the WHOQOL-BREF (17). Moreover, responses to items Q9-Q13 are classified as unipolar (e.g., not at all to completely), and responses to items Q1-Q8 are classified as bipolar (e.g., very bad to very good). Different factor structures have been proposed for the WHOQOL-AGE (for detailed information, please see the Statistical Analysis section below), and the scoring method was proposed by Caballero et al. (15). In brief, a higher score indicates a better level of QoL. The psychometric properties of the WHOQOL-AGE have been verified in European countries: Cronbach's α = 0.84 to 0.91 (15); partial invariance across three European countries (Finland, Poland, and Spain) (16).

Translation Procedure of the WHOQOL-AGE
Given that some WHOQOL-AGE items are identical to items in the WHOQOL-BREF and the Taiwan version of the WHOQOL-BREF has strong psychometric properties (17), those identical items were directly retrieved from the WHOQOL-BREF Taiwan version without translation. The other WHOQOL-AGE items were first translated from English into Chinese by a bilingual translator and then back-translated into English by another bilingual translator. A bilingual expert in gerontology fine-tuned the Chinese WHOQOL-AGE (an interim version) after reviewing the forward translation, back translation, and the original English version of the WHOQOL-AGE. The interim version was then discussed and reviewed. The face validity of the Chinese WHOQOL-AGE was confirmed by the expert committee.

Statistical Analysis
Participants' characteristics were first analyzed using descriptive statistics. Independent t-tests and χ2 tests were used to compare the characteristics between participants living in the community and those living in an institution.
Four structural models were further tested using CFA. The four models included a one-factor model (Model 1), two two-factor models (Models 2 and 3), and a bifactor model (Model 4). Specifically, Model 1 had all items loaded on the same construct of QoL. Model 2, proposed by Santos et al. (16), had items Q1-Q8 embedded in Factor 1 and items Q9-Q13 embedded in Factor 2. Model 3, suggested by Caballero et al. (15), was a two-factor model with cross-loading on item Q1: items Q1-Q8 were embedded in Factor 1; items Q1 and Q9-Q13 were embedded in Factor 2. Model 4 was a bifactor model proposed by Santos et al. (16), in which items Q1-Q8 were embedded in Factor 1, items Q9-Q13 were embedded in Factor 2, and all items were embedded in an additional construct of QoL.
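The four competing structures can be written down compactly as item-to-factor maps. The following is a minimal sketch for illustration only: the names MODELS, to_lavaan_syntax, and n_loadings, as well as the factor labels QoL, F1, and F2, are hypothetical and are not taken from the authors' scripts. The "=~" operator is lavaan's standard notation for "is measured by".

```python
# Illustrative sketch only: labels and helper names are hypothetical.
ITEMS = [f"Q{i}" for i in range(1, 14)]   # Q1..Q13
F1_ITEMS = ITEMS[:8]                       # Q1-Q8
F2_ITEMS = ITEMS[8:]                       # Q9-Q13

MODELS = {
    "Model 1": {"QoL": ITEMS},                                    # one factor
    "Model 2": {"F1": F1_ITEMS, "F2": F2_ITEMS},                  # two factors
    "Model 3": {"F1": F1_ITEMS, "F2": ["Q1"] + F2_ITEMS},         # Q1 cross-loads
    "Model 4": {"QoL": ITEMS, "F1": F1_ITEMS, "F2": F2_ITEMS},    # bifactor
}


def to_lavaan_syntax(model):
    """Render one model as lavaan-style measurement syntax, one factor per line."""
    return "\n".join(
        f"{factor} =~ {' + '.join(items)}" for factor, items in model.items()
    )


def n_loadings(model):
    """Count item-factor loadings specified by the model."""
    return sum(len(items) for items in model.values())


if __name__ == "__main__":
    for name, model in MODELS.items():
        print(name, "-", n_loadings(model), "loadings")
        print(to_lavaan_syntax(model))
        print()
```

Counting loadings makes the nesting visible: Models 1 and 2 specify 13 loadings, Model 3 adds one cross-loading, and the bifactor Model 4 specifies 26 because every item loads on both the general factor and one specific factor. Bifactor models are commonly estimated with the general and specific factors constrained to be uncorrelated; whether that constraint was applied here is not stated in the text.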
The four models were examined using several fit indices to indicate whether the data fit these models. The fit indices included a χ2 test (in which a nonsignificant finding indicates fit), a comparative fit index (CFI; in which a value higher than 0.9 indicates fit), a Tucker-Lewis index (TLI; in which a value higher than 0.9 indicates fit), a standardized root mean square residual (SRMR; in which a value <0.08 indicates fit), and a root mean square error of approximation (RMSEA; in which a value <0.08 indicates fit) (18-20). Apart from the fit indices, the four models were compared using the χ2 difference test. Specifically, if a model had a significantly lower χ2 than another model, the former model had better fit (21). If some models had similar fits (i.e., no significant difference in the χ2 difference test), the simplest structure was viewed as the best model given the parsimony principle.
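The cutoffs above can be expressed as a simple checklist. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the significance level of 0.05 for the χ2 test is a conventional choice that the text does not state, and the RMSEA formula is one common point-estimate form (software may use N rather than N - 1 in the denominator).

```python
import math


def rmsea(chisq, df, n):
    """One common RMSEA point estimate; it is 0 whenever chisq <= df."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(p_chisq, cfi, tli, srmr, rmsea_value, alpha=0.05):
    """Compare each index with the cutoff named in the text.

    alpha is an assumed conventional significance level (not given in the text).
    """
    return {
        "chisq": p_chisq > alpha,      # nonsignificant chi-square indicates fit
        "CFI": cfi > 0.9,
        "TLI": tli > 0.9,
        "SRMR": srmr < 0.08,
        "RMSEA": rmsea_value < 0.08,
    }


if __name__ == "__main__":
    # Chi-square, df, and N as reported for the bifactor model in the entire sample.
    print(rmsea(25.4, 51, 522))
```

Because the reported χ2 (25.4) is smaller than its degrees of freedom (51), this formula returns an RMSEA of exactly 0, which agrees with the lower bound of the RMSEA range reported in the Results.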
The best model determined using the χ2 difference test and the structure complexity was further used to examine measurement invariance across different conditions, including gender (male vs. female), educational level (≤elementary vs. ≥junior high), setting (community vs. institution), and age group (<70 vs. ≥70 years). Four sets of multigroup CFAs with nested models were applied to determine whether measurement invariance was supported across gender, educational level, setting, and age group. For each set of multigroup CFAs, there were three nested models, including a configural model, a model that constrained all the loadings to be equal between subgroups, and a model that constrained all the loadings and item intercepts to be equal between subgroups (22,23). The three nested models were then compared using the χ2 difference test, the change in CFI (ΔCFI), and the change in RMSEA (ΔRMSEA). A nonsignificant χ2 difference indicated invariance across subgroups; however, the χ2 test is not recommended when the sample size is large (i.e., n > 200) (18). Alternatively, ΔCFI > −0.01 and ΔRMSEA < 0.01 also indicated invariance across subgroups (24,25). Because participants aged 70 years or above may have different perceptions of QoL from those aged below 70 years, we also tested the WHOQOL-AGE only in those aged 70 years or above as a sensitivity analysis. The same sets of CFAs and multigroup CFAs were analyzed for the subgroup aged 70 years or above. However, multigroup CFAs on setting and age were not performed in this subgroup.
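The decision rule based on changes in fit indices between two nested models can be sketched as follows. This is a minimal illustration: the function name and the example fit values are hypothetical, and the differences are taken as the more constrained model minus the less constrained model, which is an assumption about sign convention consistent with the thresholds given above.

```python
def delta_supports_invariance(cfi_less, cfi_more, rmsea_less, rmsea_more):
    """Apply the rule from the text: delta CFI > -0.01 and delta RMSEA < 0.01.

    "less" = less constrained model (e.g., configural);
    "more" = more constrained model (e.g., equal loadings).
    """
    delta_cfi = cfi_more - cfi_less
    delta_rmsea = rmsea_more - rmsea_less
    return delta_cfi > -0.01 and delta_rmsea < 0.01


if __name__ == "__main__":
    # Hypothetical fit values, not taken from the study's tables.
    steps = [
        ("configural -> loadings", 0.998, 0.997, 0.010, 0.012),
        ("loadings -> loadings + intercepts", 0.997, 0.960, 0.012, 0.045),
    ]
    for label, cfi_a, cfi_b, rmsea_a, rmsea_b in steps:
        print(label, delta_supports_invariance(cfi_a, cfi_b, rmsea_a, rmsea_b))
```

In this made-up sequence the first step supports invariance (CFI drops by 0.001 and RMSEA rises by 0.002), whereas the second does not (CFI drops by 0.037). Each step is judged against the immediately preceding, less constrained model.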
All the CFAs, including multigroup CFAs, were estimated using the diagonally weighted least squares (DWLS) estimator to handle the Likert-type responses in the WHOQOL-AGE (26). Cronbach's α and McDonald's ω were then applied to the confirmed structure of the WHOQOL-AGE to understand its internal consistency. A Cronbach's α > 0.7 and a McDonald's ω > 0.7 were considered acceptable (26,27). Moreover, known-group validity was tested to understand whether the WHOQOL-AGE could effectively distinguish the different levels of QoL between older people living in the community and those living in an institution. An independent t-test with effect size calculation (i.e., Cohen's d, where 0.2, 0.5, and 0.8 indicated small, moderate, and large effect sizes, respectively) was used for the known-group validity. According to our prior experience, interaction with the geriatric population, and the literature (28), we hypothesized that older people living in the community would have better QoL compared to those living in an institution. All the analyses were done using the R software (R-3.5.1 for Windows). Additionally, CFAs and multigroup CFAs were performed using the lavaan package (http://lavaan.ugent.be/); Cronbach's α and McDonald's ω were calculated using the psych package (https://cran.r-project.org/web/packages/psych/index.html).
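For readers unfamiliar with Cronbach's α, the coefficient can be computed directly from an item-score matrix. The sketch below uses only the Python standard library and a small made-up data set; it illustrates the textbook formula and is not the psych package's implementation. The function name and the toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Made-up responses from five respondents to three five-point items.
    toy = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 3]]
    alpha = cronbach_alpha(toy)
    print(round(alpha, 2), "acceptable" if alpha > 0.7 else "below 0.7")
```

McDonald's ω, in contrast, is derived from the loadings of a fitted factor model and therefore cannot be obtained from item variances alone.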

RESULTS
The mean age of the sample was 73.42 (SD = 10.46) years. Among the 522 participants, 194 (37.2%) were males, 213 (40.8%) had an educational level of primary school or below, and 398 (76.2%) were residing in a community. Additional demographic information is presented in Table 1. Moreover, Table 1 presents the characteristics of our participants who were aged 70 years or above (N = 307).
All the proposed models (in both the entire and the subgroup samples) had satisfactory fit indices (CFI = 0.994-1.000, TLI = 0.993-1.009, SRMR = 0.029-0.060, and RMSEA = 0.000-0.029). Moreover, all the models had nonsignificant χ2 (p = 0.050-1.00), except for Model 1 in the entire sample (p = 0.016). When we used the χ2 difference test to compare the four models, Model 4 significantly outperformed Models 1 (p < 0.001), 2 (p < 0.001), and 3 (p < 0.001) in both the entire and the subgroup samples. Indeed, the fit indices of Model 4 were the best among all the proposed models (Table 2).
Measurement invariance was examined using the best-fitting model among the four proposed models (i.e., Model 4). As shown in Table 3 (entire sample results) and Table 4 (results from those aged 70 years or above), the χ2 difference test showed no significant differences across gender between the configural model and the constrained loadings model (p = 0.42 and 0.57) and between the constrained loadings model and the constrained loadings and intercepts model (p = 0.68 and 0.92). Although significant differences were found between the configural model and the constrained loadings model across educational levels (p = 0.008 and 0.009), both ΔCFI (0.000) and ΔRMSEA (0.000) supported invariant loadings across educational levels and settings. Given that the χ2 test is sensitive to sample size and easily reaches significance in large samples (i.e., n > 200), ΔCFI and ΔRMSEA were the major indices used to decide whether measurement invariance was supported. The χ2 difference test showed no significant differences across educational levels between the constrained loadings model and the constrained loadings and intercepts model (p = 0.26 and 0.47). Furthermore, in the entire sample, no significant differences were found across settings and age groups between the configural model and the constrained loadings model (p = 0.07 and 0.23) and between the constrained loadings model and the constrained loadings and intercepts model (p = 0.80 and 0.99).
Because the measurement invariance of the WHOQOL-AGE was supported, we used the confirmed structure (i.e., Model 4) to test the internal consistency of the WHOQOL-AGE. Specifically, the entire WHOQOL-AGE and the two factors in the WHOQOL-AGE were examined using both Cronbach's α and McDonald's ω. Cronbach's α was 0.90 for the entire WHOQOL-AGE, 0.84 for Factor 1, and 0.81 for Factor 2. McDonald's ω was 0.91 for the entire WHOQOL-AGE, 0.88 for Factor 1, and 0.86 for Factor 2. The known-group validity further showed that older people living in the community had a better WHOQOL-AGE total score (Mean ± SD = 3.52 ± 0.57) than those living in an institution (Mean ± SD = 3.16 ± 0.63; t = 5.68; p < 0.001) with a moderate effect size (Cohen's d = 0.60).
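The known-group comparison can be cross-checked from the reported summary statistics alone. The sketch below is an illustration under stated assumptions: the group sizes (398 community, 124 institution) are taken from the Participants and Procedure section, the function names are hypothetical, and the text does not state which variant of the t-test or of Cohen's d was used, so an unequal-variance (Welch) t statistic and two common definitions of d are computed. Because the inputs are rounded means and SDs, small discrepancies from the reported values are expected.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Unequal-variance (Welch) t statistic from summary statistics."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def cohens_d_pooled(m1, s1, n1, m2, s2, n2):
    """Cohen's d using the sample-size-weighted pooled SD."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled


def cohens_d_avg_sd(m1, s1, m2, s2):
    """Cohen's d using the root mean square of the two SDs (unweighted)."""
    return (m1 - m2) / math.sqrt((s1**2 + s2**2) / 2)


if __name__ == "__main__":
    community = (3.52, 0.57, 398)      # mean, SD, n
    institution = (3.16, 0.63, 124)
    print(round(welch_t(*community, *institution), 2))
    print(round(cohens_d_pooled(*community, *institution), 2))
    print(round(cohens_d_avg_sd(3.52, 0.57, 3.16, 0.63), 2))
```

With these inputs the Welch statistic comes out at about 5.68, in line with the reported t value, and the two effect size variants give roughly 0.62 (pooled SD) and 0.60 (unweighted SD); both fall in the moderate range.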

DISCUSSION
With the rapid growth of the aging population (29), using validated instruments to assess the elderly's QoL is deemed to be important. From the perspective of healthy aging, reducing the possibilities of disease for older people can ease the caregiving burden for both society and family (29). A validated QoL instrument for older people can efficiently screen the health condition and provide timely and early intervention to prevent serious illnesses. Our study, thus, provides psychometric evidence of a brief and efficient QoL instrument specifically for use in the older population (i.e., WHOQOL-AGE) to echo the aforementioned needs. Moreover, our results show that the WHOQOL-AGE has promising construct validity (a bifactor structure) as verified by the satisfactory fit indices in CFA, and the WHOQOL-AGE had invariant factor structures across genders, educational levels, living settings, and ages.
To the best of our knowledge, only two studies (15,16) have evaluated the psychometric properties of the WHOQOL-AGE prior to this study. Our results are comparable to those of the other two studies (15,16). Specifically, satisfactory internal consistency for the WHOQOL-AGE was found in the study by Caballero et al. (15) as well as our study. Moreover, the satisfactory internal consistency was supported by different psychometric methods, including Cronbach's α [0.84-0.91 in Caballero et al.'s study (15); 0.81-0.90 in our study] and McDonald's ω (0.85-0.92 in our study).
Both our study and Santos et al.'s study (16) demonstrate that the bifactor model was the best fitting model with excellent fit for the WHOQOL-AGE. Therefore, we can be reasonably confident that the WHOQOL-AGE has promising construct validity and can assess global QoL by using all 13 items. Two factors were found in the WHOQOL-AGE: Factor 1 items (i.e., Q1-Q8) seemed to share the concept of satisfaction with personal assets, while Factor 2 items (i.e., Q9-Q13) seemed to share the concept of self-efficacy in activities of daily living. Nevertheless, Santos et al. (16) proposed another explanation for why there are two factors in the WHOQOL-AGE: The two factors were constructed because of their response scale (Factor 1 items are rated using bipolar response; Factor 2 items are rated using unipolar response). The different response scales can be considered as a method effect in the factor structure, and a similar type of method effect (e.g., positively and negatively worded items) has been illustrated in other QoL instruments (22,30,31). Therefore, the bifactor model verified that, after accounting for the method effects, the WHOQOL-AGE can provide valid estimations of global QoL for older people.
We can further extend the findings from Santos et al.'s study (16) regarding the measurement invariance. Santos et al. (16) show the partial invariance of the WHOQOL-AGE across three European countries (Finland, Poland, and Spain). However, current literature provides no further information on the measurement invariance of the WHOQOL-AGE. Our study, thus, extends the findings of invariance across European countries to across genders, educational levels, living settings, and ages. Gender and educational level are obvious factors that may influence an individual to think differently and may lead to different interpretations of the same item descriptions (2). Therefore, it is important to evaluate whether the WHOQOL-AGE items are interpreted similarly across genders and educational levels. The supported measurement invariance found from our results indicated that neither gender nor educational level influenced the psychometric properties of the WHOQOL-AGE. Hence, the WHOQOL-AGE can be used to compare QoL between males and females and between those with low and high levels of education.
Moreover, apart from living in the community, older people can also live in unique settings; that is, LTC institutions or nursing homes. Given that the living context is different (32), older people may have different perceptions and considerations when they answer the items of the WHOQOL-AGE. Therefore, ensuring the measurement invariance of the WHOQOL-AGE across different living settings is important. The supported measurement invariance found from our results indicates that living context did not influence the psychometric properties of the WHOQOL-AGE. Hence, the WHOQOL-AGE can be used to compare QoL between those living in the community and those living in an institution.
As the measurement invariance was supported for the WHOQOL-AGE across different settings, we tested the known-group validity of the WHOQOL-AGE. Our results are consistent with the literature (28), which shows that older people living in the community had better QoL than those living in an institution. Moreover, our results on known-group validity showed that the WHOQOL-AGE had a moderate effect size in distinguishing QoL between different settings. Therefore, we anticipate that the WHOQOL-AGE would be a sensitive tool for detecting differences in QoL among older people. However, future studies may further investigate whether the WHOQOL-AGE is also sensitive for detecting intervention effects.
There are some limitations in this study. First, we did not assess the test-retest reliability of the WHOQOL-AGE. Therefore, the reproducibility and stability of the WHOQOL-AGE remain unknown. Second, all the participants were recruited in Southern Taiwan through convenience sampling; thus, the representativeness is restricted, and the generalizability of our findings is limited. Third, although we tested the known-group validity of the WHOQOL-AGE using living settings, no other external criteria were assessed. Therefore, we are unsure whether the WHOQOL-AGE has satisfactory concurrent validity to support its underlying QoL concept. Future studies are, thus, warranted to investigate this topic using other validated QoL-related instruments (e.g., WHOQOL-BREF). Fourth, the present study did not examine the feasibility, reliability, and responsiveness of the WHOQOL-AGE. Because a high-quality QoL instrument needs this information to support its use, future studies are warranted to examine the feasibility, reliability, and responsiveness of the WHOQOL-AGE in Asia. Last and importantly, although we have ensured the linguistic validity of the Taiwan version of the WHOQOL-AGE during the translation process, we did not design culturally specific items for the Taiwan version. Given that the WHOQOL-AGE was developed in European countries and that European and Taiwanese lifestyles are different, the translated WHOQOL-AGE may not be able to detect the QoL for Taiwanese people specifically. Future studies may consider developing and incorporating culturally specific items into the Taiwan version of the WHOQOL-AGE.

CONCLUSIONS
Our findings may supplement the current literature on QoL assessment in older people. Specifically, the WHOQOL-OLD is a well-established instrument with strong psychometric properties to assess QoL for older people, and our psychometric findings on the WHOQOL-AGE may provide healthcare providers another choice to assess QoL for older people. That is, the two instruments may be used in different situations. The WHOQOL-OLD has more items than does the WHOQOL-AGE; therefore, the WHOQOL-OLD can provide more detailed QoL information than does the WHOQOL-AGE. Thus, the WHOQOL-OLD is a good instrument when a user wants to obtain detailed QoL information for older people. In contrast, the WHOQOL-AGE has fewer items and, therefore, can be used when a user wants to quickly obtain the QoL information of older people.
In conclusion, the present study demonstrates the promising psychometric properties of the WHOQOL-AGE in an East Asian sample through convenience sampling. With the strong psychometric properties found in this study, other Asian countries may consider translating the WHOQOL-AGE and examine whether it can efficiently assess QoL for the elderly. Moreover, the supported measurement invariance of the WHOQOL-AGE indicates that it can precisely assess QoL for older people in different conditions, including different genders, educational levels, and living settings. Nevertheless, future studies should consider using a representative sample to examine the psychometric properties of the WHOQOL-AGE (including test-retest reliability and concurrent validity that were not tested in the present study) to gather additional information to corroborate the usefulness of the WHOQOL-AGE in Asia. Specifically, results from the present study can only be generalized to a small portion of Taiwanese older people and additional evidence is needed.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Human Research Ethics Committee of National Cheng Kung University Hospital (A-ER-104-384), and there were no conflicts of interest between the authors and the goals of this study. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
C-YL: data analyses, preparation of the manuscript. J-DW: study design, validation of the WHOQOL-AGE translation version, proof reading of manuscript. L-FL: design of the research, fund raising, data collection, and preparation of the manuscript. All authors contributed to the article and approved the submitted version.