Short Forms of the Cross-Cultural (Chinese) Personality Assessment Inventory: Reliability, Validity, and Measurement Invariance Across Gender

Long questionnaires can frustrate respondents and discourage them from continuing. Shorter forms of long instruments are therefore preferred, especially when they have comparable reliability and validity. In the present study, two short forms of the Cross-cultural (Chinese) Personality Assessment Inventory (CPAI-2) were developed and validated. The items of the short forms were all selected from the 28 personality scales of the CPAI-2 based on the norm sample. Using a priori criteria, we selected appropriate items and constructed the 56-item Chinese Personality Assessment Inventory (CPAI) and the 28-item CPAI. We then examined the factor structure of both short forms with exploratory structural equation modeling (ESEM) and replicated the four-factor structure of the original CPAI-2, reflecting the four personality domains of Chinese people, namely, Social Potency, Dependability, Accommodation, and Interpersonal Relatedness. Further analyses with ESEM models demonstrate full measurement invariance across gender for both short forms. The results show that females score lower than males on Social Potency. In addition, the four factors of both short forms have adequate internal consistency, and the correlation patterns of the four factors with the big five personality traits and several health-related variables are extremely similar across the two short forms, reflecting adequate and comparable criterion validity, convergent validity, and discriminant validity. Overall, the short versions of the CPAI-2 are psychometrically acceptable and have practical implications for measuring Chinese personality and for cross-cultural research.


INTRODUCTION
Lengthy, time-consuming questionnaires may evoke impatience or frustration in respondents, leading to temporary measurement errors and increasing the likelihood of careless responses, withdrawal from data collection, and refusal of further participation (Schmidt et al., 2003; Donnellan et al., 2006). Consequently, brief measures within the framework of the big five model have become increasingly available and increasingly short, including the 60-item NEO Five-Factor Inventory (NEO-FFI, Costa and McCrae, 1992), the 44-item Big Five Inventory (BFI, John et al., 2008), the 30-item BFI-2-S and 15-item BFI-2-XS (Soto and John, 2017a), the 20-item Mini International Personality Item Pool (Donnellan et al., 2006), and even the 10-item short version of the BFI (Rammstedt and John, 2007). These widely used measures have demonstrated that short versions can provide a valuable assessment of personality constructs (e.g., Dale et al., 2020; Perry et al., 2020; Shchebetenko et al., 2020).
However, the big five model has been challenged in terms of cross-cultural adaptability (Cheung et al., 2011; Li et al., 2019; Wang et al., 2019; Dong et al., 2021). As a theory derived in Western society, the big five model may include specific traits that are more valued in Western societies than in non-Western societies (Church, 2001), or it may omit traits that are more prominent in non-Western societies. To avoid these blind spots, Cheung et al. (2011) proposed the combined etic-emic approach, which takes into account both culture-specific (indigenous) and culture-universal personality traits. Using this approach, several instruments have been developed, namely, the Chinese Personality Assessment Inventory (CPAI, Cheung et al., 1996), the Cross-Cultural (Chinese) Personality Assessment Inventory (CPAI-2), and the Cross-cultural (Chinese) Personality Assessment Inventory for Adolescents (CPAI-A). The CPAI measures can serve as omnibus indigenous personality inventories for the Chinese people and as cross-culturally valid instruments for people from non-Chinese societies (Cheung et al., 2003; Wada et al., 2004; Born and Jooren, 2009; Iliescu and Ion, 2009; Dang et al., 2010; Cheung et al., 2013). However, short forms of the CPAI measures are scarce compared with the abundance of brief big five measures; only one short form, for the CPAI-A, has been developed recently (Dong et al., 2021). In the present study, we developed two short forms for the CPAI-2.
To develop the CPAI, researchers explored multiple sources of folk personality descriptions, including contemporary Chinese novels, Chinese proverbs, and psychological research literature. They collected descriptions about oneself from an informal street survey and descriptions about others from surveys of various professionals (Cheung et al., 1996). At the same time, the researchers drew on the existing Western personality measurement literature. The CPAI personality profile was generated from those descriptions with an integrated and balanced treatment of universal and culture-specific aspects, and it includes 22 normal personality scales, 12 clinical scales, and 3 validity scales with a total of 510 items. To date, the CPAI has been developed and repeatedly revised over 20 years, resulting in two versions: an adolescent version (CPAI-A) and an adult version (CPAI-2). The adult version, CPAI-2, consists of 28 normal personality scales, 12 clinical scales, and 3 validity scales with a total of 541 items. The present study focuses on the normal personality scales (Form B).
Exploratory factor analyses reveal that the 28 personality scales of the CPAI-2 reflect four deeper latent domains, namely, Social Potency, Dependability, Accommodation, and Interpersonal Relatedness (IR), which are identical to the structure of the original CPAI personality scales. Of particular note, the IR factor contains more elements indigenous to Chinese culture, such as attending to reciprocity in relationships, avoiding face-to-face conflict, maintaining superficial harmony, and saving face for everyone, which highlight the attitudes, beliefs, and behavioral patterns of how Chinese people "behave" in instrumental interpersonal relationships. In a joint factor analysis of the CPAI and the NEO PI-R, IR did not load on any of the NEO PI-R factors (Cheung et al., 2001). In another joint analysis of the CPAI-2 and the NEO-FFI, IR was again distinct. That is, IR stands alongside the five personality traits defined in the big five model, resulting in a "big six" personality structure. At the same time, Social Potency, Dependability, and Accommodation overlapped with the big five personality traits in these joint factor analyses, showing more culture-universal characteristics.
The four-factor structure of the CPAI and CPAI-2 has been replicated in several English-speaking groups, including Singapore Chinese adults and Caucasian American college students (Cheung et al., 2003), Chinese Americans and European Americans (Lin and Church, 2004), and a mixed Singapore sample including Chinese, Malays, and Indians (Cheung et al., 2006). Similarly, the big six personality structure has been found through joint factor analysis in English-speaking groups, including Hawaiian students and Chinese Singaporeans (Cheung et al., 2001, 2003). These findings suggest that the IR factor may also be present in the personality structure of Westerners. To date, the CPAI-2 has been translated into five languages other than English: Japanese (Wada et al., 2004), Korean (see Cheung et al., 2013), Vietnamese (Dang et al., 2010), Dutch (Born and Jooren, 2009), and Romanian (Iliescu and Ion, 2009). Factor analyses of these translations showed that IR still emerges as an independent factor. These findings prompted researchers to consider the cross-cultural validity of the CPAI-2 and to rename it the Cross-cultural (Chinese) Personality Assessment Inventory.
In addition to the structural cross-cultural comparisons, comparisons of group means revealed significant differences across cultures and genders (Lin and Church, 2004). One study reported cultural mean differences on the CPAI-2, with less acculturated Asian Americans scoring higher on IR than more acculturated Asian American and European American participants (Lin and Church, 2004). Another study reported gender differences, with males scoring higher on most scales of the Social Potency factor and some scales of the Dependability factor, and females scoring higher on some scales of the Dependability, Accommodation, and Interpersonal Relatedness factors. The comparison of group means on the CPAI measures can be improved by addressing two issues. First, domain-level gender differences on the CPAI-2 remain unexamined. Second, all these mean score comparisons were conducted without establishing measurement invariance (MI) across groups, so the resulting mean differences cannot be interpreted directly (Cheung and Rensvold, 2002).
For more than two decades, a series of studies have been conducted with the CPAI-2, highlighting its value in predicting important aspects of people's lives, including adolescent life satisfaction (Ho et al., 2008; Xie et al., 2016), adolescent loneliness, career exploration of university students (Fan et al., 2012), personal decision-making style (Gan et al., 2019), and urban entrepreneurial dynamism (Obschonka et al., 2019). In these studies, indigenous personality traits, such as IR, demonstrated additional predictive power. More empirical studies are needed to examine the role of the CPAI-2 in understanding and predicting human behavior across cultures.
The 28 personality scales of the CPAI-2 have a total of 298 items that take about half an hour to complete, long enough to provoke impatience and to crowd out the measurement of other variables, which limits the application of the CPAI-2. Thus, the present study aimed to develop two short forms for the CPAI-2: the 56-item CPAI and the 28-item CPAI. The former takes two items from each of the 28 personality scales, with the aim of reducing the number of items while retaining a degree of hierarchical measurement suitable for both domain-level and scale-level measurement. The latter removes one of the two items and saves more time, at the cost of losing hierarchical measurement, which makes it suitable for studies where assessment time and respondent fatigue are core concerns. Table 1 lists the number of items per scale for the original CPAI-2, the 56-item CPAI, and the 28-item CPAI.
When developing the 56-item CPAI and the 28-item CPAI, we tried to make both short forms retain the same hierarchical structure as the original CPAI-2 and maintain adequate reliability and validity. As for the structure, we wanted the short forms to reflect the four distinct domains, each with the same content bandwidth as in the CPAI-2. The way we selected items ensured that the short forms would completely cover the content of the CPAI-2 and retain the original structure at the scale level. To obtain adequate reliability and validity, we used a combination of empirical and rational criteria. Empirically, authors familiar with the CPAI-2 were responsible for item selection based on their conceptual judgment of how well the content of the selected items represented the overall meaning of the underlying traits. Rationally, we tended to select or retain items with higher factor loadings, fewer cross-loading problems, and a larger contribution to the domain alpha coefficients. More importantly, we wanted to demonstrate that the short forms perform well in terms of these psychometric qualities. In general, we had three goals in the present study.
First, as mentioned above, we selected items from the 28 personality scales of the CPAI-2 for the two short forms based on a priori criteria. Second, we investigated the four-factor structure of the short forms and tested their measurement invariance across gender. Finally, we tested the criterion validity of the short forms by examining the relationships between the factors of the short forms and several important variables. Figure 1 shows the workflow of this study.

Participants
The study analyzed data from 2 samples: one for item selection and construct validity and one for criterion validity. A

CPAI-2
We used the traditional Chinese version of the CPAI-2 in this study. The original CPAI-2 uses a true-false rating scale, while the two short versions use a 5-point Likert scale. That is, respondents were asked to rate each statement depicting personal characteristics or typical behaviors describing their personality, from 1 (least) to 5 (most).

TIPI-10
The TIPI-10 is a self-rated questionnaire containing 10 items, each on a 7-point Likert scale (1 = disagree strongly, 7 = agree strongly). A previous study showed that the TIPI-10 can be used as a reliable and effective instrument to measure the big five personality traits in a Chinese sample (Li, 2013).

Patient Health Questionnaire
The PHQ-9 was used to assess the severity of depressive symptoms over a two-week period. The scale includes nine items on a 4-point Likert scale (1 = not at all, 4 = nearly every day). The higher the total score, the more severe the depressive symptoms. A previous study indicated that PHQ-9 has good psychometric qualities in Chinese samples (Wang et al., 2014). Cronbach's α for the PHQ-9 in this study was 0.906.
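For reference, the internal-consistency values reported for these measures are assumed here to follow the standard definition of Cronbach's α. The symbols below are introduced only for illustration and do not appear in the original text:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)
```

Here $k$ is the number of items, $\sigma_i^{2}$ is the variance of item $i$, and $\sigma_X^{2}$ is the variance of the total score.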

Generalized Anxiety Disorder Screener
The GAD-2 consists of two core criteria for generalized anxiety disorder. The scale uses a 4-point Likert scale ranging from 1 (not at all) to 4 (nearly every day). The higher the total score, the more severe the generalized anxiety disorder. Cronbach's α in this study was 0.840.

General Health Questionnaire
The GHQ comprises 12 items with a 4-point response scale ranging from "rarely" to "almost always." The total score was used to indicate the severity of mental health problems. The higher the total score, the more serious the mental health problems. Cronbach's α was 0.898 for the GHQ in the present study. In addition, the GHQ consists of three sub-dimensions: social dysfunction, anxiety, and loss of confidence (Graetz, 1991). Cronbach's α was 0.878, 0.786, and 0.856 for the three sub-dimensions, respectively.
Frontiers in Psychology | www.frontiersin.org | December 2021 | Volume 12 | Article 709032

Subjective Well-Being Scale
The scale has only one question and comprises seven faces, ranging from 1 (very happy) to 7 (very sad). Specifically, participants should determine which face is closest to their overall life experience and select the appropriate option. The happier the picture the participant chose, the higher their overall level of subjective well-being.

Data Analysis
The analyses were performed with Mplus 8.4 (Muthén and Muthén, 1998-2017) and SPSS 20.0. Mplus 8.4 was used to test the structure and measurement invariance of the short forms, while SPSS 20.0 was used to calculate the alpha coefficients, conduct t tests, and test criterion validity. We mainly used exploratory structural equation modeling (ESEM) rather than confirmatory factor analysis (CFA) to examine the factor structure, correlations among factors, and measurement invariance across gender for the two short forms. CFA models require that cross-loadings of items be fixed to zero, a restriction that may lead to two problems (Asparouhov and Muthen, 2009; Marsh et al., 2014). First, it is almost impossible for item-level CFAs to achieve an acceptable fit (e.g., CFI, TLI > 0.9; RMSEA < 0.05) for instruments that are well established in exploratory factor analysis (EFA) research. Second, the factor correlations in CFA are likely to be positively biased, sometimes substantially so. Marsh et al. (2014) regarded ESEM as an overarching integration of the best aspects of CFA and EFA, since ESEM can perform almost all the functions of CFA while avoiding both problems. Previous studies on the BFI and other FFA measures have demonstrated that, compared with CFA models, ESEM models have a better fit, smaller factor correlations, and almost identical factor loadings (Marsh et al., 2010; Chiorri et al., 2016).
The ESEM analyses used a robust maximum likelihood estimator, with standard errors and fit tests that are robust to non-normality of the observations (Muthén and Muthén, 1998-2017). Following Marsh et al. (2010), we used an oblique GEOMIN rotation (the default in Mplus) in ESEM. Related material is available at the Open Science Framework at https://mfr.osf.io/render?url=https%3A%2F%2Fosf.io%2Fg359z%2Fdownload

Correlated Uniquenesses
Following Chiorri et al. (2016), we included ESEM models with and without a priori correlated uniquenesses (CUs; covariances between specific variance components associated with two different items of the same CPAI scale).
Model fit can be improved by freeing the covariances among the error terms (CUs) of some items, a strategy that is legitimate only in the limited case where these items share common variance beyond that explained by the specified latent factors (Marsh et al., 2010). Such additional common variance may result from a common method (e.g., Marsh et al., 1992), similar item wording (e.g., Chiorri et al., 2016), or "specific" factors that are independent of the "general" factor (e.g., Marsh et al., 2010, 2013; Chiorri et al., 2016). Marsh et al. (2010) posited that items from the same facet of a specific Big Five factor correlate more highly than items from different facets of the same Big Five factor. They argued that the inflated correlations could be divided into a part explained by the common Big Five factor and a part traceable to the shared facet, and they suggested modeling the facet-related correlations as CUs by freeing the error covariance of each pair of items from the same facet, an approach that has consistently led to a considerable increase in model fit (Marsh et al., 2013; Chiorri et al., 2016).
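As a rough illustration of the same-scale pairing described above, the sketch below generates one residual-covariance statement per scale in the `WITH` form that Mplus uses for covariances. The scale abbreviations shown and the `_1`/`_2` item labels are hypothetical, since the actual item identifiers are not given in the text.

```python
def cu_statements(scale_names):
    """Return one 'item WITH item;' statement per scale, pairing its two items."""
    statements = []
    for name in scale_names:
        # Hypothetical labels: the two retained items of a scale are named <scale>_1 and <scale>_2.
        statements.append(f"{name}_1 WITH {name}_2;")
    return statements


# Illustrative subset of scales; a full specification would list every scale.
for line in cu_statements(["I_E", "Res", "Har"]):
    print(line)
```

Each scale contributes exactly one pair, so the number of same-scale CUs equals the number of scales passed in.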
The four domains of the CPAI-2 comprise 28 personality scales: eight Social Potency scales, nine Dependability scales, five Accommodation scales, and six Interpersonal Relatedness scales (see Table 1). We selected the same number of items from each scale to construct the short forms of the CPAI in an attempt to maintain the hierarchical structure of the original CPAI and to avoid the "bandwidth-fidelity dilemma" (Cronbach and Gleser, 1957). Thus, there are 28 pairs of items in the 56-item CPAI, and each pair comes from the same scale. An a priori set of 28 CUs should therefore be included in the four-factor model of the 56-item short form to handle the correlation inflation due to shared scales. We also specified one further CU to attain an adequate model fit; this CU reflects a wording effect rather than a shared scale. That is, we specified an a priori set of 29 CUs in total.

Measurement Invariance Models
Marsh et al. (2014) recommended a 13-model taxonomy of invariance tests that can be conducted within an ESEM framework. Following this taxonomy, we applied increasingly stringent equality constraints on the measurement parameters between male and female participants.
Four particularly noteworthy levels of invariance, from least to most strict, are configural, weak, strong, and strict invariance (Meredith, 1993). Configural invariance specifies the same number of factors with the same items across groups and does not require any estimated parameters to be equal. It serves as a baseline for comparing other models that impose equality constraints on the parameters across groups, so its fit to the data must be tested first. The weak invariance model requires factor loadings to be invariant across groups. The strong invariance model constrains both factor loadings and intercepts (indicator means) to be equal across groups. If the strong invariance model is supported, differences in the latent factor means can reasonably be interpreted as differences in the latent constructs. However, strong invariance is a necessary, but not sufficient, condition for comparing manifest group means, because differences in item reliability across groups distort the observed mean differences in scores. The strict invariance model is sufficient because it adds the constraint of invariant residual variances (item uniquenesses) to strong invariance, indicating that item reliability is invariant.
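The nesting of the four levels can be summarized as a small lookup table. This is only a sketch of the verbal description above; the parameter labels (`loadings`, `intercepts`, `uniquenesses`) are illustrative names, not Mplus keywords.

```python
# Parameter sets constrained to equality across groups at each level.
INVARIANCE_LEVELS = {
    "configural": frozenset(),
    "weak": frozenset({"loadings"}),
    "strong": frozenset({"loadings", "intercepts"}),
    "strict": frozenset({"loadings", "intercepts", "uniquenesses"}),
}
ORDER = ["configural", "weak", "strong", "strict"]


def added_constraints(level):
    """Return the constraints a level adds relative to the previous level."""
    index = ORDER.index(level)
    previous = INVARIANCE_LEVELS[ORDER[index - 1]] if index > 0 else frozenset()
    return INVARIANCE_LEVELS[level] - previous


for level in ORDER:
    print(level, sorted(added_constraints(level)))
```

Because each level is a superset of the one before it, any two adjacent models are nested, which is what makes the change-in-fit comparisons meaningful.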
The taxonomy of 13 models also includes invariance of the latent means and of the factor variance-covariance matrix. The former assumes at least strong invariance and sets the factor means to zero in both groups, while the latter assumes at least weak invariance and adds the constraint of invariant factor variances and covariances.

Goodness of Fit
Marsh et al. (2010) recommend the following fit indices, which are relatively independent of sample size: the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA), along with the significance of parameter estimates. We also report the robust chi-square test statistic, which is very sensitive to sample size. For the RMSEA, values less than 0.08 and 0.05 are considered acceptable and optimal fits, respectively. For the CFI and TLI, values greater than 0.90 and 0.95 are considered acceptable and optimal fits, respectively (Marsh et al., 2004).
We used the change in CFI (ΔCFI) and the change in RMSEA (ΔRMSEA) to compare the relative fit of two nested invariance models. A ΔCFI of less than 0.01 and/or a ΔRMSEA of less than 0.015 supports the more parsimonious model and provides evidence of invariance at the given level (Cheung and Rensvold, 2002; Chen, 2007). In addition, under a relatively conservative guideline, the more parsimonious model is supported if its TLI or RMSEA is as good as or better than that of the more complex model (Marsh, 2007).
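The cutoffs stated in this section can be written as two small helper functions. This is a minimal sketch: the function names, the `require_both` switch (covering the "and/or" wording), and the example fit values are assumptions for illustration and are not taken from the study's tables.

```python
def classify_fit(cfi, tli, rmsea):
    """Label a model's fit using the cutoffs stated in the text."""
    if cfi > 0.95 and tli > 0.95 and rmsea < 0.05:
        return "optimal"
    if cfi > 0.90 and tli > 0.90 and rmsea < 0.08:
        return "acceptable"
    return "inadequate"


def invariance_supported(cfi_base, cfi_constrained,
                         rmsea_base, rmsea_constrained,
                         require_both=False):
    """Compare a constrained model with its less constrained baseline."""
    delta_cfi = cfi_base - cfi_constrained        # drop in CFI after adding constraints
    delta_rmsea = rmsea_constrained - rmsea_base  # rise in RMSEA after adding constraints
    cfi_ok = delta_cfi < 0.01
    rmsea_ok = delta_rmsea < 0.015
    return (cfi_ok and rmsea_ok) if require_both else (cfi_ok or rmsea_ok)


# Hypothetical fit values, for illustration only.
print(classify_fit(cfi=0.92, tli=0.91, rmsea=0.04))
print(invariance_supported(0.930, 0.926, 0.040, 0.041))
```

Setting `require_both=True` gives the stricter reading, in which both changes must stay under their cutoffs.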

Developing the 56-Item CPAI
We created the 56-item CPAI by selecting two items from each of the 28 personality scales of the CPAI-2. The item selection process had two stages: the first applied empirical criteria, and the second applied rational criteria. In the first stage, two authors independently selected two items from each of the 28 personality scales based on their own conceptual judgments of how well the content of the items represents the underlying traits. If they selected different items from the same scale and could not reach agreement, all selected items were retained for screening at the next stage. This stage yielded 68 items, with 9 scales having more than 2 items because of disagreement: Internal vs. External Locus of Control (I_E, 4 items), Responsibility (Res, 3 items), Self vs. Social Orientation (S_S, 3 items), Traditionalism vs. Modernity (T_M, 3 items), Ren Qing (Ren, 4 items), Social Sensitivity (Soc, 3 items), Discipline (Dis, 3 items), Harmony (Har, 4 items), and Thrift vs. Extravagance (T_E, 3 items).
In the second stage, we collected data on the 68 items using a 5-point Likert scale and used the domain-level alpha coefficient as the criterion for reducing items. For Social Potency, item reduction was not necessary because none of the 8 scales had more than 2 items. For Dependability, 2 of the 9 scales had more than 2 items: I_E and Res had 4 and 3 items, respectively. There were therefore 6 possible ways of selecting two items from I_E and 3 from Res. We combined each resulting set of 4 items (two from I_E and two from Res) with the items from the other Dependability scales, giving 18 possible combinations (6*3) in total. The alpha coefficients of these combinations ranged from 0.821 to 0.831, with an average of 0.826. Finally, we chose the combination with the highest alpha coefficient as the final version of Dependability for the 56-item CPAI.
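The exhaustive search described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' SPSS procedure: the item labels (`D1`, `IE1`, `Res1`, ...), the helper names, and the synthetic responses are all hypothetical, and only two already-settled Dependability items are included for brevity.

```python
import random
from itertools import combinations, product
from statistics import variance


def cronbach_alpha(columns):
    """Cronbach's alpha for a list of item-score columns (one list per item)."""
    k = len(columns)
    item_variance_sum = sum(variance(col) for col in columns)
    totals = [sum(row) for row in zip(*columns)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def best_combination(fixed_items, candidates, data):
    """Keep two items per candidate scale; return (best alpha, items, number tried)."""
    pair_options = [list(combinations(items, 2)) for items in candidates.values()]
    tried = []
    for choice in product(*pair_options):
        chosen = list(fixed_items) + [item for pair in choice for item in pair]
        tried.append((cronbach_alpha([data[name] for name in chosen]), chosen))
    best_alpha, best_items = max(tried, key=lambda result: result[0])
    return best_alpha, best_items, len(tried)


# Synthetic 5-point responses: a shared trait plus item noise (illustration only).
rng = random.Random(0)
trait = [rng.gauss(0, 1) for _ in range(200)]
names = ["D1", "D2", "IE1", "IE2", "IE3", "IE4", "Res1", "Res2", "Res3"]
data = {
    name: [min(5, max(1, round(3 + t + rng.gauss(0, 1)))) for t in trait]
    for name in names
}
candidates = {"I_E": ["IE1", "IE2", "IE3", "IE4"], "Res": ["Res1", "Res2", "Res3"]}

alpha, items, n_tried = best_combination(["D1", "D2"], candidates, data)
print(n_tried, round(alpha, 3), items)
```

With four I_E candidates and three Res candidates, the search visits 6 × 3 = 18 item sets, matching the count given in the text.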
Using the same procedure, we obtained the final versions of Accommodation and Interpersonal Relatedness. The alpha coefficients of the 3 combinations for Accommodation ranged from 0.676 to 0.768, with an average of 0.712. For Interpersonal Relatedness, there were 1,458 combinations (3*6*3*3*3*3). The alpha coefficients for these combinations ranged from 0.644 to 0.724, with an average of 0.681. Note that the two authors later judged one item of the Harmony scale to be inappropriate in content, so only three Harmony items were left to select from. The combinations of items and their alpha coefficients are available on the Open Science Framework at https://mfr.osf.io/render?url=https%3A%2F%2Fosf.io%2F2pev3%2Fdownload
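The number of combinations per domain follows from counting the ways to keep two items in each scale that had surplus candidates. The sketch below assumes, since Table 1 is not reproduced here, that S_S belongs to Accommodation and that T_M, Ren, Soc, Dis, Har, and T_E belong to Interpersonal Relatedness, with Har reduced to three candidates as described.

```python
from math import comb, prod


def n_combinations(candidates_per_scale):
    """Number of ways to keep two items in every scale with surplus candidates."""
    return prod(comb(n, 2) for n in candidates_per_scale)


dependability = n_combinations([4, 3])            # I_E, Res
accommodation = n_combinations([3])               # S_S (assumed domain)
relatedness = n_combinations([3, 4, 3, 3, 3, 3])  # T_M, Ren, Soc, Dis, Har, T_E (assumed)

print(dependability, accommodation, relatedness)
```

A scale with four candidates allows C(4, 2) = 6 pairs and a scale with three allows C(3, 2) = 3, so the three domain totals come to 18, 3, and 1,458.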
Thus, we got the highest alpha coefficients of each domain and the final version of the 56-item CPAI. The alpha coefficients are 0.822 for Social Potency, 0.831 for Dependability, 0.768 for Accommodation, and 0.724 for Interpersonal Relatedness, with an average of 0.786. The item-total correlations of each item with the domain to which it belongs range from 0.271 to 0.649, with only one below 0.40 and an average of 0.522.
We also conducted Velicer's minimum average partial correlation procedure to determine the number of components of the 56 items. When the fourth component was extracted, the average squared partial correlation reached a minimum value of 0.0031, a result that supports a four-factor solution (Velicer, 1976).
Then, we conducted ESEM to test the four-factor model of the 56-item CPAI. These analyses included models with and without CUs. As shown in Table 2, only the ESEM model with CUs provides an adequate fit. Table 3 presents the standardized factor loadings, item-total correlations, and factor correlations. Factor loadings tend to be modest. Target loadings of the ESEM model range from 0.15 to 0.615, with six loadings below 0.30 and a median of 0.408. Cross-loadings in the ESEM model range from −0.323 to 0.366. Almost 80 percent of them (134 out of 168) are significantly different from zero. Five cross-loadings are higher than 0.30, and five items have a cross-loading higher than the target loading. R-squares of items range from

Developing the 28-Item CPAI
We created the 28-item CPAI by dropping one of the two items selected from each of the 28 personality scales. Both criteria for deleting items were based on the ESEM solution of the 56-item CPAI: items with a low target loading or more severe cross-loading problems were dropped. More severe cross-loading problems meant that an item had more cross-loadings or that the absolute value of a cross-loading was higher than or close to that of the target loading. For this even shorter version of the CPAI-2, the alpha coefficients are 0.710 for Social Potency, 0.760 for Dependability, 0.609 for Accommodation, and 0.590 for Interpersonal Relatedness, with an average of 0.667. The item-total correlations of each item with the domain to which it belongs range from 0.509 to 0.671, with an average of 0.584 (see Table 4). The ratio of the mean alpha of the 28-item CPAI to that of the 56-item CPAI is 0.849. In addition, we treated the 28-item CPAI as a part of the 56-item CPAI and computed the part-whole correlations for each domain. The part-whole correlations are 0.939 for Social Potency, 0.940 for Dependability, 0.919 for Accommodation, and 0.900 for Interpersonal Relatedness, with an average of 0.925. The mean of the squared part-whole correlations is 0.855. These results suggest that the 28-item CPAI is about 15% less reliable than the 56-item CPAI.
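The summary figures in this paragraph can be rechecked from the domain-level values it reports. The sketch below only repeats that arithmetic; the variable names are illustrative.

```python
from statistics import mean

# Domain order: Social Potency, Dependability, Accommodation, Interpersonal Relatedness.
alpha_56 = [0.822, 0.831, 0.768, 0.724]    # 56-item CPAI alphas reported in the text
alpha_28 = [0.710, 0.760, 0.609, 0.590]    # 28-item CPAI alphas reported in the text
part_whole = [0.939, 0.940, 0.919, 0.900]  # part-whole correlations reported in the text

mean_alpha_56 = mean(alpha_56)
mean_alpha_28 = mean(alpha_28)
reliability_ratio = mean_alpha_28 / mean_alpha_56
shared_variance = mean(r ** 2 for r in part_whole)

print(round(mean_alpha_56, 3), round(mean_alpha_28, 3))
print(round(reliability_ratio, 3), round(shared_variance, 3))
```

Both the ratio of mean alphas and the mean squared part-whole correlation come out near 0.85, which is where the "about 15% less reliable" summary comes from.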
As shown in Table 2, the fit of the ESEM model is acceptable. Table 4 presents the standardized factor loadings, item-total correlations, and factor correlations of the ESEM model. Factor loadings are modest. Specifically, the target loadings of the ESEM model range from 0.266 to 0.660, with only one loading less than 0.30 and a median of 0.446. The cross-loadings in the ESEM model range from −0.334 to 0.289. More than 80% of them (70 out of 84) are significantly different from zero. Only one cross-loading exceeds 0.30 in absolute value, and none of the items has a cross-loading higher than the target loading. Factor correlations range from −0.413 to 0.174 with a median of −0.051.

Measurement Invariance Across Gender
We conducted multiple-group ESEM to test the measurement invariance of the 56-item CPAI and the 28-item CPAI across gender. We first tested the 13 models of the 28-item CPAI and labeled them with the letter A in Table 5. As for the 56-item CPAI, we tested two sets of the 13 models, one in which the CUs were allowed to vary for females and males and another in which the CUs were constrained to be invariant over responses by females and males. We labeled the former with the letter B and the latter with the letter C in Table 5.
In total, we conducted three sets of measurement invariance tests: set A, set B, and set C.

Configural Invariance
The goodness-of-fit statistics provide adequate support for the configural invariance models (Model 1A, Model 1B, and Model 1C), with all TLI and CFI values exceeding 0.90 and all RMSEA values below 0.05.

Weak Invariance
When factor loadings were constrained to be equal across gender, the TLIs and the RMSEAs are even better than those

Strong Invariance
The strong invariance models constrain both factor loadings and item intercepts to be equal across gender. The fit statistics do not reject the hypothesis of invariant intercepts, with the ΔCFIs below 0.01 and the ΔRMSEAs below 0.015 in models 3A, 3B, and 3C.

Strict Invariance
The strict invariance models require equal factor loadings, item intercepts, and uniquenesses across gender. When compared with models 5A, 5B, and 5C, the corresponding models 7A, 7B, and 7C do not produce substantial changes in TLI, CFI, and RMSEA. We also compared all the other various pairs of models (

Factor Mean Invariance
We tested factor mean invariance across gender by comparing four pairs of models: M10 vs. M5, M11 vs. M7, M12 vs. M8, and M13 vs. M9. What these four models (M10-M13) have in common is that factor means are constrained to zero for both the male and female groups. The results show that none of the changes in model fit indices exceeds the cut-points for rejecting the hypothesis of invariant factor means. However, in the test of set A for the 28-item CPAI, the differences in fit indices only marginally support invariance. Changes in both CFI and TLI exceed 0.005, with ΔCFI equal to 0.006 and ΔTLI equal to 0.007. Because strict invariance across gender was reasonably supported, gender differences in latent means can be interpreted with sufficient justification. Thus, we examined models in which means were constrained to 0 for the male group and freely estimated for the female group. Females yielded significantly higher scores on Dependability, Accommodation, and Interpersonal Relatedness and lower scores on Social Potency. Table 6 presents a summary of the standardized gender differences based on the four models that provided estimates of these differences.
We also performed independent-samples t tests to examine gender differences in the four factors of both short forms and found the same pattern as in the multi-group ESEM results. Females scored higher than males on Dependability, Accommodation, and Interpersonal Relatedness, but lower on Social Potency. However, except for the difference in Social Potency, the effect sizes for most gender differences are very small and of little practical significance. These results can explain why factor mean invariance could be established in the multi-group ESEM analyses. Table 7 presents a summary of the t test statistics.
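For readers who want to reproduce an effect size for such a two-group comparison, a minimal sketch follows. It assumes Cohen's d with a pooled standard deviation; the text does not state which effect-size index was reported, and the scores used here are made up.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical domain scores for two small groups (illustration only).
females = [3.4, 3.1, 3.8, 3.6, 3.3]
males = [3.2, 3.0, 3.7, 3.5, 3.1]
print(round(cohens_d(females, males), 2))
```

A positive value means the first group scores higher; values near zero correspond to the "very small" differences described above.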

Criterion Validity
We tested the correlations between the four domains of both short forms and several criterion variables, including the big five factors and several health-related variables. As shown in Table 8, the pattern of criterion associations is very similar across the two short forms. For each domain of the CPAI, we calculated the correlation between the two columns of criterion associations (one column per short form). We found correlations of 1.000 for Social Potency, 1.000 for Dependability, 0.999 for Accommodation, and 0.993 for Interpersonal Relatedness, suggesting that the 28-item CPAI is almost identical to the 56-item CPAI in terms of the relationships between the domains and the criterion variables.
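The column-wise comparison described above amounts to a Pearson correlation between two profiles of criterion correlations. The sketch below shows the computation; the two profiles are hypothetical values and are not taken from Table 8.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical criterion correlations of one domain in each short form.
profile_56 = [0.45, -0.30, 0.12, -0.52, 0.38]
profile_28 = [0.43, -0.28, 0.10, -0.50, 0.35]
print(round(pearson_r(profile_56, profile_28), 3))
```

A value close to 1 indicates that the two forms rank and scale the criterion variables in nearly the same way, even if the individual correlations are slightly weaker in the shorter form.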
The four domains of both short forms are significantly correlated with almost all big five factors. Specifically, Social [...] As for health-related variables, the four domains are significantly correlated with PHQ (ranging from −0.294 to −0.555), GAD (ranging from −0.246 to −0.565), GHQ (ranging from −0.374 to −0.607), social dysfunction (ranging from −0.315 to −0.540), anxiety (ranging from −0.223 to −0.507), loss of confidence (ranging from −0.266 to −0.478), and subjective well-being (ranging from 0.270 to 0.390). Among them, Dependability has relatively strong correlations with PHQ, GAD, GHQ, social dysfunction, and anxiety; Social Potency has a relatively strong correlation with social dysfunction; and Interpersonal Relatedness is not correlated with PHQ or anxiety.

DISCUSSION
The CPAI-2 is a promising instrument in the fields of personality psychology and cross-cultural psychology. However, the lack of short forms may slow its progress in these fields. In the present study, we developed two short forms of the CPAI-2 with sound psychometric qualities: the 56-item CPAI and the 28-item CPAI. We then examined the extent to which these short forms retain the structure of the CPAI-2 and their measurement invariance across gender. Both share the four-factor structure of the CPAI-2, and the four factors appear to be distinct from each other. Both short forms demonstrate strict invariance across gender. Further tests show that men scored higher than women on Social Potency. In addition, both short forms have adequate reliability and validity.
In the present study, we provided alpha coefficients for each domain of the two short forms and examined the relationship between CPAI domains and several criterion variables. Among the four domains, Accommodation and Interpersonal Relatedness have relatively low internal consistency in both short forms. In the 56-item CPAI, the alpha coefficients of the four domains are all higher than 0.70, indicating adequate internal consistency (Nunnally and Bernstein, 1994). When the number of items was halved to construct the 28-item CPAI, the alpha coefficients decreased in all four domains, and in two of them they dropped below 0.70. That is to say, in the 28-item CPAI, Accommodation and Interpersonal Relatedness seem to be weak in internal consistency, especially Interpersonal Relatedness.
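The effect of halving the number of items on internal consistency can be sketched with Cronbach's alpha and the Spearman-Brown formula. The sketch below is illustrative only: the item scores and the starting reliability of 0.75 are hypothetical, and the Spearman-Brown projection assumes that the removed items are parallel to the retained ones, which need not hold for the CPAI short forms.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    with all lists covering the same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))


# Hypothetical responses of five people to three items
items = [[2, 4, 3, 5, 1],
         [3, 4, 2, 5, 2],
         [2, 5, 3, 4, 1]]
alpha = cronbach_alpha(items)

# Hypothetical domain with reliability 0.75, shortened to half its length
halved = spearman_brown(0.75, 0.5)
```

Under these assumptions, a domain with a reliability of 0.75 would be projected to fall to 0.60 after halving, which illustrates how a coefficient above 0.70 in a longer form can drop below that benchmark in a shorter one.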
The way we constructed the short forms prioritizes high bandwidth over high internal consistency. We selected the same number of items from each of the 28 CPAI-2 scales so that each domain of the short forms would cover all of its aspects in the original CPAI-2, a strategy that resulted in a relatively high level of item content heterogeneity in each domain. Item content heterogeneity refers to whether the items in a scale cover many different aspects of one trait or focus on only a few (McCrae et al., 2011). High item content heterogeneity can lead to low internal consistency. For example, Interpersonal Relatedness consists of six diverse aspects. In the 56-item CPAI, there are two items per aspect, whereas in the 28-item CPAI, there is only one item per aspect. Thus, the former is less heterogeneous because each item has a peer that reflects the same aspect. Interpersonal Relatedness is weaker than the other domains in terms of internal consistency, probably also because the aspects that make it up are more heterogeneous in content.
We placed more emphasis on validity than on internal consistency reliability. Low internal consistency caused by item content heterogeneity may not lead to low validity (McCrae et al., 2011). In terms of validity, the 28-item CPAI does not appear to be worse than the 56-item CPAI according to the correlation pattern between the four domains and those criterion variables. Specifically, the four domains of both short forms are positively correlated with subjective well-being and negatively correlated with variables indicating poor mental health, and Dependability seemed to be the most potent protector of health among them.
In addition, the domains of the short forms are widely related to the big five personality traits. The way they correlate with the big five traits is quite similar to the way the scales of the CPAI domains combined with the facets of the big five factors in previous joint factor analyses (Cheung et al., 2001). For example, the scales of Dependability mainly combined with facets of Neuroticism (Emotional Stability) and Conscientiousness in previous joint factor analyses of CPAI measures and big five measures. Correspondingly, in the present study, Dependability in both short forms was most strongly correlated with Emotional Stability and Conscientiousness. The short forms are thus highly consistent with the original CPAI measures regarding their relationship with the big five personality factors.
In addition, the correlation pattern of the CPAI domains and the big five factors provides evidence of convergent and discriminant validity from a multitrait-multimethod perspective. The big five and CPAI measures were developed with different approaches: the former with an etic approach and the latter with a combined etic-emic approach. However, the personality traits they measure overlap. Dependability overlaps with Emotional Stability and Conscientiousness, Social Potency overlaps with Openness and Extraversion, and Accommodation overlaps with Agreeableness. These overlaps were reflected in previous joint factor analyses and are demonstrated again by the correlations in the present study. The correlations between a CPAI domain and the big five factors that overlap with it are much higher than those between that domain and the other CPAI domains.
The short forms offer substantial savings in assessment time compared to the full CPAI-2. According to Soto and John (2017a), the 60-item BFI-2 takes 4 to 10 min to complete, and the 30-item BFI-2-S takes 3 to 5 min. The 56-item CPAI and the 28-item CPAI have about the same number of items as the BFI-2 and the BFI-2-S, respectively. Thus, we can infer from their estimates the time required to complete the 56-item CPAI (4 to 10 min) and the 28-item CPAI (3 to 5 min). With a short form of the CPAI-2, the assessment time would shrink from half an hour to less than 10 min, a decrease that would allow more time for other variables or substantially reduce the likelihood of fatigue and impatience. This is why short forms are preferred over the full version, especially when they have comparable reliability and validity.
However, the efficiency gains of short forms often come at the cost of reliability and validity (Soto and John, 2017a), meaning that short forms need larger samples to maintain the same statistical power as the full CPAI-2. The cost of short forms also includes weakening or even losing the hierarchical structure of the assessment.
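The link between lower reliability and larger required samples can be made concrete with a rough sketch. It relies on two textbook approximations that are assumptions of this illustration rather than analyses from the study: the classical attenuation formula for an observed correlation and the Fisher z approximation for the sample size needed to detect a correlation. The true correlation and all reliability values are hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def attenuated_r(true_r, rel_x, rel_y):
    """Observed correlation expected when both measures are unreliable."""
    return true_r * sqrt(rel_x * rel_y)


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect correlation r (two-sided test),
    based on the Fisher z transformation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Hypothetical true correlation of 0.30 with a criterion of reliability 0.80
n_longer_form = n_for_correlation(attenuated_r(0.30, 0.80, 0.80))
n_shorter_form = n_for_correlation(attenuated_r(0.30, 0.65, 0.80))
```

In this hypothetical case the less reliable form yields a smaller expected observed correlation and therefore a larger required sample for the same power.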
The full CPAI-2 is appropriate for both domain-level and scale-level personality assessment, a hierarchical assessment that combines the benefits of high bandwidth with high fidelity (Soto and John, 2017b). The 56-item CPAI retains to some extent the capability to assess personality hierarchically, but it is appropriate for scale-level assessment only in very large samples. The 28-item CPAI, however, lacks the capacity to assess scale-level personality traits.
Thus, it is easy to choose between the CPAI-2 and the 56-item CPAI, but not between the 56-item CPAI and the 28-item CPAI. Compared to the CPAI-2, the 56-item CPAI allows a time advantage of more than 20 min, but with a slight attenuation in psychometric qualities and the capacity of hierarchical measurement. It seems to be worth it. However, it would not be worthwhile to replace the 56-item CPAI with the 28-item CPAI to save less than 7 min at the cost of weakened reliability and loss of hierarchical measurement ability. As advised by Soto and John (2017a), the 28-item CPAI is suitable for studies in which assessment time and respondent fatigue are core concerns, and even small gains in efficiency are critical.
We conducted multi-group ESEM analyses to test the measurement invariance of the two short forms within a comprehensive taxonomy of invariance models, with appropriate tests of full measurement and structural invariance. The results support configural invariance across gender as well as invariance of factor loadings, item intercepts, uniquenesses, correlated uniquenesses, factor variances and covariances, and factor means for both short forms. At the level of measurement invariance, strict invariance across gender has been established, which implies that the two short instruments can be compared between men and women at the structural level, including factor variances and covariances and factor means.
The invariance of the factor covariances indicates that the correlation pattern among the four factors is the same for males and females. Thus, we can expect the short forms to have the same discriminant and convergent validity when applied to different gender groups. Factor mean invariance across gender indicates that there is no gender difference in the four factors. However, the results of the t tests show significant gender differences with small effect sizes. These two results are not really contradictory because most of the effect sizes of the gender differences are too small to be of any practical significance, except for the gender difference in Social Potency. Men scored higher than women on Social Potency, with a small but not negligible effect size, a result that is consistent with the findings on scale-level gender differences in the personality traits of the CPAI-2. Cheung et al. (2004) also found that males scored higher than females on some scales of Dependability, while females scored higher than males on other scales of Dependability. Such scale-level differences offset each other at the domain level, explaining why the gender difference in Dependability is trivial and negligible.
Previous studies on the structure of the CPAI used traditional EFA approaches that could only provide a crude comparison across groups (Cheung et al., 2003; Lin and Church, 2004). Lin and Church (2004) conducted a CFA to test the structure of CPAI scales and NEO-FFI facets and found that the CFA model did not fit the data well. Thus, we believe that ESEM is the best option currently available for measurement invariance analysis of CPAI instruments. Through the ESEM models, we have now provided a basis for cross-gender comparisons of the short forms of the CPAI. In the future, we will use these models in research that compares CPAI personality traits across different cultures.

CONCLUSION
The work reported here provides two short forms of the CPAI-2, one with 56 items and the other with 28 items. Both are time efficient, gender invariant, and adequately valid. The former retains a certain degree of capacity for hierarchical measurement, and the latter is more time-saving. Researchers therefore have the flexibility to choose among different versions of the CPAI depending on the needs of the study. In addition, the present study provides new evidence for the advantages of ESEM and reveals its potential applicability in future studies on the CPAI.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the ethics guidelines of the Ethics Committee of the Institute of Psychology, Chinese Academy of Sciences. The study protocol was approved by the institutional review board: Ethics Committee of the Institute of Psychology, Chinese Academy of Sciences (reference number: H20020).

AUTHOR CONTRIBUTIONS
FC, JZ, MZ, and FR had the initial ideas. JZ, MZ, FC, and FL collected the data. MZ, DH, and FL analyzed the data. MZ and DH wrote the drafts and the final manuscript. JZ, WF, and WM reviewed several drafts of the manuscript. MZ and DH revised the manuscript. All authors approved the final version of the manuscript.

FUNDING
This work was funded by the National Natural Science Foundation of China (grant no. 71774156) granted to MZ.