Comparison of Trust Assessment Scales Based on Item Response Theory

Three widely used interpersonal trust measurement scales [Interpersonal Trust Scale (ITS), Philosophies of Human Nature Scale (RPHNS), and Company Trust Scale (CTS)] have seldom been applied in non-Western contexts. Because different social environments may lead to variation in the level or structure of trust, it is necessary to compare how well these scales measure different levels of trust-related traits in Eastern cultures so that researchers can choose appropriate scales for their studies. This study conducted a comparative analysis of the ITS, RPHNS, and CTS in a sample of 725 Chinese college students. Total score correlations and latent factor correlations, estimated by confirmatory factor analysis (CFA) for a first-order three-factor model, were examined first; item parameter quality, test reliability and standard errors, and test information were then assessed. The results were as follows: (1) the ITS and the RPHNS assessed almost the same trust traits, so only these two scales were compared in the subsequent analyses; (2) only the original structure of the RPHNS was verified; (3) some ITS items performed poorly, whereas the RPHNS had higher overall test reliability; and (4) the average item information provided by the RPHNS was higher across all trait levels. In most cases, the RPHNS is the better choice in the Chinese cultural context.


INTRODUCTION
Trust refers to a positive psychological expectation that an individual holds toward the behavior and intentions of others during his/her interactions with them or with the social environment (Zhao et al., 2013). Research has shown that trust is a prerequisite for sound relationships in social interactions (Righetti et al., 2011). In cooperative group activities, trust strengthens solidarity among group members and enhances group performance (Stolle et al., 2008; Chen et al., 2010). In politics, trust is one of the decisive factors in whether people support a public policy (Zhang et al., 2014). Economically, trust simplifies transaction procedures and reduces transaction costs. However, trust can also be a major reason why people are tricked or deceived (Shi et al., 2015). Studies have shown relatively close relationships between interpersonal trust and personality, ego, depressive emotions, and Internet addiction among college students (Xin and Zhou, 2012; Xu T.J. et al., 2017).
In studies on interpersonal trust, widely used measurement tools include the Interpersonal Trust Scale (ITS) developed by Hochreich and Rotter (1970) based on social learning theory, the Philosophies of Human Nature Scale (RPHNS) revised by Wrightsman (1964), and the Company Trust Scale (CTS) created by Hunt et al. (1983). Although these three scales are widely used in Western countries, studies of their reliability and validity in a Chinese context are few and insufficient (Jian, 2007). Additionally, changes in the social environment influence the level and structure of trust (Yang and Peng, 1999). It is therefore necessary to determine the factor structure and psychometric characteristics of these scales in the contemporary Chinese cultural context and to compare how well they measure different levels of trust, so that researchers can choose appropriate scales for related research.
All three scales were developed within classical test theory (CTT). In recent years, however, item response theory (IRT) and its associated techniques have become increasingly popular; for example, IRT has been used to evaluate the psychometric characteristics of three depression assessment scales (Adler et al., 2012; Umegaki and Todo, 2017). Compared with CTT, IRT has the following advantages. First, the category response thresholds of an item (parameter b) and the trait level of the subject are placed on the same metric. Second, results from different tests that measure the same psychological trait can be compared (Luo, 2012). Third, item parameter estimation directly and accurately reflects the measurement characteristics of each item. Moreover, how a scale's precision varies across trait levels can be displayed with a reliability curve and an average item information curve (Olino et al., 2013). Thus, the purpose of this study is to compare the psychometric characteristics of the three scales using IRT and to offer suggestions for their application.

Participants
Students from four universities in Nanchang City completed the questionnaires. A total of 725 valid questionnaires were collected. The age of the subjects ranged from 17 to 23 years (M = 19.16, SD = 1.184). A total of 37.1% of the respondents were male, and 62.9% were female.

Measures
The Chinese versions (revised edition; Wang et al., 1999) of the three trust assessment scales were used in this study.
The Chinese version of the ITS has 25 items and two dimensions: trust in relatives and friends, and trust in people with whom one has no direct relationship. It measures subjects' judgments of the reliability of others' words and behavior. Items are scored on a five-point scale; there are 12 positively worded and 13 negatively worded items.
The Chinese version of the RPHNS has 20 items and two dimensions: trustworthiness and cynicism. Items are scored on a six-point scale ranging from −3 to 3; for the IRT analysis, responses were recoded as 1 to 6. There are 10 positively worded and 10 negatively worded items.
The Chinese version of the CTS has 18 items and three dimensions: dependability, predictability, and reliability. It measures the degree to which intimate partners trust each other. Items are scored on a seven-point scale; there are 9 positively worded and 9 negatively worded items.
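The recoding described for the RPHNS, together with the reverse-keying of negatively worded items that such Likert-type scales usually require, can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: it assumes the RPHNS response options are −3, −2, −1, +1, +2, +3 (no zero point), and the helper names and the item set passed as `negative_items` are hypothetical, since the text does not list which items are reverse-keyed.

```python
# Minimal scoring sketch; helper names are hypothetical, not from the source.

# Assumed RPHNS response options: -3, -2, -1, +1, +2, +3 (no zero point).
RPHNS_RECODE = {-3: 1, -2: 2, -1: 3, 1: 4, 2: 5, 3: 6}


def recode_rphns(response: int) -> int:
    """Map an RPHNS response on the -3..+3 scale to the 1..6 scale used for IRT."""
    return RPHNS_RECODE[response]


def reverse_score(score: int, n_points: int) -> int:
    """Reverse-key a score on a 1..n_points scale, so 1 <-> n_points."""
    if not 1 <= score <= n_points:
        raise ValueError(f"score {score} is outside 1..{n_points}")
    return n_points + 1 - score


def score_items(responses, negative_items, n_points):
    """Return item scores with the 1-based items in `negative_items` reverse-keyed."""
    return [
        reverse_score(s, n_points) if i in negative_items else s
        for i, s in enumerate(responses, start=1)
    ]


# Usage with made-up responses to a hypothetical four-item fragment:
raw = [-3, 2, -1, 3]
recoded = [recode_rphns(r) for r in raw]                          # [1, 5, 3, 6]
scored = score_items(recoded, negative_items={2, 4}, n_points=6)  # [1, 2, 3, 1]
```

The same `reverse_score` helper would apply to the five-point ITS and seven-point CTS by changing `n_points`.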

Analysis
(1) Common method bias (CMB) test. Before the psychometric characteristics of the scales were analyzed, Harman's single-factor test was used to check for CMB (Zhou and Long, 2004).
(2) Analysis of trait congruency. To ensure that the three scales measured the same psychological trait, correlations among their total scores were analyzed. In addition, a higher-order model was specified that included all dimensions of the three scales, with each scale treated as a second-order factor, and confirmatory factor analysis (CFA) was used to estimate the latent correlations among the factors and thereby verify the congruency of the scales in terms of the trait measured (Umegaki and Todo, 2017).
(3) Construct validity analysis. These scales are widely used in both the West and the East (Guinot et al., 2014; Jin et al., 2017). However, responses to them are affected by religious belief and social class, so they may not be adequately applicable in the Chinese cultural context, and few studies have explored their structure in China. Therefore, CFA was used to verify the original structure of each scale; if the original structure was not supported, exploratory factor analysis (EFA) was performed. Three fit indices were used in both analyses: the comparative fit index (CFI), the standardized root mean square residual (SRMR), and the root mean square error of approximation (RMSEA).
(4) Analysis of item parameters and test information within the IRT framework. Common polytomous IRT models include the generalized partial credit model (GPCM) (Muraki, 1992), the graded response model (GRM) (Samejima, 1969), and the generalized rating scale model (GRSM) (Masters, 1982). To select the model that best fit the data, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and −2 × Log-Lik were used to compare the fit of each scale under the different IRT models.
Once the model was selected, the psychometric characteristics of the scales were investigated further within the IRT framework: item parameter quality, differential item functioning (DIF), reliability, standard error, average item information, and relative efficiency were analyzed and compared across the scales.
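The model-comparison indices named above are simple functions of a fitted model's maximized log-likelihood, its number of free parameters p, and the sample size N: −2 × Log-Lik, AIC = −2 × Log-Lik + 2p, and BIC = −2 × Log-Lik + p ln N, with smaller values indicating better relative fit. The sketch below assumes the log-likelihoods come from whatever IRT software performed the estimation; the function names are illustrative, and the numbers in the usage lines are placeholders, not results from this study.

```python
import math


def information_criteria(log_lik: float, n_params: int, n_obs: int) -> dict:
    """Return -2LL, AIC and BIC for one fitted model (smaller is better)."""
    neg2ll = -2.0 * log_lik
    return {
        "-2LL": neg2ll,
        "AIC": neg2ll + 2 * n_params,
        "BIC": neg2ll + n_params * math.log(n_obs),
    }


def best_model(fits: dict, index: str = "BIC") -> str:
    """Name of the model with the smallest value of `index`.

    fits maps a model name to a (log_lik, n_params, n_obs) tuple.
    """
    return min(fits, key=lambda name: information_criteria(*fits[name])[index])


# Placeholder values only (not the study's estimates):
fits = {
    "GRM": (-10000.0, 100, 725),
    "GPCM": (-10050.0, 100, 725),
}
print(best_model(fits, "AIC"))  # GRM
```

BIC penalizes extra parameters more heavily than AIC once N exceeds about 8, so the two indices can disagree when models differ in complexity, as a GRSM with shared thresholds does relative to the GRM.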

CMB Test
The CMB test showed that 18 factors had eigenvalues greater than 1 and that the first (largest) factor explained only 10.76% of the total variance, well below the critical value of 40% (Zhou and Long, 2004). Therefore, no serious common method bias was detected.
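Harman's single-factor test, as applied above, pools all items and checks whether one unrotated factor accounts for most of the variance (here judged against the 40% criterion). A minimal sketch, assuming the unrotated solution is approximated by the eigenvalues of the item correlation matrix (a principal-components approximation; factor-analysis software may extract factors differently). The function name is hypothetical and the data are simulated.

```python
import numpy as np


def harman_single_factor(data: np.ndarray, cutoff: float = 0.40):
    """Harman-style check on an (n_respondents, n_items) score matrix.

    Returns the number of eigenvalues > 1, the share of total variance taken by
    the largest eigenvalue, and whether that share exceeds `cutoff`.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending; reverse it
    n_above_one = int(np.sum(eigvals > 1.0))
    first_share = float(eigvals[0] / eigvals.sum())  # eigenvalues sum to n_items
    return n_above_one, first_share, first_share > cutoff


# Simulated (not real) data: 500 respondents answering 10 unrelated items.
rng = np.random.default_rng(0)
independent = rng.normal(size=(500, 10))
print(harman_single_factor(independent))  # third value (the flag) is False here
```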

Analysis of Trait Congruency
After the CTS was removed, a CFA model was established for the ITS and the RPHNS only, with the following results: CFI = 0.75, SRMR = 0.08, RMSEA = 0.06, and a factor correlation of 0.78.

Construct Validity Analysis
Confirmatory factor analysis was used to test the original factor structures of the scales, and the results are shown in Table 1. According to the CFI, SRMR, and RMSEA fit indices, the original factor structure of the RPHNS fit well, but that of the ITS was not supported, and adding a higher-order factor changed the results little. It was therefore necessary to re-examine the structure and dimensionality of the ITS. A bi-factor model, an oblique factor model, and a higher-order model were all considered, and the higher-order model was chosen for three reasons. First, although both the higher-order model and the bi-factor model fit well, the bi-factor model is more complex: it must estimate more parameters and converges less readily (Xu S. et al., 2017). Second, the oblique factor model cannot capture effects common to all factors, whereas a higher-order model separates common effects from unique effects and supports a more in-depth analysis of the factor structure (Gu and Wen, 2017). Finally, because comparing the scales was the main purpose of this research, each scale needed to be examined as a whole, which a higher-order factor makes possible.
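The three CFA fit indices used in this section can be computed from quantities that SEM programs report. The sketch below uses common textbook definitions, assumed here rather than taken from the source: RMSEA from the model chi-square, its degrees of freedom, and N (this version divides by N − 1; some programs use N); CFI relative to the independence (baseline) model; and SRMR as the root mean square of the residuals between observed and model-implied correlations over the lower triangle. Function names are illustrative, and a given program's output may differ slightly.

```python
import math

import numpy as np


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA (divides by n - 1; some programs use n)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index of a model relative to the independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)  # always >= d_model >= 0
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def srmr(observed_corr, implied_corr) -> float:
    """Root mean square of correlation residuals over the lower triangle incl. diagonal."""
    obs = np.asarray(observed_corr, dtype=float)
    imp = np.asarray(implied_corr, dtype=float)
    rows, cols = np.tril_indices_from(obs)
    return float(np.sqrt(np.mean((obs[rows, cols] - imp[rows, cols]) ** 2)))


# Illustrative numbers only (not the values in Table 1):
print(round(rmsea(chi2=50.0, df=40, n=101), 3))  # 0.05
```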
The data were randomly divided into two halves; EFA was conducted on one half and CFA of the higher-order model on the other. The fit indices of the scale structures and the factor loadings of the higher-order model are shown in Table 1 and Supplementary Appendix 1, respectively. Most items have relatively high factor loadings, but some loadings are low, such as those of items 18, 23, and 24 on the ITS and items 4 and 17 on the RPHNS. These items were retained because revising the scales was not the purpose of this research. Table 2 compares the old and new structures in detail.
(1) In contrast to its original two-dimensional structure, the ITS comprised three dimensions. Based on the names of the previous dimensions and the item factor loadings, these were named social phenomena, trust in others, and political trust; as Table 2 shows, political trust is the new ITS dimension.
(2) The original dimensions of the RPHNS were verified.
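The random split used for the EFA and CFA samples can be sketched as follows. Because N = 725 is odd, the halves hold 362 and 363 respondents; the function name and seed are arbitrary illustrations, not details from the study.

```python
import numpy as np


def split_half(n_respondents: int, seed: int = 0):
    """Randomly split respondent indices into an EFA half and a CFA half."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_respondents)
    cut = n_respondents // 2
    return np.sort(order[:cut]), np.sort(order[cut:])


efa_idx, cfa_idx = split_half(725)
print(len(efa_idx), len(cfa_idx))  # 362 363
```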

Fit Analysis of the IRT Model
The fit of the GPCM, GRSM, and GRM to each of the two scales was compared using the AIC, BIC, and −2 × Log-Lik. As shown in Table 3, the GRM generally fit both scales best and was therefore selected. Under the GRM, the probability that a subject at trait level θ obtains score k on item i is

$$P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i(k+1)}(\theta), \qquad P^{*}_{ik}(\theta) = \frac{1}{1 + e^{-1.7 a_i (\theta - b_{ik})}},$$

where i is the item number, k is the score category, $a_i$ is the discrimination parameter of item i, and $b_{ik}$ is the threshold for category k. $P^{*}_{ik}(\theta)$ is the probability that a subject at trait level θ scores k or higher on item i (Chen et al., 2006); by convention, $P^{*}_{i1}(\theta) = 1$ for the lowest category and $P^{*}_{i(m_i+1)}(\theta) = 0$, where $m_i$ is the number of categories of item i.
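Under the GRM, each boundary curve P*_ik(θ) = 1 / (1 + exp(−1.7 a_i (θ − b_ik))) gives the probability of scoring k or higher, and each category probability is the difference between adjacent boundary curves, so the model can be evaluated directly. The sketch below computes the category probabilities of one item at a given θ; the function name and the parameter values in the usage line are hypothetical (a five-point item, so four thresholds, as for the ITS). The thresholds must be increasing for every category probability to be non-negative.

```python
import numpy as np

D = 1.7  # scaling constant in the GRM boundary curves


def grm_category_probs(theta: float, a: float, b) -> np.ndarray:
    """Category probabilities P_ik(theta) of one item under Samejima's GRM.

    a: discrimination a_i; b: increasing thresholds b_i2 < ... < b_im for an
    item with m categories. Returns m probabilities, lowest category first.
    """
    b = np.asarray(b, dtype=float)
    # P*_ik(theta): probability of scoring in category k or higher (k = 2..m)
    p_star = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
    # Boundary conventions: P*_i1 = 1 and P*_i(m+1) = 0
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]


# Hypothetical five-point item: a = 1.2, thresholds -1.5, -0.5, 0.5, 1.5
probs = grm_category_probs(0.0, 1.2, [-1.5, -0.5, 0.5, 1.5])
print(probs.round(3), probs.sum())
```

At θ = b_ik the boundary curve equals 0.5, which is what it means for thresholds and trait levels to share one metric.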

Estimation and Analysis of the Item Parameters
The item parameters of the ITS and the RPHNS were estimated under the GRM; detailed results are shown in Table 4. Items with discrimination below 0.7 (Fliege et al., 2005) were considered to perform poorly; these were items 3, 8, 13, 17, 18, 21, 23, and 24 of the ITS and items 3, 8. Hence, in terms of item quality, the RPHNS outperformed the ITS.

DIF Analysis
Region (urban vs. rural) and gender (male vs. female) were used as grouping variables. DIF was analyzed with the logistic regression (LR) method, using the change in McFadden's pseudo R² as the effect size; an item was flagged for DIF when this change exceeded 0.02 (Choi et al., 2010). No item on either scale showed DIF.
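The logistic-regression DIF procedure compares nested ordinal logistic models for each item. The sketch below shows one common variant, assumed here because the text does not state the exact comparison: Model 1 predicts the item score from the trait estimate θ, Model 3 adds the grouping variable and the θ × group interaction, and McFadden's pseudo R² of each model is computed against a threshold-only null model; the 0.02 criterion is then applied to the difference. It fits a proportional-odds (cumulative logit) model with SciPy's general-purpose optimizer. Function names, the simulation settings, and the use of θ directly as the matching variable are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def ordinal_loglik(y, X, n_cat):
    """Maximized log-likelihood of a proportional-odds (cumulative logit) model.

    y: integer item scores coded 0..n_cat-1; X: (n, p) predictors without an
    intercept column (p = 0 gives the threshold-only null model).
    """
    y = np.asarray(y, dtype=int)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rows = np.arange(n)

    def neg_loglik(params):
        # Increasing thresholds: first threshold plus positive (exp) increments
        steps = np.exp(np.clip(params[1:n_cat - 1], -30, 30))
        tau = np.cumsum(np.concatenate(([params[0]], steps)))
        eta = X @ params[n_cat - 1:]
        # P(Y <= k) = logistic(tau_k - eta), padded with 0 below and 1 above
        cdf = expit(tau[None, :] - eta[:, None])
        cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
        prob = cdf[rows, y + 1] - cdf[rows, y]
        return -np.sum(np.log(np.clip(prob, 1e-12, None)))

    result = minimize(neg_loglik, np.zeros(n_cat - 1 + p), method="BFGS")
    return -result.fun


def dif_r2_change(y, theta, group, n_cat):
    """Change in McFadden's pseudo R^2 from Model 1 (theta) to Model 3
    (theta + group + theta x group)."""
    n = len(y)
    ll_null = ordinal_loglik(y, np.empty((n, 0)), n_cat)
    ll_1 = ordinal_loglik(y, np.column_stack([theta]), n_cat)
    ll_3 = ordinal_loglik(y, np.column_stack([theta, group, theta * group]), n_cat)
    return (1.0 - ll_3 / ll_null) - (1.0 - ll_1 / ll_null)


# Simulated data (not the study's): a 4-category item with no DIF by construction.
rng = np.random.default_rng(1)
n = 800
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
latent = 1.5 * theta + rng.logistic(size=n)
y = np.searchsorted([-1.0, 0.0, 1.0], latent)  # number of thresholds below latent
delta = dif_r2_change(y, theta, group, n_cat=4)
print(f"Delta R^2 = {delta:.4f}; DIF flagged: {delta > 0.02}")
```

Comparing Model 1 with Model 3 tests uniform and non-uniform DIF together; comparing Models 1 and 2 (group main effect only) isolates uniform DIF.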

Test Reliability Coefficient and Standard Error Curve
One advantage of IRT is that it provides a test reliability coefficient and a standard error for every trait level. The reliability coefficient is $r_{xx}(\theta) = 1 - 1/I(\theta)$ and the standard error is $SE(\theta) = 1/\sqrt{I(\theta)}$, where $I(\theta)$ is the test information at trait level θ (Luo et al., 2009); equivalently, $r_{xx}(\theta) = 1 - SE^{2}(\theta)$. The reliability and standard error of the two scales were calculated under the GRM, and the results are shown in Figures 1 and 2, respectively. For the ITS, test reliability ranged from 0.76 to 0.80 across trait levels: subjects with trust levels between −1 and 1 had reliability of approximately 0.80, whereas reliability was relatively low for subjects at either extreme. For the RPHNS, test reliability ranged from 0.82 to 0.88. Subjects with trust levels between −2 and 2 had reliability of approximately 0.87, and those more than two standard deviations below the mean had reliability of approximately 0.86; however, reliability decreased to approximately 0.84 for subjects more than two standard deviations above the mean. Overall, the RPHNS showed consistently high test reliability and maintained it for subjects with relatively low trust levels.
Frontiers in Psychology | www.frontiersin.org
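The conditional reliability r_xx(θ) = 1 − 1/I(θ) and standard error SE(θ) = 1/√I(θ) both follow from the test information I(θ), which under the GRM is the sum of the item information functions. A minimal sketch, assuming Samejima's item information formula I_i(θ) = Σ_k [P'_ik(θ)]² / P_ik(θ) and a trait distribution with unit variance (which is what makes 1 − 1/I(θ) interpretable as a reliability). Function names are illustrative, and the item parameters in the usage lines are hypothetical, not the estimates in Table 4.

```python
import numpy as np

D = 1.7  # same scaling constant as in the GRM boundary curves


def grm_item_information(theta: float, a: float, b) -> float:
    """Samejima's item information for one GRM item at trait level theta."""
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
    cum = np.concatenate(([1.0], p_star, [0.0]))  # P*_i1 = 1, P*_i(m+1) = 0
    dcum = D * a * cum * (1.0 - cum)              # dP*/dtheta; zero at the fixed ends
    probs = cum[:-1] - cum[1:]                    # P_ik(theta)
    dprobs = dcum[:-1] - dcum[1:]                 # dP_ik/dtheta
    ok = probs > 0                                # skip categories with zero probability
    return float(np.sum(dprobs[ok] ** 2 / probs[ok]))


def conditional_precision(theta: float, items):
    """Test information, reliability r_xx(theta) and SE(theta) for (a, b) item pairs."""
    info = sum(grm_item_information(theta, a, b) for a, b in items)
    reliability = 1.0 - 1.0 / info  # negative if info < 1
    se = 1.0 / np.sqrt(info)
    return info, reliability, se


# Hypothetical 20-item scale made of identical five-point items:
items = [(1.2, [-1.5, -0.5, 0.5, 1.5])] * 20
for theta in (-2.0, 0.0, 2.0):
    info, rel, se = conditional_precision(theta, items)
    print(f"theta={theta:+.1f}  I={info:.2f}  r_xx={rel:.3f}  SE={se:.3f}")
```

For a two-category item the formula reduces to the familiar 2PL information D²a²P(1 − P), which is a convenient check.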

Average Item Information Curve
The average item information curve shows the mean amount of information provided per item at each trait level, which allows scales of different lengths to be compared. As shown in Figure 3, the curves of the two scales were clearly separated: at every trait level, the RPHNS curve lay above the ITS curve, indicating that RPHNS items were, on average, of higher quality.
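Because the ITS (25 items) and the RPHNS (20 items) differ in length, comparing total test information would favor the longer scale; averaging per item removes that effect. Assuming the usual definition, the average item information at θ is the test information divided by the number of items:

```latex
\bar{I}(\theta) = \frac{1}{n}\sum_{i=1}^{n} I_i(\theta) = \frac{I(\theta)}{n},
\qquad n_{\mathrm{ITS}} = 25, \quad n_{\mathrm{RPHNS}} = 20
```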

DISCUSSION
As reported above, although the CTS was significantly positively correlated with the other two scales in terms of total scores, these correlations were relatively low. The total score correlation between the RPHNS and the ITS was 0.57, and the CFA showed that the ITS and RPHNS factors were highly correlated (r = 0.80; after the CTS was removed, the factor correlation was 0.78). By comparison, the correlations of the CTS with the other scales, for both total scores and factors, ranged from 0.48 to 0.57. The ITS and the RPHNS both involve commitment to moral standards and beliefs and have similar item content. In terms of the target of trust, however, the CTS focuses on familiar people, such as trust in a companion's behavior and in the relationship, whereas the RPHNS and the ITS measure trust in people in general or in the social environment. For example, the ITS addresses trust in the courts, officers, salespeople, and experts, and both the ITS and the RPHNS assess trust in most people's attitudes toward social life. This difference in focus may explain the lower correlations between the CTS and the other two scales.
According to the construct validity analysis, only the original dimensions of the RPHNS were verified. For the ITS, a three-dimensional structure was verified after re-exploration, with political trust added to the original dimensions. Scores on the ITS reflect differences in religious belief, family background, and social class (Xu, 2010). With regard to the nature of trust, Western research places more emphasis on personal factors such as responsibility and ability (e.g., trust in people in general, as in the ITS), whereas Chinese studies pay more attention to interpersonal relations, including personal relationships formed through socialization (Chang and Holt, 1991). China and Western countries also differ in how personal factors relate to interpersonal relations. In the West, personal factors are independent of human relations and sometimes take precedence over them (Wang, 2008), whereas in Chinese culture people adopt different modes of trust depending on their social identities, statuses, and relationships with others (Wang et al., 2016). Hence, bias is unavoidable if Western scales are applied directly to Chinese subjects. Furthermore, trust, interpersonal relations, and social interaction are often intertwined, so changes in the social environment have an uncertain impact on trust; more specifically, the level or structure of trust may change (Yang and Peng, 1999).
The IRT results showed that although the ITS and the RPHNS measure the same psychological trait, their psychometric characteristics differ. In terms of item parameters, some ITS items performed poorly because of low discrimination. This finding is consistent with previous research showing that scores on negatively worded ITS items correlate poorly with the total score (Jian, 2007). In addition, some items concerning politics and beliefs also showed low discrimination, including item 3 (national prospects), item 13 (international affairs), and item 18 (firm belief) of the ITS and item 4 (principles of treating others in the Bible) and item 17 (adherence to ideas) of the RPHNS. Regarding test reliability, the RPHNS showed high coefficients, which remained high even for subjects with low trust levels.

CONCLUSION AND SUGGESTIONS
In this study, the psychometric characteristics of two common trust assessment scales were analyzed and compared in a sample of college students. The following results were found: the ITS and the RPHNS assessed almost the same trust traits, whereas the CTS did not; both the ITS and the RPHNS had high reliability, with the RPHNS showing the higher reliability; and only the original dimensions of the RPHNS were verified, whereas re-exploration and verification of the ITS added a new dimension (political trust).
Based on these conclusions, several suggestions are offered for selecting a trust assessment scale. First, with respect to the target of trust, trust can be divided into general trust and special trust. The former refers to trust in all those who share the same beliefs, whereas the latter refers to trust only in intimates, such as relatives and friends, formed in the process of socialization (Wessen et al., 1951). Given the developers' purposes and the item content of the scales, the RPHNS can be used to measure general trust (Jian and Tang, 2006), whereas the ITS is the better choice for measuring both general and special trust. Second, the RPHNS should be used when a rigorous scale structure is needed, because its original structure was verified in this study and appears stable even though changes in the social environment may influence the level and structure of trust. Third, the RPHNS is preferable when high test reliability and item quality are required.

Limitations and Outlook
Trust differs by gender: females display a higher level of trust than males (Li et al., 2007). The sample in this study was not gender-balanced (62.9% female), which may have skewed the distribution of trust levels. The study also has limitations in sample representativeness and the generalizability of its results. However, although the study was conducted in only one city, the students came from all parts of China and covered both junior college and undergraduate levels, as well as urban and rural backgrounds and different grade levels. To a certain extent, therefore, the sample is representative of Chinese college students overall.
This study analyzed three scales, one of which (the CTS) was dropped during data analysis. From the perspective of cross-cultural comparison and research rigor, several factors contributed to these shortcomings. (1) The multidimensionality of the trust scales increases the workload of data analysis and the difficulty of construct verification and comparative analysis, as seen in the analysis of trait congruency. Multidimensional models are also more complex and require more parameters; to ensure accurate results, a variety of analytical methods were used. (2) The cross-cultural context makes construct verification more complex, which also shows that rigorous, high-quality cross-cultural research is needed. (3) Overall, to balance the research purpose against the accuracy of the results, this study focused mainly on analyzing the scales.
Three popular foreign scales were selected because they are widely used. However, previous research indicates that China and Western countries differ culturally with respect to trust, so bias is unavoidable when Western scales are applied to Chinese subjects. Future research should develop a new trust assessment scale that reflects the Chinese cultural context.