Evaluation of Psychometric Properties of Hardiness Scales: A Systematic Review

Background Hardiness is a personality trait that can help individuals cope with stressful situations. Because people are constantly exposed to stress, and the stressors they face differ from one situation to another, various scales have been developed to assess this trait in different populations and situations. Researchers and health workers therefore need valid and reliable scales to assess this concept. This systematic review aims to rigorously assess the methodological quality and psychometric properties of hardiness scales. Method Scopus, PubMed, Web of Science, and Persian databases were searched using suitable keywords, without time restriction. Eligible studies were selected after screening titles and abstracts. The quality of studies was evaluated using the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist and the Terwee quality criteria. Result Of the 747 articles identified, 33 were included in this study. Based on the COSMIN checklist, the most frequently reported properties were structural validity (84%), hypothesis testing (56%), content validity (42%), and internal consistency (39%). Furthermore, 12 studies reported cross-cultural validity, three studies reported criterion validity, and one study reported measurement error. Conclusion The “family caregivers’ hardiness scale,” “Japanese Athletic Hardiness Scale,” “Occupational Hardiness Questionnaire,” and “Children’s Hardiness Scale” are the best tools for assessing hardiness in family caregivers, athletes, employees, and children, respectively. In addition, the Dispositional Resilience Scale (DRS-15) and the Personal Views Survey (PVS III-R) are the most frequently used scales with suitable features for measuring hardiness in the general population.


INTRODUCTION
Human beings are constantly growing and moving from one developmental stage to the next. Personal development is an unpredictable and demanding process, and each stage involves stressful circumstances (Maddi, 2004;Sharif Nia et al., 2021). If these stressful conditions are not handled well, they can impair performance, motivation, and health (Bonanno, 2004). In addition to the natural and continuous stresses of growth, rapid changes in all spheres of contemporary life add further stress (Efimova et al., 2019). Many people cannot control these stressful situations, which in turn can threaten their physical, mental, and social health (Bigalke, 2015).
Hardiness is a personality trait that can help individuals in stressful situations. The concept was first proposed by Kobasa in 1979 on the basis of existential theory and is conceptualized as one of the main personality structures for understanding motivation, excitement, and behavior (Kobasa, 1979). The concept becomes meaningful in the face of stressful situations, where hardiness is considered a buffering and intervening variable that moderates the relationship between stressful situations and their physical and psychological effects (Abdollahi et al., 2018). Hardiness is a combination of attitudes and beliefs that motivates an individual to work hard and strategically in the face of stressful and difficult situations (Maddi, 2007). Kobasa defined hardiness as a multidimensional personality trait consisting of three components, the 3C's: commitment, control, and challenge (Kobasa, 1979). Commitment was defined as a tendency to engage in life's activities and to have a genuine interest in and curiosity about the world (activities, things, and other people); it includes a feeling of personal competence and a feeling of community and/or corporation. Control was defined as believing and acting as if one can influence the events of one's life, a belief that is enacted through one's own efforts. This feature allows the person to perceive the predictable consequences of their actions in stressful events and to manage them favorably (Luceño-Moreno et al., 2020). Finally, challenge was defined as the belief that change, rather than stability, is the natural way of life and creates opportunities for personal growth rather than threats to one's security (Kobasa, 1979).
In 2005, Maddi proposed a fourth dimension of hardiness, connection (the 4th C) (Maddi and Khoshaba, 2005). According to him, individuals gain part of their strength and ability to face stressful situations through communication with other members of society. Therefore, communication is one of the factors that play an important role in creating and maintaining hardiness (Maddi and Khoshaba, 2005). In 2017, Mund proposed culture as the fifth dimension (the 5th C) influencing hardiness; in other words, she argued that hardiness should not be interpreted without regard to culture (Mund, 2017).
Hardiness is a trait related to both the person and his or her environment, because prevailing social and cultural conditions shape a person's perception and experience of hardship and threat, as well as his or her understanding of protective factors and how to use them; through these processes, the dimensions and meanings of hardiness are formed (Chan, 2000;Benishek et al., 2005;Green et al., 2020). Therefore, as this concept has been examined in different groups of people facing different stressful situations, various definitions and components have been proposed according to the target community and the context of the stressful situation (Hosseini et al., 2021). For example, occupational hardiness means endurance and ability in difficult situations and refers to a person's performance based on cognitive assessments (Moreno-Jiménez et al., 2014). Wagnild and Young studied hardiness in older women and concluded that, for this group, the concept includes equanimity, self-efficacy, perseverance, meaningfulness, and existential aloneness (Wagnild and Young, 1988). Likewise, because hardiness can be taught, and improving it can strengthen people's ability to deal with stressful situations and reduce the effects of stress, different scales have been developed for groups such as college students, children, nursing students, and managers (Bartone, 1991;Benishek et al., 2005;Moreno-Jiménez et al., 2014). Determining individuals' level of hardiness or evaluating the effectiveness of interventions requires an accurate and valid scale with desirable psychometric properties (Hosseini et al., 2021). Importantly, these scales consist of different dimensions, and some do not cover all the dimensions of hardiness. Hence, this systematic review aims to evaluate the psychometric properties of these scales and make recommendations about their use.

Study Design
This systematic review of the psychometric properties of hardiness scales was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009).

Eligibility Criteria
Eligible studies were English and Persian articles describing the psychometric properties, validation process, or cross-cultural evaluation of scales measuring hardiness. Articles with irrelevant topics, reviews and systematic reviews, structural equation modeling or model-testing studies, and articles in languages other than Persian and English were excluded.

Information Sources
Five electronic databases (Scopus, PubMed, Science Direct, ProQuest, and Web of Science) were searched for English articles. Two Persian databases, SID and MAGIRAN, were also searched for Persian articles. Finally, the Google Scholar search engine and the ProQuest database were searched to identify relevant theses. The reference lists of all identified articles were also searched manually. The search covered the period from 1979 to 2022.

Electronic Search Strategy
The search strategy was based on the principle that a wide range of search terms yields the most complete set of related studies. Therefore, the strategy was designed around the main concept (hardiness) and the study type (development or psychometric studies), and the title, abstract, and keyword fields were searched. The following keywords were used: hardiness, hardy personality, personality hardiness, validity, validation, reliability, development, and psychometric. The Persian equivalents of these keywords were used to search the Persian databases. Each database was searched with the appropriate syntax (see Table 1).

Study Selection
The initial search yielded 747 articles: 77 from Scopus, 246 from PubMed, 111 from Web of Science, 55 from Science Direct, 169 from Google Scholar, 47 from ProQuest, and 42 from Persian databases. Of these 747 articles, 33 met all the inclusion criteria. Reasons for exclusion are given in Figure 1.
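As a quick arithmetic check on the search yield, the per-source counts reported above can be tallied. This is a minimal Python sketch that uses only the numbers in this section; the variable names are illustrative, and Google Scholar is grouped with the English-language sources, which is an assumption consistent with the 705/42 split reported later.

```python
# Records retrieved per source, as reported in the Study Selection text.
records = {
    "Scopus": 77,
    "PubMed": 246,
    "Web of Science": 111,
    "Science Direct": 55,
    "Google Scholar": 169,
    "ProQuest": 47,
    "Persian databases (SID, MAGIRAN)": 42,
}
PERSIAN_SOURCES = {"Persian databases (SID, MAGIRAN)"}

total = sum(records.values())
persian = sum(n for source, n in records.items() if source in PERSIAN_SOURCES)
english = total - persian  # assumption: all non-Persian sources count as English-language
included = 33

print(f"Total records: {total}")                # 747
print(f"English-language sources: {english}")   # 705
print(f"Persian sources: {persian}")            # 42
print(f"Not included: {total - included}")      # 714
```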
All records retrieved from the databases were stored in an EndNote (version X8; Thomson Reuters, New York, NY, United States) library to identify duplicates. Two authors (LH and HN) independently evaluated all articles against the inclusion and exclusion criteria, and any discrepancy between them was resolved through discussion with a third author. The selection process is shown schematically in Figure 1.

Data Extraction
The data were extracted by two researchers (LH and HN), one an expert in statistics and the other an expert in the concept under study. The data extraction sheet included: first author's name, publication year, country, name of the scale, target population, face validity, content validity, construct validity (sample size, factor extraction method, rotation method, method for selecting the number of factors, factor names, and total variance explained), and reliability [internal consistency: Cronbach's alpha coefficient; stability: Spearman's correlation coefficient and Intraclass Correlation Coefficient (ICC)] (see Supplementary Table 1).

Risk of Bias
The Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) Risk of Bias checklist was used to assess risk of bias in each of the 33 studies. This tool has 3 parts with 10 boxes. The first part (boxes 1 and 2) addresses content validity, assessing the relevance and comprehensibility of all items with respect to the target construct and population. The second part (boxes 3, 4, and 5) addresses internal structure: structural validity, internal consistency, and cross-cultural validity/measurement invariance. The third part (boxes 6 to 10) addresses the remaining measurement properties: reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. This third part focuses on the quality of the (sub)scale as a whole rather than at the item level (Mokkink et al., 2018).

Quality Assessment and Data Analysis
The full texts of the articles were evaluated for methodological quality using the COSMIN checklist, which assesses the following psychometric properties: A = internal consistency, B = reliability, C = measurement error, D = content validity, E = structural validity, F = hypothesis testing, G = cross-cultural validity, H = criterion validity, and I = responsiveness. Each article was rated with the four-point COSMIN scoring system, in which each item was classified as "excellent" (appropriate methodology), "good" (adequate quality assumed, but some relevant information not reported), "fair" (questionable methodology), or "poor" (incorrect methodology). The methodological quality score for each box is the lowest rating of any item in that box ("worst score counts") (Terwee et al., 2012). Terwee's criteria were then used to rate the quality of the measurement properties themselves (Terwee et al., 2007). Inter-reviewer reliability was evaluated using Cohen's kappa, and any discrepancies were resolved through discussion and consensus.
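The "worst score counts" rule and the Cohen's kappa agreement statistic can be illustrated with a short Python sketch. The function names and example ratings are hypothetical and are not data from the reviewed studies. The text does not say which kappa variant was used, so the unweighted form shown here is an assumption.

```python
from collections import Counter

# COSMIN four-point scale, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Box-level quality: the lowest rating of any item ("worst score counts")."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    return min(item_ratings, key=RATING_ORDER.index)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two reviewers rating the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both reviewers must rate the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two reviewers' marginal counts per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings, for illustration only.
print(box_score(["excellent", "good", "fair", "excellent"]))  # fair
reviewer_1 = ["excellent", "good", "poor", "excellent"]
reviewer_2 = ["excellent", "fair", "poor", "excellent"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.636
```

A single "fair" item pulls the whole box down to "fair", which is why one poorly reported detail can lower a study's rating for an entire measurement property.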

Data Synthesis
Because a pooled analysis of psychometric properties was not possible, the characteristics of the available articles were used to judge the validity of each instrument.

Study Characteristics
A total of 747 articles were found: 42 from Persian databases and 705 from English-language databases. After duplicates were removed and the remaining records were screened, 33 articles remained and were evaluated using the COSMIN checklist and Terwee's criteria (see PRISMA flow chart, Figure 1).

Table 1. Databases and search strings.

Findings From the Risk of Bias Evaluation
The quality of the studies included in this review was evaluated using the COSMIN Risk of Bias checklist. Of the 33 articles, only 15 (45.4%) scored "very good" on both content validity boxes, and only 13 (39.3%) scored "very good" on both internal structure boxes. The third part of the assessment includes 4 boxes; only 4 studies were rated on 3 of these 4 boxes, none of them as "very good," and just one study received a "very good" score in 2 boxes (Hosseini et al., 2022). Details of the risk of bias assessment are reported in Table 2.

Psychometric Properties
Regarding study design, 15 studies developed a scale and 18 assessed the psychometric properties of an existing scale. Details of the psychometric characteristics are given in Supplementary Table 1. The scales differed in number of items and dimensions. The minimum number of items was 12 (Kardum et al., 2012;Dymecka et al., 2020;Yamaguchi et al., 2020) and the maximum was 45 (Lang et al., 2003). Likewise, the minimum number of dimensions was one, in two studies (McNeil et al., 1986;Kardum et al., 2012), and one instrument had nine dimensions (Kamtsios and Karagiannopoulou, 2013). Of the 33 studies, 31 tested internal consistency, 16 tested test-retest reliability, two tested criterion validity (e.g., Dymecka et al., 2020), and 30 tested construct validity. Most studies evaluated internal consistency using Cronbach's alpha, whereas four studies evaluated stability using the ICC (Picardi et al., 2012;Kamtsios and Karagiannopoulou, 2013;Solano et al., 2016;Hosseini et al., 2022). Construct validity was tested using principal component analysis or principal axis factoring in most studies (n = 16), exploratory factor analysis in three, and confirmatory factor analysis (CFA) in 10. Five studies did not evaluate construct validity. The total variance explained by these scales ranged from 32.1% to 69%, and 15 studies did not report it.

Quality Assessment
Details of the COSMIN quality assessment of the 33 articles are shown in Tables 3 and 4. None of these articles had "Excellent" quality for all psychometric properties.

BOX A-Internal Consistency
The interrelatedness among the items of each scale was determined by measuring internal consistency. The main quality criteria for internal consistency are: (1) adequate sample size (seven respondents per item and > 100), (2) Cronbach's alpha calculated separately for each dimension, and (3) Cronbach's alpha between 0.70 and 0.95 (Terwee et al., 2007). Based on these criteria, 13 studies were rated "Excellent," and one study was rated "good" because it did not calculate alpha separately for each dimension/subscale (Bartone, 1991). Three studies did not evaluate internal consistency and were rated "poor" (Funk and Houston, 1987;Velasco-Whetsell and Pollock, 1999;Creed et al., 2013). Two studies were rated "poor" because they did not meet two of the three criteria (McNeil et al., 1986;Lang et al., 2003). Finally, 14 studies were rated "fair" because their Cronbach's alpha values were < 0.70 or > 0.95.
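The internal consistency criteria above can be made concrete with a minimal Python sketch. It computes Cronbach's alpha from raw item scores using the usual formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), and then checks the sample-size and alpha-range criteria as they are summarized in the text. Reading "seven per item and > 100" as both conditions having to hold is an assumption. The function names and the 5-respondent data set are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one (sub)scale.

    responses: one list of item scores per respondent, items in the same order.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*responses))  # transpose: one tuple per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def meets_internal_consistency_criteria(alpha, n_respondents, n_items):
    """Sample-size and alpha-range criteria as summarized in the text.

    Assumption: "seven per item and > 100" means both conditions must hold.
    Call once per subscale to cover the "alpha for each dimension" criterion.
    """
    adequate_sample = n_respondents >= 7 * n_items and n_respondents > 100
    return adequate_sample and 0.70 <= alpha <= 0.95


# Hypothetical 3-item subscale answered by 5 respondents.
sample = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 1], [3, 3, 4]]
alpha = cronbach_alpha(sample)
print(round(alpha, 3))  # 0.923
print(meets_internal_consistency_criteria(alpha, n_respondents=5, n_items=3))  # False
```

Here alpha falls within 0.70 to 0.95, but the check still fails because five respondents is far below the required sample size.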

BOX B-Reliability
Reliability shows that scores do not change when a measurement is repeated. It can be assessed in three ways: (1) test-retest, over time; (2) inter-rater, by different persons on the same occasion; and (3) intra-rater, by the same persons (i.e., raters or responders) on different occasions. The main quality criterion for reliability is an ICC or weighted kappa ≥ 0.70 (Terwee et al., 2007). Five studies were rated "Excellent" (Picardi et al., 2012;Kamtsios and Karagiannopoulou, 2013;Solano et al., 2016;Ko et al., 2018;Hosseini et al., 2022). Eight studies were rated "poor" because they did not report an ICC or kappa value, and 20 studies did not evaluate reliability and were also rated "poor."
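The ICC threshold of 0.70 can be checked with a short computation. The text does not specify which ICC form the reviewed studies used. The Python sketch below assumes the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1) in Shrout and Fleiss's notation), computed from two-way ANOVA mean squares. The function name and the test-retest scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one row per respondent, one column per occasion (or rater).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two respondents and two occasions")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(col) / n for col in zip(*scores)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between respondents
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical test and retest totals for five respondents.
test_retest = [[10, 11], [12, 12], [15, 14], [18, 19], [20, 21]]
icc = icc_2_1(test_retest)
print(round(icc, 3))  # 0.978
print(icc >= 0.70)    # True: meets the Terwee threshold
```

The absolute-agreement form penalizes systematic shifts between occasions through the `ms_cols` term, so a scale whose scores drift upward at retest would score lower here than under a consistency-only ICC.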

BOX C-Measurement Error
Measurement error is the systematic and random error in a score that cannot be attributed to true changes in the construct; it is commonly reported as the Standard Error of Measurement (SEM). Only one study reported measurement error (Hosseini et al., 2022).
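A common way to express measurement error in score units is to derive the SEM from a reliability coefficient: SEM = SD × √(1 − reliability). The Python sketch below assumes this formula, because the text does not state how the one study computed its SEM. The function name and the input values are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability).

    sd: standard deviation of the observed scale scores.
    reliability: e.g. a test-retest ICC (assumed here) or Cronbach's alpha.
    """
    if sd < 0 or not 0 <= reliability <= 1:
        raise ValueError("sd must be >= 0 and reliability must lie in [0, 1]")
    return sd * sqrt(1 - reliability)


# Hypothetical values: score SD of 8 points and a test-retest ICC of 0.84.
sem = standard_error_of_measurement(8.0, 0.84)
print(round(sem, 2))  # 3.2
```

As reliability approaches 1, the SEM shrinks toward zero, so a highly reliable scale attributes little of the observed score spread to error.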

BOX D-Content Validity
Content validity is defined as the degree to which "the content of the scale items reflects the structure we intend to measure." The quality criteria for content validity are an assessment of the relevance of all items to the construct, the study population, and the measurement purpose, and the involvement of experts in item selection. Fifteen studies did not report content validity and were rated "poor." Four studies did not mention who was involved in assessing content validity and were rated "good" (including Velasco-Whetsell and Pollock, 1999;Picardi et al., 2012;Creed et al., 2013), and the remaining 14 were rated "Excellent."

BOX E-Structural Validity
Structural validity refers to the degree to which the scale scores adequately reflect the dimensionality of the construct. The main quality criterion for this property is performing factor analysis, either exploratory (EFA) or confirmatory (CFA). In this review, five studies did not report factor analysis and were rated "fair" (including Bartone, 1991;Velasco-Whetsell and Pollock, 1999;Wang, 1999;Picardi et al., 2012). All other studies were rated "Excellent."

BOX G-Cross-Cultural Validity
According to the COSMIN checklist, cross-cultural validity refers to the degree to which the performance of the items of a translated or culturally adapted scale adequately reflects that of the items in the original version. The main criteria for assessing this property are: (1) a described translation process, (2) forward and backward translation, (3) independent translators, (4) adequate sample size, (5) pre-testing of the scale, and (6) Confirmatory Factor Analysis (CFA). Three studies mentioned that they had translated the scale but did not report the details and were rated "poor" (Gebhardt et al., 2001;Lang et al., 2003;Moreno-Jiménez et al., 2014). Seven studies were rated "good" because they did not perform CFA or pre-testing (Velasco-Whetsell and Pollock, 1999;Wang, 1999;Picardi et al., 2012;Persson et al., 2016;Solano et al., 2016;Ko et al., 2018;Dymecka et al., 2020). Two studies reported the cross-cultural process in detail and were rated "Excellent" (including Mohsenabadi and Fathi-Ashtiani, 2021).

BOX H-Criterion Validity
Criterion validity indicates the degree to which the scale scores adequately reflect a "gold standard." The main quality criteria are the use of a gold standard (supported by convincing arguments) and a correlation > 0.70 between the current scale and this gold standard. Three studies reported criterion validity; for example, Dymecka et al. assessed the correlations between the health-related hardiness scale (HRHS) and measures of sense of coherence, self-efficacy, acceptance of illness, and psychological resilience (Dymecka et al., 2020). Because the comparison scales these studies chose were not gold standards and the correlations were not > 0.70, the studies were rated "fair." Responsiveness was not analyzed because no study reported results for it.
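The correlation part of the criterion validity check can be sketched in Python. The text says only that the scale should "correlate" with the gold standard, so using the Pearson coefficient here is an assumption. The function names and scores are hypothetical. A high correlation by itself does not make the comparison measure a gold standard; that still requires the convincing arguments mentioned above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long score lists with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def meets_criterion_validity(new_scale, gold_standard, threshold=0.70):
    """Correlation criterion from the text: r with an accepted gold standard > threshold."""
    return pearson_r(new_scale, gold_standard) > threshold


# Hypothetical scores of five respondents on a new scale and a gold-standard measure.
new_scale = [1, 2, 3, 4, 5]
gold = [2, 4, 5, 4, 5]
print(round(pearson_r(new_scale, gold), 3))        # 0.775
print(meets_criterion_validity(new_scale, gold))   # True
```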

DISCUSSION
This study used the COSMIN checklist to evaluate the psychometric properties of hardiness scales reported in 33 studies. The salient finding is that no study received an "Excellent" score for all quality criteria of psychometric properties. Therefore, there is not yet a single robust and valid scale for measuring hardiness. This systematic review evaluated all psychometric studies of hardiness scales across different fields, target populations, publication periods, and countries. Because present-day life involves many fast-paced changes and stressful circumstances, individuals at every stage of life, and in every field and situation, need to be able to develop hardiness to face life's difficulties. The results show that hardiness scales have been developed for every age group from children to older adults. Different populations were also considered, such as students (Benishek and Lopez, 2001;Benishek et al., 2005;Creed et al., 2013;Kamtsios and Karagiannopoulou, 2013;Cheng et al., 2019;Soheili et al., 2021a), athletes (Yamaguchi et al., 2020), patients (including Pollock and Duffy, 1990), the general population (McNeil et al., 1986;Funk and Houston, 1987;Bartone, 1991;Maddi et al., 2006;Hystad et al., 2010), parents (Lang et al., 2003;Soheili et al., 2021b), employees (Moreno-Jiménez et al., 2014), and family caregivers (Hosseini et al., 2022). Thus, some scales were designed for a specific group in a specific situation, while others were general. Seven scales were developed for students, possibly because students are likely to experience stress and struggle and have had less opportunity to develop hardiness.
The Dispositional Resilience Scale (DRS-15) and the Personal Views Survey (PVS, PVS II, PVS III, and PVS III-R) are the most frequently used scales and have been translated and assessed in several languages (Hystad et al., 2010;Wong et al., 2014;Madrigal et al., 2016;Solano et al., 2016;Ko et al., 2018;Mohsenabadi and Fathi-Ashtiani, 2021). The newest scale is the "family caregivers' hardiness scale," developed for family caregivers of patients with Alzheimer's disease (Hosseini et al., 2022).
The dimensions of all scales can be grouped into the three themes designated by Kobasa: commitment, control, and challenge. Commitment refers to the tendency toward involvement in a situation as opposed to isolation; in these studies, it explained between 8.92% and 38.91% of the variance (Kamtsios and Karagiannopoulou, 2013). Control refers to the belief that effort influences outcomes, even in stressful situations. This dimension explained the largest proportion of the total explained variance of hardiness in some studies (Pollock and Duffy, 1990;Solano et al., 2016;Yamaguchi et al., 2020). The final dimension, challenge, refers to perceiving life's challenges as a normal part of life and trying to turn them into learning opportunities. This dimension also explained the largest proportion of the total explained variance of hardiness in some studies (Hystad et al., 2010;Moreno-Jiménez et al., 2014;Madrigal et al., 2016). The scale with the most dimensions was that of Kamtsios and Karagiannopoulou, with nine factors: six related to commitment, two to challenge, and one to control (Kamtsios and Karagiannopoulou, 2013).
Factor extraction aims to explain as much variance as possible while grouping items into the minimum number of factors. In most studies, the total explained variance was ≤ 50%. The maximum was 68.9%, in one study with two factors (Funk and Houston, 1987), followed by Soheili et al., with 65.75% explained by three factors (Soheili et al., 2021a). The minimum, 32.1%, was reported by Pollock and Duffy for a two-factor scale measuring hardiness in individuals with an actual health problem (Pollock and Duffy, 1990).
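To make the explained-variance figures concrete, consider a principal component analysis of an item correlation matrix. There, the total variance equals the number of items, and each retained component explains its eigenvalue's share of that total. The Python sketch below assumes PCA and the Kaiser rule (retain eigenvalues > 1). The reviewed studies may have used other extraction or retention methods; under principal axis factoring, for example, explained variance is computed from communalities instead. The function name and eigenvalues are hypothetical.

```python
def explained_variance(eigenvalues, n_factors=None):
    """Return (number of retained components, % of total variance they explain).

    eigenvalues: eigenvalues of the item correlation matrix; their sum equals
    the number of items, i.e. the total variance of the standardized items.
    n_factors: components to retain; defaults to the Kaiser rule (eigenvalue > 1).
    """
    ordered = sorted(eigenvalues, reverse=True)
    if n_factors is None:
        n_factors = sum(ev > 1 for ev in ordered)
    total = sum(ordered)
    return n_factors, 100 * sum(ordered[:n_factors]) / total


# Hypothetical eigenvalues for a 6-item scale (they sum to 6).
eigs = [2.4, 1.3, 0.9, 0.6, 0.5, 0.3]
k, pct = explained_variance(eigs)
print(k, round(pct, 1))  # 2 61.7
```

Retaining more factors always raises the percentage, which is why a high total explained variance means little without also considering how many factors were kept.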
The COSMIN checklist is the standard tool for evaluating the quality of development and psychometric studies. This tool does not produce an overall quality score, because the psychometric properties are not of equal weight (Terwee et al., 2007). Some studies did not clearly report essential information about psychometric properties and were therefore rated "poor." A low quality rating therefore does not necessarily indicate that the scale itself is inappropriate. In terms of quality, more recent articles were better than older publications. This may be due to reporting guidelines introduced by journals and newer statistical methods for the psychometric evaluation of scales. Notably, most studies failed to report face validity, stability, measurement error, and responsiveness, but the newest scale, designed in 2022 for family caregivers of patients with Alzheimer's disease, has all of these features.
In sum, despite the development of reporting guidelines and new statistical methods for the psychometric evaluation of scales, each scale has at least one "Poor" psychometric property. Therefore, the COSMIN checklist is recommended for developing scales and assessing their psychometric properties, so that high-quality scales can be produced. Future studies should also address the features recommended by the COSMIN checklist, such as face validity, stability, measurement error, and responsiveness.
Finally, based on the results of this systematic review, the translation/psychometric study with the highest methodological quality was the "Korean version of the 15-item Dispositional Resilience Scale" (Ko et al., 2018), with four COSMIN boxes rated "Excellent," two "Good," and one "Fair." Among development studies, the highest methodological quality was found for the "family caregivers' hardiness scale" (Hosseini et al., 2022), with five important COSMIN boxes rated "Excellent." It was followed by the "Occupational Hardiness Questionnaire" (Moreno-Jiménez et al., 2014), the "Japanese Athletic Hardiness Scale" (Yamaguchi et al., 2020), and the "Children's Hardiness Scale" (Soheili et al., 2021b), each with four COSMIN boxes rated "Excellent."

Study Limitations
One important limitation was the lack of access to the full texts of four articles (McCubbin, 1987;Godoy-Izquierdo and Godoy, 2003;Wiedebusch et al., 2007;Grau-Valdes et al., 2020). In addition, two related studies could not be assessed because they were in languages other than English or Persian (e.g., Serrato, 2017).

Study Strength
Hardiness is an important psychological characteristic for dealing effectively with stressful situations and reducing their negative physical and psychological effects. Because hardiness can be taught, knowing which scale has strong validity and reliability is essential for measuring this concept properly. This is the first study to evaluate all scales designed since the concept was introduced. Its findings can therefore help researchers choose the best scale to measure this concept accurately.

Implication
The results of this study can help nurses, researchers, psychologists, health workers, and other decision-makers identify the best scale in terms of quality and psychometric properties.

CONCLUSION
This systematic review provides information about the quality of 33 studies that used the COSMIN checklist to assess the psychometric properties of hardiness scales in various populations facing different stressful situations. Based on the study results, among the developed scales, the "family caregivers' hardiness scale," the "Japanese Athletic Hardiness Scale," the "Occupational Hardiness Questionnaire," and the "Children's Hardiness Scale" are the best for assessing hardiness in family caregivers, athletes, employees, and children, respectively. In addition, the Dispositional Resilience Scale (DRS-15) and the Personal Views Survey (PVS III-R) are the most frequently used scales with suitable features for measuring hardiness in the general population.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
LH and HSN designed the study protocol. LH, MA, and HSN searched the databases and selected the suitable studies. LH and EF wrote the manuscript. All authors approved the final version of the manuscript for publication.

ACKNOWLEDGMENTS
We would like to thank all the participants who took part in the study.