Development and Preliminary Validation of the Youth Leadership Potential Scale

To address the need for a valid and reliable scale of youth leadership potential based on the development theory of leadership, the current study developed the Youth Leadership Potential Scale (YLPS) and investigated its factor structure and psychometric properties in a sample of 696 students (grades 7–9) in China. Exploratory structural equation modeling (ESEM) identified a five-factor solution comprising leadership information, leadership attitude, communication skills, decision-making skills, and stress management skills. ESEM within confirmatory factor analysis demonstrated an adequate fit for this structure. The scale showed good composite reliability and measurement invariance across different gender and grade/age groups. The scale also showed sufficient concurrent validity with the Coping Self-Efficacy Scale, the Chinese Roets Rating Scale for Leadership, and the Leadership Skills Inventory. Furthermore, criterion-related validity was supported by the relationship between YLPS scores and the length of student leadership positions. The results suggest that the YLPS is a valid and pragmatic measure for assessing youth leadership potential. The current study is the first to develop a youth leadership potential scale based on the development theory of leadership.


INTRODUCTION
Leadership is like a "catalyst" that enables all other business aspects to work together. Leadership is essential to the survival and success of an organization (Antonakis and Day, 2017) because it facilitates the maximization of organizational efficiency and the achievement of organizational goals (Higgs and Aitken, 2003). As Chambers et al. (1998) suggested, the effectiveness of a leader is a major determinant of organizational performance. One limitation of the extant literature on leadership development is that it heavily focuses on leadership development and practice in adulthood (Chan, 2000), while relatively little attention has been devoted to youth (Murphy and Johnson, 2011).
Early experiences play a critical role in the development of leadership (Riggio and Mumford, 2011) by serving as the foundation for leadership development in adulthood (Karagianni and Montgomery, 2018) and increasing the possibility that an individual will grow up to become a leader. For example, research suggests that students with leadership experience in high school are more likely to become managers in adulthood (Kuhn and Weinberger, 2005). Early points in life are a sensitive period for the development of many leadership-related personality traits and skills (Riggio and Mumford, 2011) that are developed more easily and rapidly early on. Although early development of these traits and skills does not guarantee successful development in adulthood, it sets the stage for future development to occur (Murphy and Johnson, 2011). In addition, early experience in leadership facilitates the developmental process through a self-reinforcing mechanism (Murphy and Johnson, 2011). That is, the more leadership capacity individuals acquire in their early experiences, the more motivated they are to engage in further leadership activities, and vice versa. From this point of view, early leadership experiences are likely to trigger the leadership development process.
It was not until recently that some researchers began to investigate "the seeds of leadership" (Day, 2011; Gottfried et al., 2011; Murphy and Johnson, 2011; Reichard et al., 2011). Youth leadership is inherently future oriented. An important distinction between youth and adult leadership development is that the former tends to plan for future leadership (Redmond and Dolan, 2016). That is, its focus is on the need to identify, cultivate and facilitate the development of youth so that they can become effective leaders later (MacNeil, 2006). Thus, gauging young people's leadership potential is a critical starting point that enables them to understand their strong and weak points in leadership development. As a result, they can intentionally choose different learning opportunities and activities to address their deficiencies accordingly.
Evaluations of leadership can also equip educators with clearer goals and purposes in designing and customizing leadership training programs. Although there is a dearth of research and theory on youth leadership, programs targeted at youth leadership development have been around for many years. Such programs include extracurricular activities (Hancock et al., 2012), community programs run by youth (Larson et al., 2005), and camps (Thurber et al., 2007). The key problem of these programs is that most lack support from theory on leadership development (Seemiller, 2018). As a result, these programs are generally collections of interesting leadership activities without the intentional goal of development. The evaluation of leadership potential among youth could be used to develop such programs so that adolescents can customize their curricula according to what they need most to be future leaders.
The word "potential" means "existing in possibility; capable of development into actuality; a power or quality that has not yet come forth but may emerge and develop" (Tiffan, 2009). It is assumed that potential is a dynamic state and not an end state (Silzer and Church, 2010). In organizations, potential "refers to the possibility that individuals can become something more than what they currently are. It implies further growth and development to reach some desired end state" (Silzer and Church, 2009). Accordingly, leadership potential is defined as the possibility that an individual could develop and acquire those leadership competencies that are highly challenging. In general, individuals with high leadership potential possess some cognitive, emotional and behavioral growth factors that make it possible for them to grow into great leaders (Silzer and Church, 2009).
The extant literature lacks effective tools with which to evaluate youth leadership potential. Table 1 shows the commonly used youth leadership scales. As observed, these scales are mainly limited to the assessment of certain specific leadership skills. To facilitate research and practice in youth leadership development, the aim of the current study is to develop and validate the Youth Leadership Potential Scale (YLPS) based on the development theory of leadership.

The Development Theory of Leadership
The development theory of leadership proposed by Linden and Fertman (1998) holds that every adolescent possesses leadership to some extent and demonstrates leadership abilities in subtle ways in family life, school activities, and interactions with neighbors in the community. The theory proposes that the development of youth leadership fits into five dimensions: (1) leadership knowledge and information, (2) leadership attitude, will, and desire, (3) communication skills, (4) decision-making skills, and (5) stress management skills (Ricketts and Rudd, 2002). These five dimensions include the cognitive, emotional and behavioral aspects of youth leadership development (Linden and Fertman, 1998) and can be used as indicators of youth leadership potential.
The leadership knowledge and information dimension refers to the knowledge that adolescents must have about leadership before they can act as leaders (Ricketts and Rudd, 2002). Appropriate knowledge and information about leadership are helpful in making the abstract and complex leadership concept more concrete and operational (Ricketts and Rudd, 2002). The leadership attitude, will, and desire dimension refers to adolescents' leadership inclination, motivation, and interest (Lord and Hall, 2005). Because there are unavoidable challenges and frustrations throughout the process of leadership development, highly motivated adolescents are more likely to proactively learn from their experiences (Chan and Drasgow, 2001).
According to Linden and Fertman (1998), high-potential young people develop an array of leadership skills during their development process; these skills include communication, decision-making, and stress management. Communication skills represent the ability to present ideas to and exchange information with others. Researchers believe that communication is the basic mechanism by which leaders inspire or influence others (Boies et al., 2015). Decision-making skills represent the ability to make good choices with available information. Leaders must make a multitude of important decisions, and leaders' performance strongly depends on how effective they are in solving complex problems arising in organizations (Mumford et al., 2000). Thus, decision-making skills are crucial for the  (Day et al., 2004), and stress consumes a great deal of cognitive and emotional resources, making it difficult for leaders to function effectively (Harms et al., 2017). Thus, stress management skills are a key to effective leadership. As shown in Table 1, these three leadership skills are embedded in many previous youth leadership scales.
The development theory of leadership was developed based on previous research findings (Fertman and Long, 1990; Fertman and Chubb, 1992; Wald and Pringle, 1995; Long et al., 1996). Later researchers provided supporting evidence for the theory. For example, the theory is consistent with the experiential learning theory of Kolb (2014), which is a holistic integrative perspective on learning that combines experience, perception, cognition, and behavior. Ricketts and Rudd (2002) constructed a conceptual model after performing a meta-analysis of the literature on adolescent leadership development. Their model also proposed that youth leadership development includes five dimensions. Four of the five dimensions composing the model are consistent with the development theory.
In sum, the theory provides a consistent framework for assessing, monitoring and evaluating the development of youth leadership. For example, based on this theory, Anderson (2009) examined students' perceptions of the preferences for leadership development, and Bruce and Stephens (2017) suggested a practical framework to facilitate student leadership development in various kinds of clubs and student governance.
Using the development theory of leadership as the framework, the present study developed and preliminarily validated the YLPS. Although the theory proposed by Linden and Fertman (1998) is used as the guideline for many research and practical programs (Turkay and Tirthali, 2010; Bruce and Stephens, 2017), the present study is the first to construct a youth leadership potential scale based on this theory.

Participants and Procedure
The cross-sectional study was conducted with a sample comprising 702 students from a middle school in Kunming, China. Data from six participants were removed because the proportion of missing values in their responses was higher than 10%. The remaining 696 participants formed the final dataset. Among these participants, 319 were male (45.8%) and 377 were female (54.2%). They were in seventh (20.1%), eighth (29.2%), and ninth grade (50.7%). In China, students in the same grade generally vary little in age (seventh graders are usually 12-13 years old, eighth graders 13-14 years old, and ninth graders 14-15 years old). In terms of parents' highest educational degree obtained, 51 had completed primary school or lower (7.3%), 143 middle school (20.5%), 176 high school (25.3%), 122 college (17.5%), 169 university (24.3%), and 25 a master's degree or higher (3.6%). Table 2 shows the detailed demographic information of the participants.
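The exclusion rule described above (removing participants whose share of missing responses exceeds 10%) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of `None` to mark a missing response, and the 10-item records are hypothetical and do not come from the study's data or software.

```python
from typing import Optional, Sequence


def missing_proportion(responses: Sequence[Optional[int]]) -> float:
    """Share of items left unanswered (None marks a missing response)."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(r is None for r in responses) / len(responses)


def keep_participant(responses: Sequence[Optional[int]],
                     threshold: float = 0.10) -> bool:
    """Keep a participant unless more than `threshold` of responses are missing."""
    return missing_proportion(responses) <= threshold


# Hypothetical 10-item records; the real questionnaire is longer.
complete = [3, 4, 5, 2, 4, 3, 5, 4, 2, 3]
one_missing = [3, None, 5, 2, 4, 3, 5, 4, 2, 3]     # 10% missing: kept
two_missing = [3, None, 5, None, 4, 3, 5, 4, 2, 3]  # 20% missing: removed

retained = [r for r in (complete, one_missing, two_missing) if keep_participant(r)]
print(len(retained))
```

Applied to the full sample, a filter of this kind would reduce the 702 collected records to the 696 analyzed ones.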
The survey questionnaire consisted of a set of self-report measures and demographic questions. The measures were group-administered in classroom settings. Classroom teachers administered the survey with the assistance of a graduate student majoring in psychology. The aim of the study and participation guidelines were clearly stated in the title page of the survey. All questionnaire sets were completed anonymously. Students were also informed that they could quit the survey at any time during the process and the information provided would be kept strictly confidential and used solely for research purposes. Formal consent was obtained from the students before they started the survey. Each student received 10 Chinese Yuan for their participation. Debriefing information was provided at the end of the survey. The study was reviewed and approved by the Academic Ethics Committee at the first author's institution before being conducted.

Measures
The YLPS was developed to assess the leadership potential of adolescents. Before the item generation stage, semi-structured interviews were conducted to collect typical examples of the demonstration of the five dimensions of youth leadership potential. Specifically, 20 middle school teachers, 5 parents, and 16 middle school students from six different middle schools in China participated in the interviews. According to the descriptions of each dimension in Linden and Fertman (1998), the operational definitions of the five dimensions were provided to all interviewees. Specifically, the leadership information dimension refers to youths' understanding of leaders and leadership, including leaders' responsibilities, what leadership is and what leadership does. The leadership attitude dimension refers to youths' thoughts and feelings toward being a leader. The communication skills dimension refers to the ability to understand others' verbal and non-verbal messages and effectively express one's own ideas. The decision-making skills dimension refers to the ability to cautiously make good use of available information and come to a rational choice. The stress management skills dimension refers to the ability to effectively cope with stress in daily life.
Participants were asked to provide three cases in which they observed a student who demonstrated all or part of these dimensions. For teachers and parents, cases were limited to students they taught or their own children. For students, cases could be about themselves or their classmates. Each interview lasted approximately 1 h. The interviews were recorded with permission and transcribed by professionals afterward, resulting in a document of 534,475 words in total. Teachers and parents each received 100 Chinese Yuan for their participation. Students each received a gift at the end of the interview.
By combining the theoretical consideration and the rich examples collected in the interview stage, we initially generated a pool of 39 items covering the five dimensions of youth leadership potential. For those dimensions for which classic scales are available, some items were adopted from related studies and existing scales [e.g., Leadership Skills Inventory (LSI), Karnes et al., 1985; Roets Rating Scale for Leadership (RRSL), Roets, 1997; and Leadership Ability Evaluation, Cassell and Stancik, 1982]. A total of nine items were adopted from previous measures, and thirty items were self-developed.
Subsequently, a panel of two psychologists evaluated the initial pool of 39 items. One of the psychologists was an expert in adolescence and the other was a leadership researcher. All items were evaluated based on three criteria: item relevance and item specificity to the to-be-evaluated dimensions as well as item clarity. Globally, the experts' appraisals were positive, and some items were revised according to their comments and suggestions. The revised 39 items formed the preliminary scale. All items were evaluated by participants on a 5-point Likert scale (from 1 = strongly disagree to 5 = strongly agree). Before formal data collection, a pilot test was administered to a few middle school students in seventh, eighth, and ninth grade to identify possible misunderstanding of the items. Small modifications were made accordingly.
The Coping Self-Efficacy Scale (CSES) was used to evaluate adolescents' confidence in engaging in coping behaviors when faced with life challenges (Chesney et al., 2006). The scale includes 13 items assessing three dimensions: (a) problem-focused coping (e.g., "break an upsetting problem down into smaller parts"), (b) emotion-focused coping (e.g., "take your mind off negative thoughts"), and (c) social support (e.g., "get emotional support from friends and family"). In the current study, the CSES was translated into Chinese strictly following the back-translation procedure. All items (α = 0.850) were rated on a 10-point scale ranging from 1 (not at all likely) to 10 (extremely likely).
The Roets Rating Scale for Leadership (RRSL) is a self-report scale that evaluates the leadership characteristics of students (Roets, 1997). In the current study, we adopted the Chinese revision of the RRSL (CRRSL) developed by Chan (2000). It consists of 15 items with three dimensions: leadership self-efficacy (e.g., "have self-confidence"), leadership flexibility (e.g., "can understand others' views"), and task orientation (e.g., "promote what is believed"). Participants were asked to rate the items (α = 0.900) on a 5-point scale from 1 (almost never) to 5 (almost always).
The Leadership Skills Inventory (LSI) was designed to assist students in analyzing the strength of their leadership skills (Edmunds, 1998). The Chinese version (LSI-C) developed by Wang et al. (2012) was used in the present study. The LSI-C assesses five leadership skills: teamwork (e.g., "I get along well with people around me"), self-understanding (e.g., "I think we should be responsible for our actions"), communication (e.g., "I am a good listener"), decision-making (e.g., "I will make decisions based on my previous experience"), and fundamentals of leadership (e.g., "I am respected by my peers"). Participants were asked to rate these 21 items (α = 0.890) using a 5-point scale ranging from 1 (strongly disagree) to 5 (strongly agree).
The length of student leadership positions was collected to estimate the criterion-related validity of the YLPS. It was measured by the question "How many years in total since primary school have you been a student leader?" Demographic variables included gender, grade, and family SES. Participants' gender was dummy coded, with females coded as 0 and males coded as 1. Grade was coded into three categories, with seventh grade coded as 1, eighth grade coded as 2, and ninth grade coded as 3. Family SES was calculated based on parents' highest educational degree obtained, parents' occupation, and family property (Yuan et al., 2009; OECD, 2010).
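The coding scheme for gender and grade described above can be written down compactly. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the family SES composite is left out because the text does not give its formula.

```python
# Codes as described in the text: females = 0, males = 1; grades 7/8/9 = 1/2/3.
GENDER_CODES = {"female": 0, "male": 1}
GRADE_CODES = {7: 1, 8: 2, 9: 3}


def code_demographics(gender: str, grade: int) -> dict:
    """Return the numeric codes for one participant's gender and grade."""
    return {
        "gender": GENDER_CODES[gender.strip().lower()],
        "grade": GRADE_CODES[grade],
    }


example = code_demographics("Female", 9)
print(example)
```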

Data Analysis
First, we conducted missing values analysis and sample size adequacy estimation in the preliminary analyses to ensure that the data were suitable for further analysis. Then, we examined three aspects of the psychometric properties of the YLPS. (1) We conducted exploratory factor analysis (EFA) and exploratory structural equation modeling (ESEM) to examine the factorial validity of the scale. (2) We examined the composite reliability of subscales using McDonald's (1970) ω. (3) We examined the concurrent and criterion-related validity of the scale. The preliminary analyses and EFA were conducted using SPSS 22.0 (IBM SPSS Inc., Chicago, IL, United States). Other data analyses were conducted using Mplus 7.11 (Muthén and Muthén, 2019). In particular, the ESEM code generator (De Beer and Van Zyl, 2019) was used to generate the Mplus code for ESEM.

Factorial Validity of the YLPS
In order to examine the factorial validity of the YLPS, the total sample was randomly split into two independent subsamples, subsample A (n = 345) and subsample B (n = 351), for the purpose of cross-validation.
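A random split of this kind can be sketched in a few lines. The text does not say how the split was drawn, so the shuffling approach, the seed, and the function name below are assumptions made purely for illustration; only the subsample sizes (345 and 351) come from the study.

```python
import random
from typing import Iterable, List, Tuple


def split_sample(ids: Iterable[int], n_first: int,
                 seed: int = 2020) -> Tuple[List[int], List[int]]:
    """Shuffle participant IDs and cut them into two non-overlapping subsamples."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[:n_first], shuffled[n_first:]


participant_ids = range(1, 697)  # 696 participants in the final dataset
subsample_a, subsample_b = split_sample(participant_ids, n_first=345)
print(len(subsample_a), len(subsample_b))
```

Fixing the seed is a design choice that lets the exploratory and confirmatory steps be rerun on exactly the same halves.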
First, EFA with principal axis factoring extraction and direct oblimin rotation was conducted in subsample A to explore the plausible number of factors to be extracted.
Second, based on the parallel analysis and scree plots in EFA as well as the theoretical framework, multiple models were constructed and further estimated using ESEM in subsample A. ESEM can be considered as an "integration of the best features of exploratory and confirmatory factor analysis (CFA) within the structural equation modeling framework" (Marsh et al., 2014). One of the main advantages of ESEM is that it allows free cross-loadings of items on multiple factors and therefore provides a less simplistic, more flexible, and more realistic and valid factor structure (Asparouhov and Muthén, 2009; Marsh et al., 2014). To determine the most appropriate factor structure, the goodness-of-fit values of all candidate models were compared. Several fit indices were evaluated to determine the model fit. Since chi-square tests (χ²) are very sensitive to sample size and to minor deviations from multivariate normality, researchers typically focus on sample size-independent indices to assess model fit (Marsh et al., 2005), particularly the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean squared residual (SRMR). CFIs and TLIs greater than 0.900 are considered to indicate adequate model fit, and values greater than 0.950 are preferable (Hu and Bentler, 1999). RMSEAs smaller than 0.060 and SRMRs smaller than 0.080 support a good model fit (Hu and Bentler, 1999). Moreover, information-theoretic indices, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the sample-size-adjusted BIC (aBIC), can be used to compare competing models and make a trade-off between model fit and model complexity. A lower AIC/BIC value indicates a better trade-off between fit and complexity (van de Schoot et al., 2012). In addition, the standardized factor loadings (λ) of each item should also be taken into consideration. Following the recommendations of Osborne et al. (2008), each factor in a model should comprise at least three items with λ greater than 0.320 on their target factor.
Third, after the optimal number of factors and the corresponding ESEM model were obtained, the standardized factor loadings of each item were checked. Items with low loadings (λ < 0.320) on their target factors or substantial cross-loadings (λ > 0.300) were eliminated from the initial 39-item model.
Fourth, the revised model obtained at this stage was cross-validated by the ESEM-within-CFA (when ESEM is estimated with target rotation, it becomes possible to specify a priori hypotheses regarding the expected factor structure and thus use ESEM for purely confirmatory purposes; Browne, 2001; Asparouhov and Muthén, 2009) in subsample B, and the goodness-of-fit of the model was checked again. Then, the factor names were assigned based on the revised model. Through the above steps, the final first-order ESEM model was obtained.
However, it is possible that a first-order ESEM model may ignore the presence of hierarchically superior constructs, which will end up being expressed through inflated cross-loadings (Morin et al., 2016). Thus, hierarchical ESEM (H ESEM) and bifactor ESEM (B ESEM) were also conducted according to the suggestion of Morin et al. (2016). In the H ESEM model, all first-order factors were specified as related to a single higher-order factor, with residual correlations among the first-order factors (for a detailed introduction of the H ESEM model, please see Morin et al., 2016). The B ESEM model was estimated in line with typical bifactor assumptions using orthogonal bifactor target rotation. This model partitions the total covariance among the items into a global (G) component underlying all items and specific (S) components explaining the residual covariance not explained by the G factor (for a detailed introduction of the B ESEM model, please see Morin et al., 2016). Therefore, in order to choose the best model, we compared the five-factor (first-order) ESEM-within-CFA, H ESEM, B ESEM, and CFA solutions of the YLPS using the total sample to make full use of the data. The analyses in the following steps were all based on the total sample. The same aforementioned criteria were used to determine the most appropriate factor structure. The best model obtained was used for subsequent analyses.
Finally, to test the measurement invariance of the best model, the model was first estimated separately in all gender- and age-related subsamples and then across gender and age groups according to the following steps: (a) configural invariance; (b) weak invariance (loadings); (c) strong invariance (loadings, intercepts); (d) latent means invariance (Meredith, 1993). In each invariance sequence, the preceding model served as a referent.
To determine the measurement invariance, the goodness-of-fit values of these models were compared by examining the changes in fit indices relative to the preceding model. Following the recommendations of Chen (2007) and Cheung and Rensvold (2002), it is reasonable to select the more complex model over the more parsimonious model when the more parsimonious model lowers CFI and TLI by more than 0.010 or raises RMSEA by more than 0.015. The results of the above steps provided empirical support of the factorial validity of the YLPS.
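The model-comparison rule for the invariance sequence can be expressed as a small decision function. The sketch below assumes that the added constraints are rejected as soon as any one index changes by more than its cutoff (0.010 for CFI and TLI, 0.015 for RMSEA); that reading, the class and function names, and all fit values are hypothetical illustrations rather than results from the study.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    cfi: float
    tli: float
    rmsea: float


def constraints_tenable(less_constrained: Fit, more_constrained: Fit,
                        max_drop: float = 0.010,
                        max_rmsea_rise: float = 0.015) -> bool:
    """True if adding equality constraints does not worsen fit beyond the cutoffs."""
    cfi_drop = less_constrained.cfi - more_constrained.cfi
    tli_drop = less_constrained.tli - more_constrained.tli
    rmsea_rise = more_constrained.rmsea - less_constrained.rmsea
    return (cfi_drop <= max_drop
            and tli_drop <= max_drop
            and rmsea_rise <= max_rmsea_rise)


# Hypothetical fit values for three nested models in an invariance sequence.
configural = Fit(cfi=0.955, tli=0.930, rmsea=0.045)
weak = Fit(cfi=0.952, tli=0.935, rmsea=0.043)
strong = Fit(cfi=0.930, tli=0.915, rmsea=0.052)

weak_supported = constraints_tenable(configural, weak)   # small changes only
strong_supported = constraints_tenable(weak, strong)     # CFI falls by 0.022
print(weak_supported, strong_supported)
```

When a step fails in this way, a common response is to free the constraints on the offending item and test for partial invariance.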

Reliability and Validity
Subscale composite reliability was computed using McDonald's (1970) ω:

$$\omega = \frac{\left(\sum |\lambda_i|\right)^2}{\left(\sum |\lambda_i|\right)^2 + \sum \delta_{ii}}$$

where $\lambda_i$ are the loadings and $\delta_{ii}$ are the uniquenesses. According to Bagozzi and Yi (1988), composite reliabilities greater than 0.60 are desirable. The latent variable correlations in the final model were evaluated to test the relative independence of the factors.
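As a worked illustration of the composite reliability coefficient, the sketch below computes ω as the squared sum of the absolute loadings divided by that same quantity plus the summed uniquenesses. The function name and the three-item loadings and uniquenesses are hypothetical; in practice both sets of values would be taken from the fitted model's standardized output.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float],
                   uniquenesses: Sequence[float]) -> float:
    """Composite reliability from standardized loadings and item uniquenesses."""
    if len(loadings) != len(uniquenesses) or not loadings:
        raise ValueError("need one uniqueness per loading and at least one item")
    squared_sum = sum(abs(l) for l in loadings) ** 2
    return squared_sum / (squared_sum + sum(uniquenesses))


# Hypothetical three-item subscale.
example_loadings = [0.70, 0.60, 0.80]
example_uniquenesses = [0.51, 0.64, 0.36]

omega = mcdonald_omega(example_loadings, example_uniquenesses)
meets_cutoff = omega > 0.60  # threshold attributed to Bagozzi and Yi (1988)
print(round(omega, 3), meets_cutoff)
```

Passing the uniquenesses explicitly, rather than deriving them as one minus the squared loading, matters for ESEM because cross-loadings also contribute to each item's explained variance.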
The common method bias test was carried out before the concurrent validity test. It was evaluated by the model with all factors of the YLPS, CSES, RRSL, and LSI collapsed into one latent factor (Podsakoff et al., 2003). Common method variance was not considered a serious problem if the fit indices indicated a worse fit for the one-factor model.
Then, the latent factor correlations of the YLPS, CSES (problem-focused coping, emotion-focused coping, and social support), RRSL (leadership self-efficacy, leadership flexibility, and task orientation), and LSI (teamwork, self-understanding, communication, decision-making, and fundamentals of leadership) were used to evaluate the concurrent validity of the YLPS. The measurement models for the CSES, RRSL and LSI were specified as CFA factors in accordance with the results of previous validation studies of the constructs (for CSES see Chesney et al., 2006; for RRSL see Chan, 2000; for LSI see Wang et al., 2012), and the measurement model for the YLPS was specified according to the best fitting model obtained in the previous steps. Finally, the criterion-related validity and incremental validity of the YLPS were tested by regression analyses (Boateng et al., 2018).

Preliminary Analyses
First, according to Schafer (1999), a missing rate of 5% or less of variables is inconsequential. In the current study, the missing value analysis showed that the missing rates of all items were below 1.5%, and only three items' missing value rates were above 1%. Second, the high participant-to-item ratio (18:1; Anderson et al., 1992) and good Kaiser-Meyer-Olkin values (KMO = 0.904, p < 0.001; Cerny and Kaiser, 1977) suggested that sample size was adequate and the data were suitable for factor analysis.
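The two preliminary checks reduce to simple arithmetic, sketched below. The sample size, the 39-item pool, the 5% cutoff, and the 1.5% upper bound on item missingness come from the text; the function names are hypothetical.

```python
def participant_to_item_ratio(n_participants: int, n_items: int) -> float:
    """Number of participants available per questionnaire item."""
    return n_participants / n_items


def missingness_negligible(missing_rate: float, cutoff: float = 0.05) -> bool:
    """True when an item's missing rate is at or below the chosen cutoff."""
    return missing_rate <= cutoff


ratio = participant_to_item_ratio(696, 39)          # about 17.8, i.e. roughly 18:1
worst_item_ok = missingness_negligible(0.015)       # highest rate reported was below 1.5%
print(round(ratio, 1), worst_item_ok)
```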

Factorial Validity of the YLPS
The theoretical framework suggested a five-factor structure, as described in the literature review section. However, based on the results of EFA, the parallel analysis suggested a four-factor structure, while the scree plot indicated a strong elbow after the fifth factor. Combining these theoretical and empirical considerations, three models with four (Model 1), five (Model 2), and six factors (Model 3), respectively, were estimated using ESEM in subsample A. The goodness-of-fit statistics of these three models are displayed in Table 3.
The results showed unsatisfactory goodness-of-fit indices (TLI < 0.900) for Model 1. Adequate goodness-of-fit indices (CFI and TLI > 0.900; RMSEA < 0.050; SRMR < 0.080) were observed for Models 2 and 3. The indices of Model 3 seemed superior to those of Model 2. However, further investigation suggested that one factor in Model 3 comprised fewer than three items with λ greater than 0.320 on their target factor, suggesting that Model 3 might be relatively unstable. Thus, the more parsimonious five-factor model was retained for subsequent analyses.
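The fit cutoffs used to judge these models (CFI and TLI above 0.900 as adequate and above 0.950 as preferable; RMSEA below 0.060 and SRMR below 0.080) can be collected into a small helper. The three labels and the way the indices are combined are assumptions made for illustration, not a rule stated by the authors. The first example uses the values reported for the female subsample in the measurement invariance results; the second is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    cfi: float
    tli: float
    rmsea: float
    srmr: float


def classify_fit(fit: FitIndices) -> str:
    """Label a model 'good', 'adequate', or 'poor' from the cited cutoffs."""
    incremental_adequate = fit.cfi > 0.900 and fit.tli > 0.900
    incremental_preferable = fit.cfi > 0.950 and fit.tli > 0.950
    absolute_good = fit.rmsea < 0.060 and fit.srmr < 0.080
    if incremental_preferable and absolute_good:
        return "good"
    if incremental_adequate and absolute_good:
        return "adequate"
    return "poor"


female_subsample = FitIndices(cfi=0.956, tli=0.925, rmsea=0.047, srmr=0.028)
hypothetical_poor = FitIndices(cfi=0.880, tli=0.850, rmsea=0.075, srmr=0.090)

print(classify_fit(female_subsample), classify_fit(hypothetical_poor))
```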
Based on the standardized factor loadings of the 39-item five-factor model, six items were deleted due to low loadings on their target factors, nine items were deleted due to substantial cross-loadings, and one item was deleted for both reasons. The resulting 23-item five-factor model of the YLPS was re-estimated by ESEM and provided even better goodness-of-fit indices (see Model 4 in Table 3). The standardized factor loadings of the items in Model 4 are listed in Table 4. The 23-item YLPS showed significant and satisfactory target factor loadings (|λ| = 0.406 to 0.843, M|λ| = 0.631, SD|λ| = 0.128) with small cross-loadings. Then, the 23-item five-factor model was cross-validated by ESEM-within-CFA in subsample B and provided satisfactory goodness-of-fit indices (see Model 5 in Table 3). The standardized factor loadings of Model 5 are presented in Table 4.
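The item-screening rule (drop items whose target loading is below 0.320 or whose loading on any other factor exceeds 0.300) can be sketched as follows. The function name and the loadings of the three example items are hypothetical; only the cutoffs and the deletion counts come from the text.

```python
from typing import Dict


def screen_item(loadings: Dict[str, float], target: str,
                min_target: float = 0.320, max_cross: float = 0.300) -> str:
    """Return the retention decision for one item given its standardized loadings."""
    low_target = abs(loadings[target]) < min_target
    high_cross = any(abs(value) > max_cross
                     for factor, value in loadings.items() if factor != target)
    if low_target and high_cross:
        return "drop (both)"
    if low_target:
        return "drop (low target loading)"
    if high_cross:
        return "drop (cross-loading)"
    return "keep"


# Hypothetical loadings; each tuple holds the loadings and the item's target factor.
items = {
    "item_a": ({"information": 0.71, "attitude": 0.08, "communication": -0.05}, "information"),
    "item_b": ({"information": 0.25, "attitude": 0.10, "communication": 0.12}, "information"),
    "item_c": ({"information": 0.45, "attitude": 0.38, "communication": 0.02}, "information"),
}
decisions = {name: screen_item(l, target) for name, (l, target) in items.items()}

remaining_items = 39 - (6 + 9 + 1)  # deletion counts reported in the text
print(decisions, remaining_items)
```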
The goodness-of-fit indices of the first-order ESEM-within-CFA model with five factors, H ESEM, B ESEM and CFA are presented in Table 3 (Models 6-9). The fit is better for Models 6 and 8 than for Models 7 and 9.
To further compare Models 6 and 8, the factor loadings of these two models are reported in Table 5. The target loadings of the G factor in Model 8 are not high, and many are lower than 0.32 (|0.020 to 0.571|, M|λ| = 0.331, SD|λ| = 0.179) and the target loadings of the S factor (leadership information: |0.604 to 0.798|, M|λ| = 0.690, SD|λ| = 0.084; leadership attitude: |0.329 to

Measurement Invariance
The measurement invariance of the final five-factor (first-order) ESEM-within-CFA model was tested. Two ESEM-within-CFA models were estimated separately for the female subsample (CFI = 0.956, TLI = 0.925, RMSEA = 0.047, SRMR = 0.028) and the male subsample (CFI = 0.964, TLI = 0.939, RMSEA = 0.038, SRMR = 0.029) (Model 10 in Table 6). Across gender subsamples, complete invariance (loadings, intercepts, and latent means; Models 11-14 in Table 6) was supported. In different grade/age subsamples, the configural model (Model 16 in Table 6) achieved a satisfactory fit to the data (CFI = 0.953, TLI = 0.919, RMSEA = 0.047, SRMR = 0.032), supporting the configural invariance across grade/age subsamples. A partial weak invariance (Model 17 in Table 6) was supported after releasing the equality constraints on the loadings of item 26 (i.e., "I am good at recognizing non-verbal signs from others."). In addition, partial strong invariance (Model 18 in Table 6) and partial latent means invariance (Model 19 in Table 6) were achieved across grade/age subsamples after the invariance constraints were released for item 26, and Model 19 had the lowest BIC value. These results indicated that students differing in grade/age might have different understandings of item 26. Generally speaking, the first-order model with five factors demonstrated good measurement invariance across different groups.

Composite Reliability and Inter-Factor Correlations
All factors presented acceptable to modest composite reliability coefficients (from ω = 0.651 for the decision-making skills dimension to ω = 0.847 for the leadership information dimension; see Table 7). In addition, the latent variable correlations were small to moderate, confirming the relative independence of the dimensions.
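As an illustration of how a composite reliability coefficient of this kind is obtained, the sketch below applies the commonly used formula ω = (Σ|λ|)² / ((Σ|λ|)² + Σδ), where λ are the standardized target loadings of a factor and δ are the item residual variances. The function name and the example values are hypothetical and are not taken from Table 5 or Table 7; the residual variances are passed in explicitly because, in an ESEM solution with cross-loadings, they cannot be assumed to equal 1 − λ².

```python
def composite_reliability(loadings, residual_variances):
    """Composite reliability (omega) of one factor.

    omega = (sum |lambda|)^2 / ((sum |lambda|)^2 + sum delta)
    """
    if len(loadings) != len(residual_variances):
        raise ValueError("one residual variance is needed per item")
    # Variance attributable to the factor: squared sum of absolute loadings.
    common = sum(abs(l) for l in loadings) ** 2
    # Variance not attributable to the factor: sum of residual variances.
    unique = sum(residual_variances)
    return common / (common + unique)


# Hypothetical standardized estimates for a four-item factor
# (illustrative only, not values reported in the study).
example_loadings = [0.70, 0.65, 0.60, 0.55]
example_residuals = [0.51, 0.58, 0.64, 0.70]
omega = composite_reliability(example_loadings, example_residuals)
print(round(omega, 3))
```

Under this formula, a factor with fewer items or lower target loadings yields a smaller ω, which is one possible reason for differences in reliability between dimensions.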

Common Method Bias
The fit indices indicated a poor fit for the one-factor model (CFI = 0.564, TLI = 0.551, RMSEA = 0.072), which suggested that common method variance was not a serious problem.

Concurrent Validity
The latent variable correlations are reported in Table 8. The five dimensions of the YLPS had significant positive correlations with almost all dimensions of the three concurrent scales, i.e., CSES, RRSL, and LSI. For example, as expected, the decision-making dimension of the LSI was highly correlated with the decision-making skills dimension of the YLPS (r = 0.646, p < 0.001), and the leadership self-efficacy dimension of the RRSL was highly correlated with the leadership attitude dimension of the YLPS (r = 0.666, p < 0.001). These results provide evidence for the good concurrent validity of the YLPS.

Criterion-Related Validity and Incremental Validity
The length of time participants served as class leaders was treated as the dependent variable, and the latent factors of the YLPS were treated as independent variables. The results of the regression analysis are presented in Table 9. According to Model c, leadership information, leadership attitude, and communication skills significantly predicted the criterion. The comparison between Models a and b suggested that after controlling for the effects of gender, age, and family SES, the five latent factors still explained 6.2% of the variation. These results provided evidence of the criterion-related validity of the YLPS. The comparison between Models d and e suggested that the five latent factors explained 7.7% of the variation over and above the CSES, RRSL, and LSI, demonstrating the incremental validity of the YLPS. The results of the regression analysis also suggested that the length of leadership position was negatively predicted by gender but positively predicted by grade/age and family SES.
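The logic of comparing nested regression models (e.g., Models a and b) can be sketched as a difference in explained variance, ΔR² = R²(full) − R²(reduced). The example below is a simplified illustration only: it uses ordinary least squares on simulated observed scores rather than the latent-variable regressions reported in Table 9, and all function names, variable names, and data are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def r_squared(rows, y):
    """R^2 of an OLS regression of y on the predictors in rows (intercept added)."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
                 for d, yi in zip(design, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(controls, added, y):
    """Variance explained by `added` over and above `controls`."""
    reduced = r_squared(controls, y)
    full = r_squared([list(c) + list(a) for c, a in zip(controls, added)], y)
    return full - reduced


# Simulated data (hypothetical): 3 control variables and 5 scale scores.
random.seed(0)
n = 300
controls = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
factors = [[random.gauss(0, 1) for _ in range(5)] for _ in range(n)]
outcome = [0.3 * c[0] + 0.2 * c[2] + 0.3 * f[0] + 0.2 * f[1] + random.gauss(0, 1)
           for c, f in zip(controls, factors)]
delta = incremental_r2(controls, factors, outcome)
print(round(delta, 3))
```

A positive ΔR² in such a comparison indicates that the added block of predictors accounts for variation in the criterion that the control variables alone do not.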

DISCUSSION
Emphasizing the importance of leadership development at an early age, the development theory of leadership has attracted attention from both researchers and practitioners for many years (Ricketts and Rudd, 2002; Anderson, 2009). This theory has a strong theoretical basis (Linden and Fertman, 1998) and has gained support in recent research (Bruce and Stephens, 2017). However, no attempt has been made to develop a youth leadership scale based on the theory. Existing youth leadership evaluation scales mainly focus on the skills that adolescents have acquired and demonstrated. The development theory proposes that in addition to leadership skills, two other aspects should be considered when assessing young people's leadership potential. One is leadership knowledge and information, and the other is the attitude toward being a leader. Young people who accumulate the necessary knowledge and information about leadership, develop a positive attitude toward leadership, and demonstrate key leadership skills are more likely to grow up to become leaders. That is, they are adolescents with high leadership potential. To facilitate the research and practice of youth leadership development, the current study constructed a youth leadership potential scale based on the development theory of leadership.
The results of the current study suggest that the psychometric properties of the YLPS are satisfactory. First, the results of the ESEM in subsample A and the ESEM-within-CFA in subsample B cross-validated the first-order five-dimensional structure of the scale. The five dimensions include leadership information, leadership attitude, and three leadership skills, i.e., communication skills, decision-making skills, and stress management skills. The rationality of the first-order five-dimensional ESEM model was further confirmed when compared with the H ESEM, B ESEM, and CFA models. Results suggested that the first-order ESEM-within-CFA model fitted the data better than the others, which is in accordance with the development theory of leadership. According to the theory, the five dimensions of the YLPS are relatively distinct and represent different aspects of youth leadership potential (Linden and Fertman, 1998). The first-order ESEM-within-CFA model specifies five unique factors and allows for cross-loadings. In contrast, the H ESEM model specifies all first-order factors as related to a single higher-order factor, and the B ESEM model uses the G factor and S factors together to represent the total covariance among all items. Both models specify a global level of youth leadership potential, which to some extent obscures the uniqueness of the individual dimensions (Fadda et al., 2017). This may explain why the less restrictive first-order ESEM-within-CFA model fitted best. In addition, the measurement invariance test suggested that the first-order five-dimensional structure of the scale generally performed well in different gender groups and grade/age groups.
Second, the acceptable-to-modest composite reliability of each dimension and the small-to-moderate latent variable correlations among the dimensions in the final model suggested that the YLPS is reliable and the five dimensions are relatively independent. Third, the five dimensions of the YLPS were significantly positively correlated with almost all dimensions of three other commonly used scales assessing youth leadership, suggesting its sufficient concurrent validity. Finally, regression analysis suggested that the YLPS could predict the variation in the length of time students serve as class leaders over and above both major demographic variables (gender, age, and family SES) and existing youth leadership scales (i.e., CSES, RRSL, and LSI), thus providing evidence of the YLPS's criterion-related validity and incremental validity. In sum, the current study concluded that the 23-item five-dimensional YLPS is a promising tool for assessing the leadership potential of adolescents.
The development of the YLPS has both theoretical and practical implications. With regard to the theoretical aspect, constructing a scale based on the development theory of leadership should spur future empirical studies to explore leadership from a developmental perspective. Extant research on the development theory of leadership is mainly theoretical and conceptual (Ricketts and Rudd, 2002), largely because there is no scale based on this theory. The development of the YLPS fills this gap. We call for more empirical studies to test and enrich the development theory of leadership. According to the development theory of leadership, the development of youth leadership can be divided into three stages: awareness, interaction, and mastery (Linden and Fertman, 1998). Before the awareness stage, adolescents generally are unaware that they are leaders, and leadership seems to be distant from them. With the accumulation of information on a variety of leaders and ways to lead, they begin the process of recognizing and identifying their leadership potential. After the awareness stage, adolescents begin to see themselves as leaders and actively strengthen and broaden their leadership potential and skills through interactions with others and their social environment. They are now in the interaction stage, in which they receive feedback through their interactions. As a result, their confidence as leaders increases, and their leadership skills improve considerably. With the skills and experience they have acquired in the interaction stage, adolescents enter the mastery stage, where they gradually become mature leaders and influence others. Future longitudinal studies could use the YLPS to test the three-stage conceptual model and trace the development of youth leadership along different dimensions in each stage.
In terms of practical implications, because the YLPS simultaneously measures the leadership knowledge and information dimension, the leadership attitude, will, and desire dimension, and the leadership skills dimension, it provides a comprehensive measurement framework. In leadership training practice, the YLPS can be used in screening programs for youth leadership potential. Based on the screening results on different dimensions of the YLPS, educators and trainers can provide more effective programs targeting those aspects of leadership potential that need more support. Adolescents' leadership potential can be cultivated more efficiently when a leadership development curriculum is developed based on individual differences in the different dimensions of leadership potential. For example, if Jerry scored high on most leadership skills but low on leadership attitude while Tom scored the opposite, it would be inefficient for them to go through the same leadership development program. The YLPS makes it possible to design a training program and select training content according to individual needs. However, it is important to note the potential negative aspect of gauging youth leadership potential. Low scores might discourage young people and block their path to growing into great leaders. Thus, use of the YLPS in practice should focus on nurturing youth leadership development in different aspects rather than dividing youth into different levels.
This study found interesting results on the relationship between demographic variables and the length of leadership positions. The results suggested that the length of leadership positions since primary school was longer for girls than for boys. Similarly, related studies have found that girls are more likely to be nominated as leaders (Waasdorp et al., 2013). However, this finding contrasts with the observation that males rather than females dominate leadership positions in adulthood (Bear et al., 2017). It would be interesting for future studies to address several questions: What happens to girls during the process of growing up that hinders them from benefiting from the leadership experience they gain at an early age? What is the turning point? Is this phenomenon mainly due to generally held stereotypes toward female leaders, or is it possibly due to insufficient development of some dimensions of leadership potential? Another possibility is that it is simply due to cultural perceptions (Albirini, 2006). That is, even if women have equal or sometimes even higher leadership potential, they are less likely to be selected as leaders because of common beliefs and values in society. In addition, the results also showed that family SES is positively associated with the length of leadership positions. This finding is consistent with previous research. For example, Li et al. (2011) found that individuals from high-SES families are more likely to assume leadership positions in adulthood. It would be beneficial to explore how family SES influences the development of different dimensions of youth leadership potential and the underlying mechanism of those effects. These are all important questions that could be answered in future research.
The limitations of the current study call for researchers to collect further evidence of the reliability and validity of the YLPS. First, given that the current study is cross-sectional, the test-retest reliability of the scale cannot be established. Second, all measures in the current study are self-rated. Although data analysis suggests that common method bias is not a serious problem, assessment accuracy is a concern that cannot be ignored. That is, cognitive or motivational bias might impede youth in accurately assessing themselves. Some youth leadership assessment tools are instead teacher-rated. However, researchers have suggested that teachers' evaluation of students may be influenced by students' past performance, an indication of halo bias in the assessment process, as high performance scores tend to be generalized to other characteristics, often incorrectly (Dries and Pepermans, 2012). Future studies could use multisource data so that ratings from different sources can corroborate each other. Third, the participants in the current study came from one middle school. As shown in Table 2, gender was roughly balanced, and the parents' highest educational degree obtained was diverse. In a supplementary analysis, we tested the normality of family SES using the skewness and kurtosis of the distribution. The results suggested that the distribution of family SES (skewness = −1.050, kurtosis = 1.029) did not substantially differ from normality (West et al., 1995). These results provide some evidence of the representativeness of our sample. However, future research based on a more diversified sample would be helpful to further verify the generalizability of the results. Fourth, in the current study, we chose the length of time participants served as class leaders as the criterion to test the criterion-related validity of the YLPS. Besides the length of leadership, leadership effectiveness could also be evaluated in future studies as a criterion to support the validity of the scale. Given that youth leadership potential is inherently future-oriented, research using a longitudinal design would be helpful to provide further evidence of the predictive validity of the YLPS.
The development theory of leadership is a proactive theory because it emphasizes that leadership can be learned, cultivated, and nurtured. Adding leadership knowledge and attitude to the youth leadership potential model can allow researchers and practitioners to focus more on the acquirability of leadership. Such a developmental leadership perspective might improve adolescents' leadership self-efficacy, which may have many positive developmental effects (Murphy and Johnson, 2011). Future research would be helpful to explore this possibility. In addition, in the current study, we strictly followed the dimensions suggested by Linden and Fertman (1998); future research is needed to explore whether there are other fundamental leadership skills that should be added to update the model.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Committee on Human Protection and Ethics in Psychology at Beijing Normal University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
YY and XS developed the study concept, created the study design, and drafted the manuscript. QC performed testing and data collection. YY and ZL performed the data analysis and interpretation. GX and DY provided revisions. All authors approved the final version of the manuscript for submission.