Development and Validation of a Pioneer Scale on Service Leadership Behavior in the Service Economies

In response to the severe lack of leadership assessment tools in the Chinese context, the Service Leadership Behavior Scale was developed based on the Service Leadership Model proposed by Po Chung, the co-founder of DHL International. Utilizing responses from 4,486 Hong Kong undergraduates, this paper reports the findings of a validation study on the Short-Form Service Leadership Behavior Scale (SLB-SF-65). Previous findings based on exploratory factor analysis supported a six-factor, 48-item solution (SLB-SF-48). With the removal of ten items, confirmatory factor analysis showed that the final 38-item scale (SLB-SF-38) possessed excellent internal consistency, concurrent validity, and factorial validity based on multigroup invariance analyses. Overall, the present study underscores the utility of the SLB-SF-38 as an objective assessment instrument of service leadership behavior in education, research, and personnel training contexts.


INTRODUCTION
Over the past few decades, a structural transformation from manufacturing-based to service-focused economies has been observed in many developed as well as developing countries (Bryson and Daniels, 2015; Snell et al., 2017). Effective leadership qualities are therefore indispensable in this service era (Chung, 2015; Chung and Elfassy, 2016).
This service-focused leadership has been widely discussed in the literature on both public and commercial service units. According to Schneider et al. (2005), a leader's service-focused behavior, or service leadership, communicates a commitment to high levels of service quality. Compared with general leadership, service leadership is believed to exert a stronger influence on service outcomes (Hong et al., 2013). It is argued that service-oriented management and effective service leadership foster a service climate and consequently improve service performance (Jiang et al., 2015). Some assessment tools on service leadership have been developed and adopted in related empirical studies (Schneider et al., 2005; Jiang et al., 2015), such as the Service Climate Scale (which includes items measuring service-oriented leadership behavior) developed by Schneider et al. (1998), and a managerial measure of organizational service orientation developed by Lytle et al. (1998), where service leadership was conceptualized as a combination of servant leadership and service orientation.
Although available scales measuring service leadership have a solid theoretical foundation and have engendered much research, some research gaps exist. First, these scales were often developed with a strong focus on customer service. However, "service" in a service economy should be interpreted in a broader context involving not only customer service but also the commitment to self-development and service to followers as well as society. Second, although service leadership is closely related to servant leadership, they are distinct concepts (Sendjaya and Sarros, 2002; Wong et al., 2015). According to the servant leadership theory, followers' needs precede leaders' individual needs. In contrast, service leadership seeks the mutual satisfaction of the needs of both leaders and followers. Therefore, servant leadership scales may not be entirely appropriate for assessing service leadership. Third, available scales of service leadership mainly focus on leadership competences that guide and reward service delivery (i.e., "doing" of service leadership), such as goal setting, planning and coordinating (Schneider et al., 2005). Leaders' ability to make moral decisions and to care for others (i.e., "being" of service leadership) has often been considered a relevant factor but not an indispensable attribute of service leadership (Jiang et al., 2016). To fill the gaps, a set of assessment tools measuring service leadership was developed based on the Service Leadership Model proposed by Po Chung (Shek et al., 2018a). In the following parts, the Service Leadership Model, its unique features, and the project entailing the construction and validation of Service Leadership Scales are outlined.

The Service Leadership Model and Its Unique Features
Service leadership is conceptualized as a "service aimed at ethically satisfying the need of self, others, groups, communities, systems, and environments" (Shek and Lin, 2015a, p. 233). The Service Leadership Model highlights three core attributes: Competence, Character, and Caring. First, Competence covers one's task-specific knowledge and skill sets required to excel in operational duties, which are essential for leaders to win over their followers (Chung and Bell, 2015). Second, Character is defined as one's propensity to behave "in ways that are consistent with high [moral] values" (Chung and Elfassy, 2016, p. 59), so as to command respect and trust from followers. Third, Care entails harboring an unselfish intent toward others so as to facilitate their growth and development (Greenleaf, 1977; Shek and Li, 2015).
The Service Leadership Model builds on and complements other existing leadership paradigms such as servant leadership, ethical leadership, and transformational leadership (see  for a thorough review). First, as discussed earlier, contrary to the servant leadership model, which deemphasizes one's own needs (Greenleaf, 1970; Russell and Stone, 2002), effective service leadership appreciates self-serving endeavors to develop one's capacity and eagerness to satisfy others' needs. Second, while the ethical leadership model emphasizes moral Character (Brown and Treviño, 2006), Competence and service provision on the "self" and "others" levels (Mendonca, 2001), how Care impacts leadership effectiveness remains under-addressed. Third, transformational leaders motivate the pursuit of collective goals at the expense of personal interest, and in so doing these leaders help followers fulfill their potential through idealized influence, inspirational motivation, intellectual stimulation, and individualized consideration (Bass, 1990; Avolio et al., 1999). Transformational leadership theory has limited coverage of Competence and Care as determinants of leadership success.
In a nutshell, the Service Leadership Model incorporates several core features of related leadership paradigms and attempts to build an integrative perspective on leadership. Such a perspective inspires the education of a generation of new leaders who can thrive in this service era (Shek and Chung, 2015).

Service Leadership Education in Hong Kong
As one of the most important outcomes of higher education, leadership of university students is highly regarded by both universities and employers (Bacon et al., 1979). However, a discrepancy exists between employers' expectations and what university students can demonstrate in service economies. Such a discrepancy results in a mismatch in recruitment, low job satisfaction and even mental burnout amongst existing staff (Towers Watson, 2012). Thus, Po Chung, the co-founder of DHL International and the incumbent chairperson of the Hong Kong Institute of Service Leadership & Management Limited (HKI-SLAM), put forth the Service Leadership Model with a vision to nurture a generation of emergent service leaders who are not only competent, but also moral and caring.
To promote quality leadership education conducive to students' personal growth and employability, Chung argued passionately for the need to incorporate formal training based on the Service Leadership Model into the curriculum of undergraduates in Hong Kong (Chung, 2015). With the financial support of the Victor and William Fung Foundation and the collaborative effort of the HKI-SLAM and universities financed by the University Grants Committee (UGC), a multi-year project entitled "Fung Service Leadership Education Initiative (FSLEI)" was implemented in eight UGC-funded universities in Hong Kong. Based on the Service Leadership and Management (SLAM) curriculum framework proposed by the Hong Kong Institute of Service Leadership and Management Limited [HKI-SLAM] (2013), all institutions under the FSLEI independently developed programs and curriculum materials that facilitate learning of service leadership at the undergraduate level (Shek and Chung, 2015). While it is important to develop service leadership curriculum materials and training programs, it is equally important to develop objective measures of service leadership qualities (Shek and Chung, 2015). Unfortunately, the paucity of validated assessment tools on service leadership in the Chinese context has hindered meaningful analyses of the effectiveness of service leadership education under the FSLEI (Shek and Lin, 2015b, 2017).
Against such a backdrop, the research team at a Hong Kong university initiated a multi-year project entitled 'Development and validation of measures based on the Service Leadership Model'. This project entailed the construction and validation of three scales, each of which constituted a parameter of success of an educational program pertaining to one's Attitude, Behavior, and Knowledge on the Service Leadership Model. Some related publications can be seen elsewhere (e.g., Shek et al., 2018b,c,f; Shek and Chai, 2019). This paper primarily discusses the findings of a large-scale validation study on the Service Leadership Behavior Scale, which was designed to measure one's exhibited behavioral attributes characteristic of a service leader.

Service Leadership Behavior Scale
As part of the research program, the Long-Form Service Leadership Behavior Scale (SLB-LF-97) was developed primarily based on the SLAM curriculum framework (Hong Kong Institute of Service Leadership and Management Limited [HKI-SLAM], 2013), 25 Principles of Service Leadership (Chung and Bell, 2015), 12 dimensions of a Service Leader (Chung and Elfassy, 2016), and other published works from the leadership literature (e.g., Wielkiewicz, 2000; Ho and Nesbit, 2009). Initially, the SLB-LF-97 contained the following proposed domains: 3-Cs model (Competence, Character and Care), service provision, commitment to continuous improvement, and distributed leadership.
The SLB-LF-97 was administered in a preliminary validation study involving 231 university students (Shek et al., 2018b), where the results informed the retention of 65 items forming a short-form of the scale (SLB-SF-65). The SLB-SF-65 included 12 factors: problem-solving, self-leadership and life-long learning, non-cognitive intrapersonal competences, distributed leadership, integrity, care provision, concern, self-reflection, service provision, positive social relationship, communication skills, and fairness (Shek et al., 2018b). Both the SLB-LF-97 and the SLB-SF-65 exhibited excellent reliability (αs > 0.95) and robust convergent validity, with the latter evidenced by the significant and positive correlation with a host of theoretically relevant constructs such as servant leadership (r = 0.78) and leadership self-efficacy (r = 0.55) (Shek et al., 2018c). Nonetheless, the dimensionality of the SLB-SF-65 remained to be ascertained owing to the relatively modest sample size (N = 231). The background, conceptual model and steps involved in the development of different forms of Service Leadership Behavior Scales are outlined in Shek et al. (2018e).

Objectives of the Present Study
Utilizing the data from a validation study involving 4,486 undergraduates from eight UGC-funded universities, the present study sought to build upon the abovementioned preliminary validation study (Shek et al., 2018c) in two ways. First, following the commonly adopted two-step dimensionality analysis (Park, 2014; Besnoy et al., 2016) involving an exploratory factor analysis (EFA) followed by a confirmatory factor analysis (CFA), the present study attempted to examine the dimensionality of the SLB-SF-65. Second, by using a much larger sample alongside several well-validated external criterion measures adopted in the study of Shek et al. (2018c), the present study attempted to further establish the reliability and convergent validity of the SLB-SF-65. Based on Shek et al.'s (2018c) initial findings, this study constituted a pioneer effort to construct and validate an objective assessment tool on service leadership in a Chinese context. The present findings contribute to the scant literature on service leadership evaluation in the Chinese context (Shek and Lin, 2015b, 2017) and serve to produce a valuable instrument to assess learning outcomes of service leadership training programs (Shek and Chung, 2015).
In the present study, evaluation of the factorial validity of the SLB-SF-65 involved two steps, with the dataset (N = 4,486) randomly split into two halves (subsets A and B) to facilitate both the EFA and the CFA. The EFA performed on subset A (N = 2,246) resulted in a stable and valid initial six-factor, 48-item solution (SLB-SF-48, see Figure 1), which was consistent with the original conceptual model. Details pertaining to the EFA were reported in Shek et al. (2018c). The six factors, each of which formed a subscale on the basic dimensions of service leadership, were accordingly named (a) Self-improvement and Self-reflection (12 items), (b) People and Principles Orientation (12 items), (c) Resilience (8 items), (d) Social Competence (7 items), (e) Problem-Solving (6 items), and (f) Mentorship (3 items). In this paper, this six-factor solution was then subjected to a CFA performed on subset B (N = 2,240), with the objective of evaluating how well the proposed model fit the rest of the data and how stable the factor structure was.
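The random split into subsets A and B can be illustrated with a short sketch. This is not the authors' procedure, which the text does not describe in detail: the `split_sample` helper, the seed, and the integer case IDs are assumptions made for illustration, and only the two subset sizes come from the text.

```python
import random

def split_sample(case_ids, size_a, seed=0):
    """Randomly partition case IDs into subset A (size_a cases) and subset B (the rest)."""
    shuffled = list(case_ids)
    # A seeded generator keeps the split reproducible (the seed value is an assumption).
    random.Random(seed).shuffle(shuffled)
    return shuffled[:size_a], shuffled[size_a:]

# Hypothetical case IDs 1..4486; the subset sizes follow the text.
subset_a, subset_b = split_sample(range(1, 4487), size_a=2246)
print(len(subset_a), len(subset_b))
```

Keeping the two subsets disjoint is what allows the CFA on subset B to serve as an independent check of the structure found by the EFA on subset A.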

MATERIALS AND METHODS
The data were derived from a research project on service leadership involving eight UGC-funded universities in Hong Kong. Students were invited to participate in the survey via an electronic platform. The data were collected between March and June 2017. The purpose of the study, the principles of voluntary participation and withdrawal, and the compensation arrangement were explained on the survey webpage and in the invitation documents. Students were asked to indicate their acceptance or refusal to join the study on the opening page. Each participant received a supermarket gift voucher valued at HK$100 (US$12.80).

Procedures
In total, 4,555 completed responses were retrieved. Three steps were performed for data cleaning. First, we removed six cases in which students declined to participate. Second, 30 cases were excluded because the respondents had either completed the questionnaire designed for universities other than their own or revealed themselves as non-undergraduates in open-ended questions. Third, after reviewing respondents' student identity numbers (which are anonymous to the Research Team), 33 cases with multiple participation were removed from the sample. Ultimately, 4,486 cases were retained as the working sample.

Profiles of the Respondents
Among the 4,486 students, 1,517 were males and 2,969 were females. The majority of the sample were aged 20-24 years (68.4%; mean age = 20.47 years, SD = 1.67), had previous work experience (91.4%), and had previously held a leadership position (61.4%). Most participants had not received credit-bearing or non-credit-bearing training in service leadership before (74.3 and 82.0%, respectively), and claimed to know "a little" or "some" about service leadership (75.0%).

Assessment of Service Leadership Qualities
The Long-Form Service Leadership Behavior Scale (SLB-LF-97) was designed to measure the behavioral attributes of an effective service leader. The 97 scale items were developed based on the general leadership literature (e.g., Wielkiewicz, 2000; Ho and Nesbit, 2009) and publications based on the Service Leadership Model (e.g., Chung and Bell, 2015; Chung and Elfassy, 2016), and covered the proposed domains of the 3-Cs model (Competence, Character and Care), service provision, commitment to continuous improvement, and distributed leadership. The SLB-LF-97 was validated in a study involving 231 students from a university in Hong Kong (Shek et al., 2018b). The findings suggested the retention of 65 items to form the SLB-SF-65, which was employed in the present study. The dimensions derived are generally consistent with the original conceptual model. Each item of the SLB-SF-65 describes a specific leadership behavior, and respondents evaluate how well each item describes their own leadership behavior (see Table 1 for sample items; all sample items were slightly rephrased to avoid practice effects). A six-point Likert scale was used (1 = very dissimilar; 6 = very similar). Both the SLB-LF-97 and the SLB-SF-65 recorded excellent internal consistency (αs > 0.95; mean inter-item correlations > 0.25) in the previous validation study (Shek et al., 2018c). The research also entailed the construction of scales designed to assess individuals' knowledge of the Service Leadership Model (Shek et al., 2017, p. 167) as well as their attitudes and beliefs about desired leadership qualities (Shek et al., 2017, p. 212). In the present study, the shortened final versions of these two scales were administered.

Short-Form Service Leadership Knowledge Scale (SLK-SF-40)
The Service Leadership Knowledge Scale was developed based on the SLAM curriculum framework (Hong Kong Institute of Service Leadership and Management Limited [HKI-SLAM], 2013) and the literature on service leadership (e.g., Chung and Elfassy, 2016). Participants' responses to the original 200 items were coded for accuracy (1 = correct; 0 = incorrect). Based on a criterion-validation study involving 160 Hong Kong university students, 50 items were retained to form the shortened scale (SLK-SF-50). The SLK-SF-50 was then administered in a large-scale validation study, the results of which suggested the removal of an additional 10 items to form the final SLK-SF-40 (Shek et al., 2018d). Table 2 illustrates several sample items of the final SLK-SF-40 administered in the present validation study.

Short-Form Service Leadership Attitude Scale (SLA-SF-46)
The Long-Form Service Leadership Attitude Scale was developed based on the Service Leadership Model and the leadership literature (e.g., Page and Wong, 2000; Kopelman et al., 2008). Each of the original 132 statements presents a viewpoint on the nature of leadership and how a leader ought to conduct him/herself, and participants evaluated the extent to which they concurred with each item. A six-point Likert scale was used (1 = strongly disagree; 6 = strongly agree). Based on findings from an unpublished, quasi-experimental validation study involving 200 students from a university in Hong Kong, a shortened version of the survey containing 73 items was formed (SLA-SF-73). The SLA-SF-73 was further refined based on exploratory and confirmatory factor analyses using a large-scale sample (Ma et al., 2018; Shek and Chai, 2019). The final SLA-SF-46 used in the present study possesses excellent internal consistency (α = 0.93, mean inter-item correlation = 0.27). Sample items of the SLA-SF-46 are shown in Table 3.
The present study is primarily concerned with the validation findings for the SLB-SF-65. Details in relation to the validation of the SLA-SF-73 and the SLK-SF-50 are discussed in two separate papers (Shek et al., 2018d,f).

External Criterion Measures
Four external criterion scales adopted from the personality and leadership literature were used to gauge the convergent validity of the SLB-SF-65. These included the Revised Servant Leadership Profile (RSLP), Moral Self-Concept Scale (MSC), Leadership Efficacy Scale (LEF), and the Interpersonal Reactivity Index (IRI).
The RSLP was developed by Wong and Page (2003) to examine servant leadership. In this study, we selected five factors of the RSLP, which included 20 items that were highly relevant to the SLAM curriculum framework (Hong Kong Institute of Service Leadership and Management Limited [HKI-SLAM], 2013). These five factors are empowering and developing others (five items), serving others (seven items), open, participatory leadership (two items), inspiring leadership (two items), and courageous leadership (four items). The RSLP demonstrated excellent reliability in the present study (α = 0.94, mean inter-item correlations = 0.46).
The MSC was developed by Cheng (2005) to measure young people's self-appraisal on morality. The dimensions of MSC include conduct and virtues, self-control and disciplines, and altruism. All these aspects are crucial to how a service leader conducts himself/herself (Chung and Bell, 2015). The MSC presented good internal consistency in this study (α = 0.83, mean inter-item correlations = 0.44).
The LEF was developed by Murphy (1992) to examine one's level of confidence in his/her capacity to lead effectively. The LEF showed acceptable internal consistency (α = 0.70, mean inter-item correlation = 0.24).
The IRI was developed to assess empathy (Davis, 1983). In this study, we selected 14 items from two subscales of IRI, including empathic concern (IRI-EC, seven items) and perspective taking (IRI-PT, seven items). These two subscales are closely related to the qualities of an effective service leader (Chung and Elfassy, 2016). The IRI also showed good internal consistency in the present study (α = 0.74).
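The alpha values and mean inter-item correlations reported for these measures can be cross-checked against each other with the standardized-alpha (Spearman-Brown) formula, alpha = k·r / (1 + (k − 1)·r). The sketch below applies it to the RSLP figures given above (20 items, mean inter-item correlation 0.46). The standardized alpha is not necessarily identical to the raw Cronbach's alpha the authors computed, so this is only a rough consistency check under that assumption; `standardized_alpha` is an illustrative helper name.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized alpha implied by n_items items with mean inter-item correlation mean_r."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)

# RSLP as reported above: 20 selected items, mean inter-item correlation = 0.46.
print(round(standardized_alpha(20, 0.46), 2))  # 0.94, in line with the reported alpha
```

The formula also shows why long scales can reach high alpha values even with modest inter-item correlations: alpha rises with the number of items for any fixed positive mean correlation.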

Factorial Validity
Both exploratory (EFA) and confirmatory factor analysis (CFA) were involved in the validation study. While EFA provides preliminary evidence of a theoretical factorial solution (Shek et al., 2018c), CFA serves to verify the solution and validate the construct of the instrument (Besnoy et al., 2016). This two-step analytic approach has been commonly adopted to establish the factorial validity of an instrument (e.g., Park, 2014; Wu and Mohi, 2015; Swami et al., 2017). SPSS version 24.0 (IBM) was used to conduct the EFA and the analyses of reliability and convergent validity. Mplus version 6.12 (Muthén and Muthén, 1998-2010) was used to perform the CFA.
As mentioned above, EFA was conducted on the SLB-SF-65 using a principal component analysis (PCA) with varimax rotation. Related findings suggested a six-factor structure of the trimmed scale (i.e., SLB-SF-48), which retained 48 items with factor loadings larger than 0.50. In addition, identical PCAs were performed on subsets A (N = 2,246) and B (N = 2,240). Tucker's coefficients of congruence (rc) were used to evaluate the stability of the factor structure across the two subsets. The SLB-SF-48 was found to be internally consistent and to have a stable factorial structure. The loadings of all 48 items ranged from 0.50 to 0.76. Details regarding the EFA and the steps involved in forming the initial 48-item behavior scale were reported in another paper (Shek et al., 2018c). The present paper primarily reports the findings of the CFA performed on subset B (N = 2,240), as well as the internal consistency and the convergent and factorial validity of the final version of the Service Leadership Behavior Scale (SLB-SF-38).
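Tucker's coefficient of congruence between two columns of factor loadings is the sum of their cross-products divided by the square root of the product of their sums of squares. The sketch below shows the computation for one pair of factors. The loadings are hypothetical values chosen for illustration, not figures from the study, and `tucker_congruence` is an illustrative helper name.

```python
import math

def tucker_congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factors' loadings on the same items."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("both factors must be defined on the same items")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return cross / norm

# Hypothetical loadings of one factor estimated separately in subsets A and B.
factor_in_a = [0.62, 0.58, 0.71, 0.50]
factor_in_b = [0.60, 0.61, 0.69, 0.53]
print(round(tucker_congruence(factor_in_a, factor_in_b), 3))
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so it is sensitive to their overall level as well as their pattern. Values close to 1 indicate that the factor is essentially the same in both subsets.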
Before performing the main analyses, we conducted a preliminary screening to examine the skewness and kurtosis of the variables involved. Chou and Bentler's (1995) criteria were adopted (|skewness| < 2; |kurtosis| < 7). We then conducted multigroup CFA (MGCFA) to establish measurement invariance of the final model. A series of MGCFAs was conducted following the steps suggested by van de Schoot et al. (2012), which specified configural, metric, scalar and error variance invariance models to be examined. The MGCFAs were performed on three pairs of subsamples under subset B (N = 2,240). One pair involved males (N = 728) versus females (N = 1,498), the second pair included "odd" (N = 1,120) versus "even" (N = 1,120) groups based on case number, and the third pair included "young" (N = 1,120) versus "old" (N = 1,120) groups based on student age. Due to length constraints and the similarity of the analyses between gender and age groups, the present study mainly reports detailed measurement invariance tests for the first two pairs of subsamples.
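The normality screening described above can be sketched as follows. The estimators are an assumption: the code uses simple moment-based skewness and excess kurtosis, whereas statistical packages typically report bias-corrected versions, so values may differ slightly in small samples. The function names are illustrative.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis (kurtosis minus 3) of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def passes_screen(values, max_skew=2.0, max_kurt=7.0):
    """Apply the cutoffs named in the text: |skewness| < 2 and |kurtosis| < 7."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) < max_skew and abs(kurt) < max_kurt

# Hypothetical responses to one item on the six-point scale.
item_scores = [4, 5, 3, 6, 2, 4, 5, 4, 3, 5]
print(passes_screen(item_scores))
```

Variables that pass both cutoffs are usually treated as close enough to normal for Maximum Likelihood estimation.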

Reliability and Convergent Validity
Cronbach's alpha values and mean inter-item correlations were used as the indicators of reliability of the behavior scale and the subscales derived. We also examined the convergent validity of the behavior scale in terms of its correlation with relevant constructs such as servant leadership and empathy measured by external measures (e.g., RSLP, IRI). Specifically, considering that servant leadership, moral self-concept, leadership efficacy and empathy were all key behavioral prerequisites of a service leader (see Chung and Bell, 2015; Chung and Elfassy, 2016), we hypothesized a positive and significant correlation between the service leadership behavior scale and the RSLP (Hypothesis 1), MSC (Hypothesis 2), LEF (Hypothesis 3), and IRI (Hypothesis 4), respectively.
The convergent validity of the behavior scale could be further evidenced by its correlation with the SLA-SF-46 and the SLK-SF-40. Since all three scales were constructed to examine different facets of service leadership, we predicted a positive and significant correlation between the behavior scale (and its subscales) with both the SLA-SF-46 (Hypothesis 5) and the SLK-SF-40 (Hypothesis 6).
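The two reliability indicators named above can be computed as in the following sketch. The response data are hypothetical, and the helper names are illustrative rather than taken from the study's analysis scripts, which were run in SPSS.

```python
from itertools import combinations
from statistics import mean, variance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5

def mean_inter_item_r(items):
    """Average Pearson correlation over all distinct pairs of items."""
    return mean(pearson_r(a, b) for a, b in combinations(items, 2))

# Hypothetical six-point responses: 3 items answered by 5 respondents.
items = [[4, 5, 3, 6, 2], [4, 4, 3, 5, 2], [5, 5, 2, 6, 3]]
print(round(cronbach_alpha(items), 2), round(mean_inter_item_r(items), 2))
```

Reporting both indicators is informative because alpha grows with scale length, whereas the mean inter-item correlation does not.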

Data Screening and Descriptive Statistics
As detailed in Table 4, Cronbach's alpha values and mean inter-item correlations showed good internal consistency of the initial six-factor solution (see Figure 1). No abnormalities were found in the variables' means, standard deviations, or univariate skewness and kurtosis values. In short, the descriptive analyses supported the normality of the data distribution, rendering the use of the Maximum Likelihood (ML) estimation method appropriate. The sample size of the present study (N = 2,240) was also adequate (MacCallum et al., 1999).

Factorial Validity Assessment
Factor Structure of the Initial Model: SLB-SF-48
Based on the original EFA solution, the findings revealed that the initial model (SLB-SF-48) fit the data reasonably well (RMSEA = 0.061; SRMR = 0.046), although some indices (CFI = 0.86; NNFI = 0.86) fell short of the recommended levels (Aquino and Reed, 2002). After reviewing the modification indices (M.I.s), we further removed 10 items showing double factor loadings or a strong residual covariance with other items or factors (see Table 5) (Anderson and Gerbing, 1988; Awang, 2012). The alpha values remained high when an item was removed from the scale (ranging from 0.853 to 0.925, see Table 5). The resultant six-factor, 38-item model (Model 1) was subjected to the second CFA.
Factor Structure of the Modified Model: SLB-SF-38
As detailed in Table 6 ... Byrne (1998) contended that these extreme M.I.s may be attributed to the unique characteristics that these items shared in content. Accordingly, these three pairs of scale items were revisited. First, both items Q04 and Q05 refer to problem-solving. Second, items Q18 and Q19 specifically measure participants' adaptive coping strategies amidst adversity. Third, both items Q49 and Q50 tap into participants' mindset or competence in goal-setting. In a nutshell, all these observations pointed toward an overlap in content within each of the three pairs of items, which justified the inclusion of error correlations within these pairs (Shek and Yu, 2014). Consequently, three modified models were re-specified based on Model 1. More specifically, Model 2 included a correlation between the errors of items Q04 and Q05; Model 3 built on Model 2 by incorporating an error covariance between items Q18 and Q19; Model 4 further added to Model 3 by covarying the errors of items Q49 and Q50. The models were compared using Cheung and Rensvold's (2002) proposed ΔCFI cutoff of |0.01| as the benchmark. The results showed that Model 4 was a significant improvement over Model 1. As a result, Model 4 was accepted as the final model (SLB-SF-38, see Figure 2).
As shown in Table 7, the standardized factor loadings of all 38 items were above 0.50 (p < 0.001, two-tailed), and squared multiple correlations were greater than 0.25 (p < 0.001, two-tailed).
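The degrees of freedom of the final six-factor, 38-item model with three error covariances can be worked out by counting. The sketch below assumes a covariance-structure model with simple structure (each item loads on one factor), one marker loading fixed per factor, and freely correlated factors. These identification choices are assumptions, since the text does not spell them out.

```python
def cfa_degrees_of_freedom(n_items, n_factors, n_error_covariances=0):
    """Degrees of freedom of a simple-structure CFA under marker-variable identification."""
    observed_moments = n_items * (n_items + 1) // 2  # unique variances and covariances
    free_parameters = (
        (n_items - n_factors)               # loadings (one per factor fixed to 1)
        + n_factors                         # factor variances
        + n_factors * (n_factors - 1) // 2  # factor covariances
        + n_items                           # residual variances
        + n_error_covariances               # correlated errors added in Models 2 to 4
    )
    return observed_moments - free_parameters

print(cfa_degrees_of_freedom(38, 6, n_error_covariances=3))  # 647
```

Under these assumptions a two-group configural model without equality constraints would have 2 × 647 = 1,294 degrees of freedom, which agrees with the value reported for Model 13.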

Invariance Tests Across Genders
Model 4 was tested separately by gender in Model 5 and Model 6 to gauge its factorial stability (Byrne, 1998; Shek and Ma, 2010). As shown in Table 6 ... Table 8, all factor loadings and the squared multiple correlations in the two models were significant at p < 0.001, two-tailed.
In Model 11, equality constraints were placed upon both factor loadings and measurement intercepts across the male and female groups. The value of ΔCFI (0.004) denoted invariance in the measurement intercepts of each item across genders (see Table 9).
Lastly, in Model 12 we constrained the error variance, factor loading, and measurement intercept of each variable to be equal across genders to establish the error variance invariance model. The value of ΔCFI (0.009, see Table 9) was again below 0.01, suggesting that the same level of measurement error was present for each item between males and females (Milfont and Fischer, 2010, p. 115).
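The decision rule applied in these invariance tests can be written as a small check. The sketch assumes that the reported values are absolute changes in CFI between successive nested models and that a change no larger than 0.01 supports invariance. Only the two changes reported above for the gender comparison are used, and the names are illustrative.

```python
CFI_CHANGE_CUTOFF = 0.01  # benchmark attributed in the text to Cheung and Rensvold (2002)

def invariance_supported(delta_cfi, cutoff=CFI_CHANGE_CUTOFF):
    """True when the absolute change in CFI between nested models is within the cutoff."""
    return abs(delta_cfi) <= cutoff

# Changes in CFI reported above for the gender comparison.
gender_steps = {
    "scalar invariance (Model 11)": 0.004,
    "error variance invariance (Model 12)": 0.009,
}
for step, delta in gender_steps.items():
    print(step, invariance_supported(delta))
```

The rule is preferred over the chi-square difference test in large samples, where even trivial differences between nested models tend to reach statistical significance.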

Invariance Tests Across Other Subsamples
Following Shek and colleagues' procedure (Shek and Ma, 2010, 2014; Shek and Yu, 2014), subset B (N = 2,240) was further divided into group "odd" (N = 1,120) and group "even" (N = 1,120) based on case number. Both groups were subjected to the identical set of invariance tests as reported above. As shown in Table 6 ... In Model 13, no equality constraints were imposed. As illustrated in Table 9, the goodness-of-fit indices of Model 13 exhibited acceptable fit to the data (χ²(1,294) = 5,520.98; CFI = 0.912; NNFI = 0.904; RMSEA = 0.054 [90% CI: 0.053 to 0.055]; SRMR = 0.048), suggesting configural invariance. We further constrained the factor loadings to be equal in Model 14 and compared it with the baseline Model 13. The result of the Δχ² test was significant at the 0.05 level (Δχ² = 46.66, Δdf = 32, p < 0.05). Nevertheless, the resultant value of ΔCFI (<0.001) provided support for metric invariance across the two subsamples. In Model 15, equality constraints were further placed on the measurement intercepts of all items. The Δχ² test showed a nonsignificant result (Δχ² = 40.56, Δdf = 38, p > 0.05). Likewise, the value of ΔCFI (<0.001) derived from the comparison between Model 14 and Model 15 conveyed scalar invariance. In Model 16, the error variance, factor loading and measurement intercept were held equal for every item across both subsamples. Although the Δχ² test showed a significant difference between Models 16 and 15 (Δχ² = 76.56, Δdf = 41, p < 0.001), the resultant value of ΔCFI (0.002) remained trivial by Cheung and Rensvold's (2002) standard, signaling error variance invariance of the final factorial solution (SLB-SF-38) as displayed in Figure 2.
We also examined measurement invariance across age groups by dividing subset B (N = 2,240) into two groups based on student age. The "Young" Group (N = 1,120, mean age = 19.17 years, SD = 0.76) and the "Old" Group (N = 1,120, mean age = 21.71 years, SD = 1.24) were subjected to the same invariance tests mentioned above. As with gender, the resultant values of ΔCFI (≤0.01) supported configural, metric, scalar and error variance invariance of the factorial structure between the two age groups.
In summary, the present findings provided strong support for the factorial validity of the 38-item Service Leadership Behavior Scale (SLB-SF-38). Apart from exhibiting adequate fit to the data, the strong factorial stability of the SLB-SF-38 was underscored by the series of invariance tests performed on groups defined by gender and age as well as on randomly assigned subjects. Specifically, measurement invariance of the SLB-SF-38 was supported in terms of configural, metric, scalar, and error variance invariance.
As shown in Table 10, the SLB-SF-38 showed excellent reliability (α = 0.96, mean inter-item correlation = 0.38). All six subscales also demonstrated good to excellent reliability in the present study (αs > 0.84, mean inter-item correlations > 0.35). The inter-correlations among the SLB-SF-38 and the subscales ranged from 0.42 to 0.87 (p < 0.001, two-tailed). These findings underscored the strong internal consistency of the SLB-SF-38 and the subscales.
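The RMSEA reported for Model 13 can be approximately reproduced from its chi-square, degrees of freedom, and sample size. The sketch assumes the common formula sqrt(max(χ² − df, 0) / (df · (N − 1))) and a multigroup adjustment that multiplies the result by the square root of the number of groups. Both are assumptions about how the software arrived at the reported value, and conventions differ (for example, N rather than N − 1 in the denominator).

```python
import math

def rmsea(chi_square, df, n, groups=1):
    """RMSEA point estimate with an assumed sqrt(groups) multigroup adjustment."""
    per_group = math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))
    return per_group * math.sqrt(groups)

# Model 13 as reported above: chi-square = 5,520.98, df = 1,294, N = 2,240, two groups.
print(round(rmsea(5520.98, 1294, 2240, groups=2), 3))  # 0.054
```

Without the adjustment the same inputs give roughly 0.038, so the match with the reported 0.054 is consistent with a multigroup correction having been applied.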

Correlation With External Criterion Measures
As shown in Table 11, consistent with Hypotheses 1 to 4, correlational findings revealed significant (p < 0.001, two-tailed) and positive associations between the SLB-SF-38 (inclusive of all subscales) and the RSLP (rs ranging from 0.49 to 0.79), MSC (rs ranging from 0.37 to 0.66), LEF (rs ranging from 0.37 to 0.52), and IRI (rs ranging from 0.20 to 0.55). These findings provided convergent evidence for the validity of the SLB-SF-38, given that this scale was moderately related to several constructs outlining the behavioral characteristics of a service leader (Chung and Elfassy, 2016).

Correlation With Other Service Leadership Measures
Furthermore, findings of correlational analyses between the SLB-SF-38 and the final versions of the Service Leadership Attitude (SLA-SF-46) and Knowledge (SLK-SF-40) Scales are summarized in Table 12. Discussions in relation to the validation of the eight-factor SLA-SF-46 as well as the one-factor SLK-SF-40 are featured in two other papers. The SLB-SF-38 was moderately and positively linked to the SLA-SF-46 (r = 0.58) and weakly but positively linked to the SLK-SF-40 (r = 0.19). The subscales of the SLB-SF-38 were also correlated positively and significantly with both the SLA-SF-46 and the SLK-SF-40. Although some occasional nonsignificant and unexpected results were observed, the results of correlational analyses supported Hypotheses 5 and 6.
To conclude, the present findings offered solid and consistent evidence for the construct validity of the SLB-SF-38. The main scale and the six subscales were correlated with a series of well-validated measures developed to examine constructs related to service leadership. Besides, the SLB-SF-38 and the subscales were also correlated with the Service Leadership Attitude Scale and the Service Leadership Knowledge Scale, which assess different dimensions of the same underlying construct. Thus, the SLB-SF-38 is shown to be a valid and reliable measurement tool of the behavioral characteristics of a service leader.
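A short calculation shows why a correlation as small as r = 0.19 is still statistically significant in a sample of this size. The sketch below is illustrative only: it assumes N = 2,240 (the size of subset B) for these correlations, uses hypothetical helper names, and approximates the t distribution by the standard normal, which is reasonable only because the degrees of freedom are very large.

```python
import math


def r_to_t(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_tailed_p_normal(t):
    """Two-tailed p-value using the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


for r in (0.19, 0.58):
    t = r_to_t(r, 2240)
    print(f"r = {r:.2f}: t = {t:.2f}, p < 0.001 is {two_tailed_p_normal(t) < 0.001}")
```

Statistical significance here reflects the large sample. The size of the coefficient, not its p-value, indicates how strongly two measures are related.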

DISCUSSION
The present study attempted to examine the reliability, convergent validity and dimensionality of the Short-Form Service Leadership Behavior Scale (SLB-SF-65) based on a large sample of Hong Kong undergraduates. The findings suggested the retention of 38 items, which can be grouped under six dimensions including "Self-improvement and Self-reflection," "People and Principles Orientation," "Resilience," "Social Competence," "Problem-Solving," and "Mentorship." The results of multi-group CFA supported the stability of this factorial structure. Both the SLB-SF-38 and the six subscales presented good internal consistency and robust convergent validity.
In short, this study validated the SLB-SF-38 as a sound assessment tool to evaluate the behavioral attributes of service leaders.
There are several strengths of the present study. First, the development of the scales was driven by the Service Leadership Model, which has been extensively covered in the literature and shown to be beneficial to university students in Hong Kong (Shek and Chung, 2015). Second, the present study employed a large sample which accounted for 5.36% of the total 84,388 Hong Kong undergraduates in the 2016/17 academic year (University Grants Committee [UGC], 2017). This large sample contributed to the robust findings (Biau et al., 2008). Third, the present study contributed an objective and psychometrically sound measurement tool to the leadership and youth development literature. Fourth, this study validated an objective measure of service leadership behaviors in a Chinese context that plays an important role in the global service economy.
The present six dimensions aligned well with the Service Leadership Model. First, the factor "Self-improvement and Self-reflection" (nine items) emphasizes the importance of reviewing and improving one's own leadership behavior as a continuous quest (Chung and Bell, 2015, p. 59). The second factor "People and Principles Orientation" (nine items) is concerned with having a personal code of ethics and treating others with care (Chung and Elfassy, 2016). This dimension is consistent with the morality, trust, fairness and respect emphasized in the Service Leadership Model. Third, the dimension "Resilience" (seven items) measures an individual's ability to respond effectively to stress, difficulty, and other unpleasant events in life (Shek and Lin, 2015c). This dimension can be conceptualized as an intrapersonal competence that enhances leadership effectiveness (Patel, 2012; Hatler and Sturgeon, 2013). Therefore, resilience constitutes an essential behavioral attribute of an effective service leader, and it is a key component of service leadership education (Shek and Leung, 2015).
The fourth factor "Social Competence" (five items) covers three aspects of one's capacity to handle social interactions effectively. These aspects include the ability to get along with other people, to build and maintain close relationships, and to behave appropriately in social settings (see Orpinas, 2010). This factor echoes the interpersonal competence outlined in the Service Leadership Model. Fifth, the dimension "Problem-Solving" (five items) measures people's critical thinking when tackling difficult or complex issues (Altun, 2003). Problem-solving falls into the category of intrapersonal competence as part of the service leadership education curriculum (Shek and Leung, 2015). Effective problem-solving is vital to leadership success (Mumford et al., 2000), and closely related to other intrapersonal competences such as emotion management (Mehrdad et al., 2011). Furthermore, service leaders may need to reconcile potentially conflicting needs of self, others, and the systems without compromising on morality. In this situation, critical thinking will help service leaders to see the bigger picture and handle the problem in a timely manner (Jasovsky and Kamienski, 2007). Thus, the factor "Problem-Solving" underlies a dimension of behavioral attributes of service leadership. Lastly, the subscale "Mentorship" (three items) measures participants' capability and willingness to support others' development (Shek and Lin, 2015d), echoing the Competence and Care components highlighted in the Service Leadership Model. In short, the findings provide support for the "3-Cs" (Competence, Character and Care) of the Service Leadership Model. The results also echo the belief that both "being" (i.e., Character and Care) and "doing" (i.e., Competence) are important for effective leadership. The findings are pioneering in terms of constructing a validated measure of service leadership in Chinese societies.
Table note: N = 2,240. Unless otherwise specified by superscript "n.s.," which denotes statistical non-significance, all correlation coefficients are significant at p < 0.05 (two-tailed). SLK-SF-40, Scale score of the one-factor, 40-item Service Leadership Knowledge Scale; SLA-SF-46, Scale score of the eight-factor, 46-item Service Leadership Attitude Scale; SLA-F1, Factor "Vision and competence"; SLA-F2, Factor "People orientation"; SLA-F3, Factor "Caring disposition"; SLA-F4, Factor "Ethical role model"; SLA-F5, Factor "Social competence"; SLA-F6, Factor "Self-understanding and reflection"; SLA-F7, Factor "Positive view about human beings"; SLA-F8, Factor "Unchangeable and dark human nature."
The present study provides support for the developed tool on service leadership behavior. The findings enable cross-institutional analyses on curriculum effectiveness, and also offer robust empirical support for the Service Leadership Model (Shek and Chung, 2015). Theoretically speaking, the findings underscore the importance of the different dimensions of the measure as components of service leadership. This contributes to the development of the theory of service leadership.
The present study has several practical implications. First, the SLB-SF-38 can be employed to assess the impact of a service leadership training program. As students are expected to demonstrate an improvement in behavioral attributes of service leadership after completing the program, educators can use this tool to assess the change. Second, the dimensionality of the SLB-SF-38 can be used to refine the service leadership education curriculum. Specifically, the curriculum materials for future service leadership training may be tuned to focus on the six dimensions identified. Third, the SLB-SF-38 can be used by [...] the psychometric properties of the measure in other student populations. Besides, to further endorse the factorial validity of the SLB-SF-38, follow-up validation studies using a sample of executives (e.g., Acar and Zehir, 2009) or managers (e.g., Yukl et al., 2008) are suggested. Second, given that the present survey comprised over 250 items, response burden may have influenced the response quality (Lavrakas, 2008). Besides, content overlap could also be a "turnoff" for the respondents (Rolstad et al., 2011). In addition, although the findings provide strong support for the internal consistency of the SLB, test-retest reliability analyses can be conducted in the future to examine the temporal stability of the measure. Nevertheless, our results showed good internal consistency of both the scale and the subscales (see Table 4), implying quality responses from the participants (Oltedal et al., 2007). Third, the SLB-SF-38 relies on participants' self-rated leadership behavior, which may cause social desirability bias in responses. Participants may tend to provide favorable instead of truthful responses. Although we assured the participants that the responses would be kept confidential and anonymous, this limitation should be taken into account. In the future, additional information collected from other informants (e.g., followers) would give a more comprehensive picture of service leadership behavior seen from different perspectives.
Finally, one might argue that, because the data are ordinal, it is not appropriate to use parametric factor analysis. While we acknowledge this weakness of the present paper, we would like to make several arguments supporting the approach adopted in this study. First, although there are contrary views, it is a common practice to treat ordinal data with several response categories as continuous data (Muthén and Kaplan, 1985). Second, it is also a common practice to apply CFA with ML estimation to test models of Likert scale measurement (Byrne, 2010). For example, similar papers using CFA to analyze Likert scale data have been reported in some prestigious journals, including Frontiers in Psychology and Psychological Assessment (Young and Beaujean, 2011; Coates et al., 2016; Jorge-Monteiro and Ornelas, 2016; Ghislieri et al., 2017).
Third, Carifio and Perla discussed some common misunderstandings about Likert scales and regarded the claim that "because Likert scales are ordinal-level scales, only nonparametric statistical tests should be used with them" (Carifio and Perla, 2007, p. 114) as a common myth. They further pointed out that "if one is using a 5-7 point Likert response format, and particularly so for items that resemble a Likert-like scale and factorially hold together as a scale or subscale reasonably well, then it is perfectly acceptable and correct to analyze the results at the (measurement) scale level using parametric analyses techniques such as the F-Ratio or the Pearson correlation coefficients or its extensions (i.e., multiple regression and so on), and the results of these analyses should and will be interpretable as well" (Carifio and Perla, 2007, p. 115).
Fourth, we understand that other estimators (e.g., WLSMV) can be superior to ML when there are few ordinal categories. However, there are views supporting the application of ML for categorical data under specific conditions (Byrne, 2010). Some researchers have compared ML with other estimators applied in CFA with ordered categorical data, such as WLSMV (Beauducel and Herzberg, 2006), WLS (Lei, 2009), GLS (Muthén and Kaplan, 1985; Hu and Bentler, 1998), and cat-LS (Rhemtulla et al., 2012). Most of these comparisons concluded that ML performed as well as or even better than other methods when (a) the data approximated a normal distribution (i.e., the variables were only mildly to moderately skewed or kurtotic), (b) there were more than five response categories, and (c) the sample size was not small. In this study, these three conditions were fully met. On the other hand, some researchers have highlighted the disadvantages of WLSMV. For example, Li pointed out the weaknesses of interfactor correlations and standard errors in WLSMV estimation "when the sample size is small, and/or when a latent distribution is moderately nonnormal" (Li, 2016, p. 948). In addition, DiStefano and Morgan (2014) noticed that WLSMV may overestimate factor correlations when dealing with five or more ordered categories.
Finally, as suggested by Rhemtulla et al. (2012), the choice among available methods should rely on data characteristics (e.g., sample size, model size, the normality of distribution), the characteristics of the underlying constructs (e.g., the distribution of the constructs), and researchers' own interests. In the present study, the data in general showed a normal distribution, the sample size was relatively large, and six response categories were used. In this regard, ML seems appropriate. Allison et al. (1993, p. 92) recommended that researchers "should consider staying with traditional parametric tests" when the above conditions are met. Moreover, ML provides more robust standard errors for factor correlations and desirable asymptotic properties such as asymptotic efficiency (Lei, 2009; Rhemtulla et al., 2012).
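The three conditions under which ML is regarded as defensible for ordered categorical data can be expressed as a simple screening check. The sketch below is hypothetical and is not the procedure used in this study: the function names are introduced here, and the numeric thresholds (absolute skewness up to 2, absolute excess kurtosis up to 7, and at least 200 respondents) are assumed rules of thumb that do not come from this paper. Only the requirement of more than five response categories is taken from the text.

```python
from statistics import fmean


def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis of one item's responses.

    Assumes the responses are not all identical (non-zero variance).
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def ml_seems_defensible(values, n_categories,
                        max_abs_skew=2.0, max_abs_kurt=7.0, min_n=200):
    """Screen one item against the three conditions (thresholds are assumptions)."""
    skew, kurt = skew_kurtosis(values)
    roughly_normal = abs(skew) <= max_abs_skew and abs(kurt) <= max_abs_kurt
    enough_categories = n_categories > 5   # "more than five response categories"
    enough_cases = len(values) >= min_n    # "the sample size was not small"
    return roughly_normal and enough_categories and enough_cases


# Hypothetical six-point item answered by 600 respondents
responses = [1, 2, 3, 4, 5, 6] * 100
print(ml_seems_defensible(responses, n_categories=6))
```

In practice such a check would be run item by item before choosing an estimator, with the thresholds set according to the guideline the researcher follows.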
In short, we acknowledge this concern and the related limitations of the study, and we suggest that a future study be conducted to provide an additional picture. Despite this limitation, the present study provides pioneering support for a new scale on service leadership behavior in a Chinese context.

CONCLUSION
Despite the above limitations, the present study provides evidence for a reliable and valid assessment tool of service leadership behavior. The present analyses provide a strong evidence base for the psychometric properties of the SLB-SF-38 by using a large sample of Chinese undergraduates. The current study fills a gap in the scientific literature on the assessment of leadership training among Chinese college students, and also provides practical implications for future service leadership education and research.

DATA AVAILABILITY
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
This study was approved by the Human Subjects Ethics Subcommittee (HSESC) (or its Delegate) of The Hong Kong Polytechnic University. All subjects gave written informed consent before the start of the study.

AUTHOR CONTRIBUTIONS
DS designed the research project and contributed to all the steps of the work. DD contributed to the development of the article and revised the manuscript based on the critical comments and editing provided by DS. LM contributed to the initial data analyses and development of a rough draft of the manuscript.