Exploring the predictors affecting the sense of community of Korean high school students: application of random forests and SHAP

Adolescence is a stage during which individuals develop social adaptability through meaningful interactions with others. During this period, students gradually expand their social networks outside the home, forming a sense of community. The aim of the current study was to explore the key predictors related to sense of community among Korean high school students and to develop supportive policies that enhance their sense of community. Accordingly, random forests and SHapley Additive exPlanations (SHAP) were applied to the 7th wave (11th graders) of the Korean Education Longitudinal Study 2013 data (n = 6,077). As a result, 6 predictors positively associated with sense of community were identified, including self-related variables, “multicultural acceptance,” “behavioral regulation strategy,” and “peer attachment,” consistent with previous findings. Newly derived variables that predict sense of community include “positive recognition of volunteering,” “creativity,” “observance of rules” and “class attitude,” which are also positively related to sense of community. The implications of these results and some suggestions for future research are also discussed.


Introduction
Adolescence is the stage of developing self-identity when individuals gradually begin to recognize themselves as independent beings and develop social adaptability through meaningful interactions with others (Erikson, 1968). Social development or adaptability, the ability to build and maintain interpersonal relationships according to the norms of an individual's community, is one of the developmental tasks of adolescence and an essential part of becoming a healthy and mature member of society (Ranju and Manisha, 2015). During this period, students gradually expand their social networks outside the home, forming a sense of community. Thus, the adaptive aspect of adolescent social development manifests through a sense of community. In other words, adolescents develop a variety of relationships with their teachers and peers in school life and form perceptions of their school community, regional community, and nation, which in turn leads to their sense of community. A sense of community reflects the ability to recognize a network of mutually supportive relationships among the members of a community; thus, the development of a sense of community in adolescence is particularly important because it leads to citizenship and social engagement in adulthood (Sarason, 1974).
However, there are growing concerns regarding Korean adolescents: living in hypercompetitive societies, they currently show a strong sense of selfishness and individualism as well as a tendency toward isolation and disconnection from each other. Additionally, they show a noticeable lack of communal ethics, such as care, attachment and responsibility for others (Lee, 2006). In particular, Korean high school students are exposed to a more competitive entrance examination atmosphere than middle school students are, which can lead to excessive competition and heavy learning demands, making it difficult for them to develop social skills (Yoon and Sung, 2021). On the other hand, given that empirical studies have shown that a sense of community may promote high school students' wellbeing, such as their satisfaction with life (Hombrados-Mendieta et al., 2019), mental health (Terry et al., 2019) and happiness (Lee and Cho, 2023), it is necessary to identify the factors that have meaningful impacts on fostering a sense of community among high school students and to develop supportive policies that enhance this sense of community.
However, the above studies are similarly limited; they have all employed classical statistical modeling, such as correlation analysis, structural equation modeling and multilevel modeling, to analyze only a limited number of variables based on the theoretical background and researcher's interest. They do not extensively examine the variables that affect sense of community. Additionally, the main disadvantage of conventional statistical techniques is that as the number of input variables and the number of possible interactions among variables increases, models become more complex and the standard errors of regression coefficients increase (Bzdok et al., 2018).
Machine learning approaches, however, are useful because they facilitate analysis even amid complex, nonlinear interactions and provide opportunities to discover predictors that might not otherwise be identified (Bzdok et al., 2018; Yi and Na, 2019). Hence, to overcome the problems associated with traditional statistical methods and to reveal the important variables that predict high school students' sense of community, this study adopts random forests (Breiman, 2001), a machine learning technique.
On the other hand, random forests, so-called black boxes that emphasize predictability, cannot provide a simple description of the relationship or direction between predictors and dependent variables due to complicated interactions among the input variables (Buhrmester et al., 2021). Therefore, we have also applied SHapley Additive exPlanations (SHAP) (Lundberg et al., 2018), an eXplainable Artificial Intelligence (XAI) technique, to examine and interpret the tendencies between the important variables derived through random forests and sense of community.
The following research objectives guided the present study: (a) to explore the key predictors of sense of community among Korean high school students through random forests and (b) to identify the relationships between high school students' sense of community and these key predictors. For this purpose, the research questions addressed in this study are as follows:
Research Question 1. What are the key predictors of high school students' sense of community according to random forests?
Research Question 2. What is the relationship between the key predictors of high school students' sense of community derived through random forests and their sense of community?
Literature review

The concept of sense of community
According to Sarason (1974, p. 1), who first conceptualized the term, sense of community is "the sense that one was the part of a readily available, mutually supportive network of relationships upon which one could depend and as a result of which one did not experience sustained feelings of loneliness." McMillan and Chavis (1986, p. 9) later proposed a comprehensive theory of sense of community: "the feeling of belonging or of sharing a sense of personal relatedness, a sense of mattering to one another and to the group, the feeling that members' needs will be met by the resources received through their membership in the group, and the belief that members have shared and will share history and similar experiences."

Factors predicting sense of community
Studies investigating the variables that influence sense of community have reported a range of factors, which can be categorized as student-, family-, or school-related factors.

Student-related factors
Research examining "gender" differences in sense of community has yielded mixed results. Some studies have reported greater levels of sense of community among male students (Kim and Kim, 2015; Leiva et al., 2021), others have shown greater levels of sense of community among female students (Kissinger et al., 2009; Park, 2019), and others have indicated no significant gender differences (Wilkinson, 2008; Park et al., 2015). The self-related variables of "self-efficacy" and "self-esteem" are positively associated with students' sense of community (Park, 2021; Chon and Kim, 2022). "Career maturity" (Jung et al., 2018), "multicultural acceptance" (Choi and Lee, 2021), and "life satisfaction" (Cantarero et al., 2007) have also been identified as predictors of students' sense of community.
Students' "academic achievement" (Wighting et al., 2015; Yi et al., 2017) is also positively associated with their sense of community. Additionally, the more satisfied students are with their school life and the better they have adjusted to school, the greater their sense of community is (Albanesi et al., 2007; Heo, 2020). On the other hand, higher "internet addiction" (Min, 2017) and "mobile phone dependence" (Kim and Lee, 2020) are negatively associated with a sense of community.
Frontiers in Psychology, 10.3389/fpsyg.2024.1337512, frontiersin.org

Family-related factors
In terms of family demographic characteristics, higher levels of household income and parental education have positive effects on children's sense of community (Zaff et al., 2008;Koo and Yoo, 2021).
The quality of parent-child relationships is positively related to children's sense of community (Yi et al., 2017; Kim and Kim, 2018). Family support also influences children's sense of community. Vieno et al. (2007) noted that "parental social support" plays a significant role in maintaining interpersonal relationships outside the home, even though the main elements of school life are mostly determined by teachers and peers. In addition, Kwak (2017) found that students who perceive high "parental educational support" tend to have a greater sense of community, while Liu et al. (2022) revealed that "parents' emotional support" is positively related to students' sense of community. Furthermore, Kuperminc et al. (2008) reported that students' perceived "parental school involvement" positively contributes to their sense of community. Moreover, positive "parental rearing behavior," where parents nurture their children with attention and affection, is an important factor in improving children's sense of community (Park, 2019). Students who report more "parental monitoring," which refers to the degree to which parents are interested in and observe their children's leisure time and friendships, also tend to have a greater sense of community; students who perceive more "parental control," which means the degree to which parents seek to control every aspect of their children's lives, tend to have a lower sense of community (Vieno et al., 2005).

School-related factors
Given that their school is where students spend the most time, students' teachers and peers at school, as well as the variables related to school climate, influence their sense of community.
Several studies have indicated that greater "teacher academic pressure" and "teacher enthusiasm" are positively associated with students' sense of community (Koo and Yoo, 2021). Specifically, Dewaele and Li (2021) revealed that the more students perceive their teachers' enthusiasm, the greater their social-behavioral engagement. Additionally, a positive "relationship between teacher and student" improves students' sense of community (Jung, 2020).
Furthermore, greater "social support from peers" is positively related to a greater sense of community (Park et al., 2015; Hombrados-Mendieta et al., 2019). Vieno et al. (2005) showed that "school-level mean socioeconomic status (SES)," "father's education level and occupational status" and "democratic school climates" (e.g., freedom of expression, perceived rule fairness and equal student treatment) are important factors driving students' sense of community. However, the structural characteristics of schools, such as "school size," "school sector," and "school physical facilities," are not statistically significant. Moreover, a "school's pro-human rights culture" increases students' civil consciousness (Kim and Kim, 2015), while students' level of school violence is negatively related to their sense of community (Yoon et al., 2018). The quantity of experiential activity time and the degree of satisfaction with adolescent extracurricular activities are also positively related to a sense of community (Shin and Chun, 2017; Cho and Han, 2019).

Participants
The current study used data from the 2013 Korean Education Longitudinal Study (KELS2013) to explore the factors that affect high school students' sense of community. The KELS2013, conducted by the Korean Educational Development Institute, is a panel dataset that contains actual data on students' educational experiences and outcomes and provides fundamental information for the establishment and evaluation of educational policies. Specifically, the KELS2013 data were collected from 7,324 5th graders attending 242 elementary schools across the country who were chosen by stratified cluster random sampling in 2013. In this study, only data from the 7th wave (11th graders, second year of high school) were used. Students who did not respond to the item measuring sense of community were excluded from the analysis. Thus, the final sample consisted of 6,077 students: 3,012 boys (49.6%) and 3,065 girls (50.4%). Among the schools, 16.7% were located in metropolitan areas, 24.5% were located in major cities, 41.6% were located in small to medium-sized cities, and 17.2% were located in rural areas. Additionally, 55.3% of the participants were enrolled in public schools, and 44.7% were enrolled in private schools. Finally, 59.8% were enrolled in coeducational schools, 19.7% were enrolled in boys' schools, and 20.5% were enrolled in girls' schools.

Variables
The dependent variable was sense of community, measured using a self-report scale of 12 items, e.g., "When I grow up, I will participate in elections and vote;" "I care about others before me;" and "I help my friends who are behind in school." These items were rated on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree), and the mean of the 12 items was used in the analysis, with higher scores representing a greater sense of community. Cronbach's alpha (Cronbach, 1951) for the sense of community scale was 0.878. This result exceeded the acceptable threshold of 0.7 (Nunnally, 1978), indicating a satisfactory level of reliability.
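As an illustration of the reliability check described above, the following minimal Python sketch computes Cronbach's alpha and the per-student scale mean. It is not the authors' code: the function name cronbach_alpha and the simulated 12-item responses are assumptions introduced purely for demonstration, so the resulting alpha will not equal the reported 0.878.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


# Hypothetical data: 200 respondents, 12 Likert items driven by one latent trait.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
raw = 3 + latent + rng.normal(scale=0.8, size=(200, 12))
scores = np.clip(np.rint(raw), 1, 5)                 # force responses onto the 1-5 scale

alpha = cronbach_alpha(scores)
scale_mean = scores.mean(axis=1)                     # per-student mean of the 12 items
print(f"alpha = {alpha:.3f}")
```

Using the item mean rather than the item sum keeps the dependent variable on the original 1 to 5 response scale, which makes later SHAP values easier to read.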
A total of 159 predictive variables were included in the present study (80 student-, 48 family-, and 31 school-related variables that may directly or indirectly affect sense of community). The specific list of variables is shown in Table 1, and the data preprocessing procedure used for the predictors is described below.

Data preprocessing
For the feature analysis, we used the average of each subfactor or the individual items, and the categorical items from the KELS2013 questionnaires were dummy-coded. In addition, for certain items, observations with multiple responses, no response, or an "I do not know" response were considered missing. Additionally, for ease of interpretation, some items were reverse-coded (e.g., "I often choose to watch TV or play before I do my homework" in self-management; "More foreigners will lead to more crimes" in multicultural acceptance). Among the parental variables, monthly household income and monthly educational expenses were utilized in the analysis after taking the natural logarithm. Students who did not respond to the questions requesting information about their school sector and coeducation were excluded from the analysis (n = 70). Next, variables with more than 20% missing data were removed from the analysis; for other missing data, single imputation was performed using the "mice" package (version 3.14.0) in R 4.2.2 (van Buuren and Groothuis-Oudshoorn, 2011).

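The preprocessing steps described above can be sketched in Python as follows. This is an illustrative approximation, not the authors' pipeline: the column names and toy values are hypothetical, and median imputation is swapped in as a simple stand-in for the single imputation that the study performed with the mice package in R.

```python
import numpy as np
import pandas as pd

# Hypothetical toy survey frame; column names are illustrative, not KELS2013 names.
df = pd.DataFrame({
    "tv_before_homework": [1, 5, 2, 4, 4, 3, 2, 5, 1, 3],   # 5-point item to reverse
    "household_income": [300.0, 450.0, np.nan, 800.0, 250.0,
                         600.0, 520.0, 410.0, 380.0, 700.0],
    "school_sector": ["public", "private", "public", None, "private",
                      "public", "public", "private", "public", "private"],
    "sparse_item": [np.nan, np.nan, 2.0, np.nan, 1.0,
                    np.nan, np.nan, 3.0, np.nan, 4.0],        # mostly missing
})

# Exclude students with no school-sector information.
df = df.dropna(subset=["school_sector"]).copy()

# Reverse-code a 1-5 Likert item so that higher values mean better self-management.
df["tv_before_homework"] = 6 - df["tv_before_homework"]

# Natural-log transform of a monetary variable.
df["log_household_income"] = np.log(df.pop("household_income"))

# Dummy-code the categorical item.
df = pd.get_dummies(df, columns=["school_sector"], drop_first=True, dtype=int)

# Remove variables with more than 20% missing values.
missing_share = df.isna().mean()
df = df.loc[:, missing_share <= 0.20]

# Stand-in for single imputation (assumption: column medians instead of mice).
df = df.fillna(df.median(numeric_only=True))
print(df.head())
```

The 20% screen is applied before imputation so that variables dominated by missing responses are not filled in with largely synthetic values.
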
Data analysis
In this study, we employed random forests, a machine learning technique, to explore the important variables that predict high school students' sense of community. Random forests are ensemble methods based on decision trees and bagging (bootstrap aggregating). In random forests, students were randomly sampled with replacement to generate numerous bootstrap samples, and the final prediction was determined by aggregating the results of all the individual decision trees obtained from each bootstrap sample (Breiman, 2001).
The steps in applying random forests were as follows. First, all the data were randomly divided into two sets: a training set (70% of the sample) and a test set (30% of the sample). The random forests model was built with the training data. Second, when generating the individual decision trees, we set the number of variables to be selected at each node to 53 (number of predictors/3 for regression trees) in accordance with Breiman's (2001) recommendation. Third, to perform hyperparameter tuning for the random forests model, we applied a grid search via scikit-learn's GridSearchCV (version 1.0.2 for Python 3.9). The number of trees was chosen from 100, 300, 500, 700, 1,000 and 1,500. Based on 10-fold cross-validation, the optimal hyperparameter that produced the best performance metric (highest R-squared) was selected (Snider et al., 2021). Fourth, to evaluate the performance of the random forests model, we calculated the root mean squared error (RMSE) and R-squared (R²) on the test set, which are typical evaluation metrics for regression models.
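The four steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions rather than the study's actual script: the data are synthetic (make_regression), there are only 12 predictors (so predictors/3 is 4 rather than 53), and the grid of tree counts is reduced to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption; the KELS2013 data are not reproduced here).
X, y = make_regression(n_samples=300, n_features=12, n_informative=6,
                       noise=10.0, random_state=0)

# Step 1: 70% training set, 30% test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 2: number of candidate variables per split = number of predictors / 3.
max_features = max(1, X.shape[1] // 3)

# Step 3: grid search over the number of trees with 10-fold cross-validation,
# scored by R-squared. The grid is shortened here for speed (assumption).
search = GridSearchCV(
    RandomForestRegressor(max_features=max_features, random_state=0),
    param_grid={"n_estimators": [50, 100, 200]},
    cv=10,
    scoring="r2",
)
search.fit(X_train, y_train)

# Step 4: evaluate the refitted best model on the held-out test set.
pred = search.best_estimator_.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = float(r2_score(y_test, pred))
print(search.best_params_, f"RMSE={rmse:.3f}", f"R2={r2:.3f}")
```

Tuning is done only on the training folds, so the test-set RMSE and R² remain an honest estimate of out-of-sample performance.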
After fitting the random forests model, SHAP values were derived to determine the importance of each feature, which indicates its relative contribution to the model's prediction. SHAP is based on coalitional game theory, which can help explain the output of complex machine learning models (Lundberg and Lee, 2017; Lundberg et al., 2018). We derived the top 10 key variables selected by the mean absolute Shapley values, which contributed most significantly to predicting the sense of community among high school students. To interpret the relationships between the key predictors and the dependent variable, sense of community, we constructed a SHAP summary plot and SHAP dependence plots. The random forests model was fitted using the RandomForestRegressor class from the scikit-learn library in Python 3.9, and the SHAP algorithm was applied using the TreeExplainer class from the SHAP library.
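To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a tiny toy model. It is an illustration only: the study used the TreeExplainer class, which obtains such values efficiently for tree ensembles, whereas the function shapley_values, the toy model and the background rows here are all hypothetical and feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x (brute force over coalitions).

    Features outside a coalition take their values from each background row,
    and the prediction is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(v):
    """Hypothetical model: one additive term and one interaction term."""
    return 2.0 * v[0] + v[1] * v[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
x = [3.0, 2.0, 1.0]

phi = shapley_values(toy_model, x, background)
base_value = sum(toy_model(row) for row in background) / len(background)
print(phi, base_value)
```

The Shapley values sum to the difference between the prediction for x and the average background prediction, which is the additivity property that lets each SHAP value be read as a feature's contribution to one student's predicted score.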

Model tuning
After performing a grid search using 10-fold cross-validation, the highest cross-validation score occurred when the number of decision trees was set to 700, and this value was used to generate the random forests model. The final model, incorporating the optimal hyperparameters, exhibited an RMSE of 0.333 and an R² of 0.680.

Feature importance
After fitting the random forests, the mean absolute SHAP value of each feature was calculated to assess the relative importance of the features in predicting the sense of community. Figure 1 shows the top 10 features with the highest feature importance in descending order, and the horizontal length of each bar indicates the magnitude of the average contribution to the model (Taye et al., 2023). As shown in Figure 1, "academic self-concept" was derived as the variable most relevant to high school students' sense of community. It was followed, in order, by "multicultural acceptance in relationships with multicultural neighborhoods and friends," "positive recognition of volunteering," "observance of rules," "social self-concept," "creativity," "behavioral regulation strategy for requesting help and utilizing resources," "class attitude," "peer attachment" and "self-management." Together, these were the top 10 variables according to feature importance (mean absolute Shapley value), suggesting that these features contribute the most to predicting individual sense of community.
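The ranking rule described above (mean absolute SHAP value per feature, then the top 10) can be sketched as follows. The SHAP matrix here is randomly generated and the feature names are placeholders, so this shows only the aggregation step and not the study's results.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(15)]   # hypothetical names

# Hypothetical SHAP matrix: one row per student, one column per feature.
scales = np.linspace(0.01, 0.15, num=15)
shap_matrix = rng.normal(size=(500, 15)) * scales

# Global importance = mean of absolute SHAP values over all students.
importance = np.abs(shap_matrix).mean(axis=0)

# Indices of the 10 most important features, in descending order.
top10 = np.argsort(importance)[::-1][:10]
for rank, idx in enumerate(top10, start=1):
    print(f"{rank:2d}. {feature_names[idx]}: {importance[idx]:.4f}")
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.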

Feature interpretation
To understand the directionality of the influence of the top 10 features on high school students' sense of community, we constructed the SHAP summary plot shown in Figure 2. The SHAP summary plot combines the feature importance and direction of the impact of each feature on the sense of community (Molnar, 2022). The y-axis displays the top 10 predictors of sense of community, ranked in order of average influence on the prediction, and the x-axis describes the Shapley value related to each feature. Negative Shapley values extending to the left are interpreted as decreased levels of sense of community; positive Shapley values extending to the right are interpreted as increased levels of sense of community (Taye et al., 2023). Each dot on the SHAP summary plot corresponds to the Shapley value of a feature for one individual, with color indicating the feature value; red dots represent high feature values, and blue dots represent low feature values. For instance, Figure 2 shows that "academic self-concept" is the most important feature for predicting a sense of community, and has a positive impact on predicting a sense of community because the red dots representing high values of the variable are located to the right of the vertical line at a SHAP value of 0. The other nine variables also show a gradual transition from blue to red along the horizontal axis, indicating that all the other predictors have a positive impact on the sense of community, since the SHAP value increases as the values of all the variables increase.
In Figure 3, the exact relationships between the top 10 predictors and sense of community are visualized through the SHAP dependence plot. The SHAP dependence plot is a scatter plot of predictors and Shapley values, where the x-axis indicates the raw values of each predictor and the Shapley value of the same predictor is on the y-axis (Lundberg et al., 2020; Zhang et al., 2022). Figure 3 shows that as the raw value of "academic self-concept" increases from 1 to 5, the SHAP value also shifts from negative to positive. This can be interpreted as indicating a positive relationship between the two variables, with higher academic self-concept tending to be associated with a greater sense of community. The other nine predictors also showed that as the raw value increased, the SHAP value tended to increase, suggesting positive relationships between the predictors and sense of community.

Discussion
The aim of this study was to reveal the top 10 predictor variables with the highest impact on high school students' sense of community based on 159 variables in the KELS2013 study and by using random forests, a machine learning technique, and to visualize and interpret the relationship between these key predictors and sense of community by applying the SHAP algorithm. This section therefore elaborates on the novel findings of the current study and compares them with relevant results in the literature.
First, student-related factors such as "self-concept," "self-management," "multicultural acceptance," "positive recognition of volunteering" and "creativity" are the key variables predicting high school students' sense of community. Specifically, self-related variables such as "academic self-concept," "social self-concept" and "self-management" are the top variables with a positive relationship with sense of community. These results are supported in the literature (Park, 2021; Chon and Kim, 2022). Therefore, it is important for schools to recognize the importance of intrapersonal variables in improving a sense of community and to encourage students to develop a healthy self-concept and perceive themselves positively.
Second, "multicultural acceptance in relationships with multicultural neighborhoods and friends" is an important variable for sense of community, and "multicultural acceptance" is also positively related to the latter. Considering that multicultural acceptance and sense of community are concepts that emphasize living as a community while recognizing the diversity of individuals (Baek and Chung, 2017), these two variables are closely related. Thus, it is necessary to expand the scope of education from the individual to the social level and to provide a school environment where students can interact with diverse people to improve their sense of community. On the other hand, previous studies have provided mixed results concerning the causal relationship and direction between "multicultural acceptance" and sense of community. Some studies indicate that "multicultural acceptance" affects sense of community (Park, 2021); others suggest that sense of community influences "multicultural acceptance" (Choi, 2019). Consequently, further in-depth research is needed to explore and clarify the relationships between these two variables.
Third, "positive recognition of volunteering" and "creativity" are key variables with a positive relationship to sense of community; they are notable in that they are novel variables. Rhoads (1998) suggested that community service involves interaction with a variety of others and that in this process, community service serves as a vehicle for creating communal ties. In addition, since the "positive recognition of volunteering" item in this study measures attitudes toward voluntary and active engagement in volunteer activities, "positive recognition of volunteering," which intrinsically motivates students to continue to volunteer, may lead to an increase in their sense of community. Moreover, some studies have reported that openness and extraversion are strongly related to creativity (Kaufman et al., 2016; Lebedeva et al., 2018; Miroshnik et al., 2022). Accordingly, there is likely a relationship between "creativity" and a sense of community that emphasizes a community's willingness to embrace diversity (Kim et al., 2020). In other words, students with higher levels of "creativity" are more likely to possess a diverse range of ideas and open-minded perspectives, which may foster their ability to respect and embrace differing opinions among others.
Fourth, "behavioral regulation strategy for requesting help and utilizing resources" and "peer attachment" are key variables related to sense of community and are positively related to it. The "behavioral regulation strategy for help and resource utilization" in this study entails seeking help from teachers and friends when in trouble (e.g., "If there's something I'm not sure about, I ask an acquaintance"), and "peer attachment" represents forming positive relationships with friends by, for example, trusting them and confiding in them (e.g., "I can tell my friends what's on my mind"). Given that the variables related to relationships with others are key predictors of sense of community, these variables may be important for living with community members in a communal society. Additionally, these findings are consistent with the literature (Kang and Jang, 2013; Park et al., 2015; Kim and Kim, 2018; Hombrados-Mendieta et al., 2019), which has demonstrated that positive interpersonal relationships with teachers or peers are positively associated with students' sense of community.
Fifth, "observance of rules" and "class attitude" are novel key predictors; they are positively related to sense of community. In this study, "observance of rules" refers to keeping promises to others and following class rules, and "class attitude" entails following academic etiquette, such as paying attention in class and doing well on assignments. Therefore, not only maintaining good relationships as a member of a school community but also maintaining common rules and regulations are important for the formation of a sense of community.

Conclusion and future directions
While studies on variables that influence sense of community have examined only limited variables, based on their theoretical background or literature review, with conventional statistical techniques, the present study meaningfully identified numerous variables that have not been considered in the literature as the key predictors of sense of community. Hence, this study is valuable because it reveals which variables should be considered important in fostering a sense of community among high school students by using random forests, a machine learning technique. Additionally, the present study has contributed to the literature through its novel application of SHAP to overcome the difficulty of interpreting random forests results, illustrating the relationship between sense of community and its key predictors by visualizing them.
This study has several limitations. First, this study is limited by its exclusive utilization of data collected from Korean students surveyed within Korea. Subsequent research is therefore expected to provide richer insights into sense of community by conducting cross-national comparative studies that utilize large-scale international survey data to explore the cultural context characteristics related to the formation of sense of community. Second, this study used cross-sectional data, which limits the exploration of developmental factors explaining changes in sense of community, and caution should be exercised when interpreting the causal relationships among variables. Therefore, future studies could explore the variables related to changes in the sense of community from a longitudinal perspective.
Class concentration (Korean, math, English): 1 (10 min or less) ~ 5 (41 min or more)
Class comprehension (Korean, math, English): 1 (20% or less) ~ 5 (81% or more)
Participation in volunteer activities, preparatory education (Math, English)
[...] participating in after-school programs, doing school homework, doing tutoring homework, attending tutoring lesson, attending online lesson, reading books, watching television, spending time with friends, helping with household chores): 0 (None) ~ 4 (3 h or more)
Computer/smart media usage time (study/homework, non-study information search and resource utilization, text/chat/message/email/call, game/entertainment, club activities/online cafes/community engagement): 0 (None) ~ 5 (3 h or more)
Educational aspirations: 1 (Middle school) ~ 5 (Ph.D.)
College enrollment plans: 0 = Outside the capital (Seoul, Korea), 1 [...]
[...] motivation (external motivation, introjected motivation, identified motivation, intrinsic motivation, amotivation)
Academic achievement goal orientation (mastery approach, mastery avoidance, performance approach, performance avoidance)
Self-efficacy (Korean, math, English)
Cognitive self-regulated learning strategies (rehearsal, elaboration, clustering, meta-cognition)
Behavioral regulation strategy (effort regulation, time management, space management, requesting help and utilizing resources)
Preference for cooperative learning
Preference for competitive learning, meta-cognition: 1 (Strongly disagree) ~ 4 (Strongly agree)
[...] father, mother, brothers, sisters, grandparents, relatives)
Engagement in private tutoring
Experience of activities for career decisions (counseling with the homeroom teacher, counseling with the private tutors, counseling with relatives/neighbors/etc., trying career-related assessments, visiting to advanced schools, gathering information through mass media (TV, newspapers, etc.), gathering information through online communities, purchasing career-related books and web search, participating in career explanation sessions)
Experience studying abroad
Experience attending school located abroad
Parental involvement in school organization activities (joining and participating in the school's parent association, volunteering at the school, attending parent conferences, observing open classes, attending parent education sessions): 0 = No, 1 = Yes
Academic expectations for the child: 1 (Middle school) ~ 5 (Ph.D.)
Aspirations for the child's college enrollment: 0 = Outside the capital (Seoul, Korea), 1 = The capital (Seoul, Korea)
School satisfaction (overall satisfaction): 1 (Strongly disagree) ~ 5 (Strongly agree)
Educational Broadcasting System education programs
Use of online home-learning program
Activities to understand the child's school life (contacting teachers through phone calls/messages/emails, counseling with school teachers, posting opinions or questions on the school website bulletin board, checking the child's class bulletin board, viewing the child's school records)
[...] parental support (academic achievement, emotional well-being)
Child-perceived feelings of alienation from parents
Parent-perceived parental support (academic achievement, emotional well-being)
School satisfaction (school safety, school activities)
Relationships with those who share information about the child: 1 [...]
[...] and occupational programs (counseling, classes, seminars, visits to colleges, field trips, on-site job experiences, virtual job experiences)
After-school programs (Korean, math, English, arts/physical)
Participation in club activities (academic learning, science research, art/craft, music, sports, newspaper/broadcasting, leisure/game, youth organization, volunteering)
[...] achievement pressure, teacher enthusiasm)
School violence (degree of violence within the school, personal experience of being a victim of violence): 1 (Strongly disagree) ~ 5 (Strongly agree)

FIGURE 1 Average absolute SHAP values for the top 10 features ordered by feature importance.

FIGURE 2 SHAP summary plot with the top 10 features ordered by feature importance.
FIGURE 3 SHAP dependence plot illustrating the distribution of Shapley values.

TABLE 1 Predictive variables used in the random forests model (columns: Category, Variable name, Scale).