Assessing the learning potential of freshmen in labor education courses using ordinal features and support vector machine

Yan, Long; Yang, Yan

doi:10.3389/feduc.2025.1483964

ORIGINAL RESEARCH article

Front. Educ., 29 August 2025

Sec. Higher Education

Volume 10 - 2025 | https://doi.org/10.3389/feduc.2025.1483964

Long Yan

Yan Yang^*

The College of Health Humanities, Jinzhou Medical University, Jinzhou, China

Introduction: Artificial intelligence (AI) marks a new wave of the information technology revolution and permeates various sectors as an indispensable tool. Despite its widespread adoption, its application in enhancing college students’ labor education remains scantily explored. Conventional teaching approaches often fail to assess students’ foundational knowledge accurately, impeding personalized learning. Hence, the current environment underscores the pressing necessity for a robust AI framework capable of reliably predicting individual students’ learning aptitude.

Methods: In this study, we constructed a multidimensional feature vector model from data on students’ academic performance during their middle school years and their willingness to participate in college-level labor education. We then used a Support Vector Machine (SVM) to assess students’ learning potential. To validate the predictive model, we conducted jackknife cross-validation testing.

Results: The model achieved an overall accuracy of 97.75%, with an average sensitivity of 93.90% and an average specificity of 95.12%.

Discussion: The proposed method can play a role in enhancing teaching efficiency and tailoring interventions to individual students.

1 Introduction

As a driving force of the latest technological revolution and industrial transformation, artificial intelligence (AI) stands as one of humanity’s most remarkable and profound inventions (Han et al., 2022; Sun, 2017). It is reshaping production methodologies and societal dynamics, propelling us into an era of intelligent collaboration between humans and machines, characterized by cross-disciplinary integration, co-creation, and resource-sharing (Zhan and Yang, 2017; Fu and Zhou, 2020). Concurrently, the ubiquity of online course platforms has surged due to rapid information technology advancements (Oliveira et al., 2021; Aldowah et al., 2017; Bryson and Andres, 2020; Haleem et al., 2022; Chen et al., 2020). These platforms meticulously document student activities and performance metrics, accumulating vast repositories of educational data. However, despite this abundance, a significant portion of this data remains underutilized, lacking comprehensive mining to extract its latent value.

Data mining within the realm of artificial intelligence involves deploying algorithms to uncover implicit correlations, trends, and patterns from extensive datasets (Hand et al., 2001; Górriz et al., 2020; Moreno and Redondo, 2016; Beale, 2007). The concept of ‘knowledge discovery in databases’ (KDD) was first proposed at the 11th International Joint Conference on Artificial Intelligence in 1989, and the term ‘data mining’ was introduced at the 1st International Conference on Knowledge Discovery and Data Mining held in Canada in 1995. As an interdisciplinary approach, data mining draws on machine learning, pattern recognition, statistics, databases, and visualization techniques to extract valuable information from large datasets. Despite its complexity, data mining equips decision-makers with profound insights, facilitating well-informed and precise decision-making processes (Rui et al., 2022; Hoogerwerf et al., 2013; Pournaras, 2017). Data mining is tasked with handling large-scale data that is usually incomplete, ambiguous, and randomly structured. Therefore, the following requirements should be satisfied to ensure an effective data mining process (Han and Kamber, 2006; Li et al., 2014; Li et al., 2015; Yang et al., 2022; Yang and Wang, 2013; Yu et al., 2012): (i) the data must be real and sufficiently large; (ii) the discovered knowledge must align with the user’s needs and be interpretable; and (iii) the discovered knowledge must be applicable for addressing specific problems.

Data mining is a crucial research field related to artificial intelligence, statistics, and databases, enabling the development of intelligent and data-driven information technology systems (Losiewicz et al., 2000; Wang and Fu, 2005; Han and Chang, 2002). The pivotal roles of data mining include elucidating extracted information from data in forms of concepts, rules, patterns, and constraints. This harvested intelligence serves to assist decision-making processes or refine existing knowledge paradigms, thereby enhancing utilization of resources within extensive databases. Presently, data mining occupies a prominent position in both academic and industrial spheres, gathering widespread international attention and interest. In the educational domain, data mining technologies hold considerable potential to collect, analyze, and report on students’ learning behaviors and outcomes, thereby improving learning environments and providing pertinent guidance to educators.

Labor education for university students bears significant importance, carrying the mission of nurturing practical and innovative talents essential for national and societal advancement (Huan, 2019; Jiang and Pan, 2019). Under the current circumstances, labor education for college students is significant in several respects. From a theoretical standpoint, it fosters the strengthening and advancement of values rooted in labor. In the context of the current era, it aids in the development of modern labor skills essential for adapting to advancements in science and technology. In terms of cultural values, it supports the preservation and promotion of the rich tradition of valuing hard work and dedication. Pragmatically, it serves as a driving force behind societal development and advancement.

The traditional teaching approach tends to prioritize the teacher’s authority and rely heavily on textbooks and classroom lectures, focusing mainly on transferring knowledge (Wang, 2007; Yao, 2003; De Lorenzis et al., 2023). Typically, teachers lecture while students passively receive information. However, this method often struggles to assess whether students have truly grasped concepts learned in class, let alone whether they have enhanced their overall skills. Consequently, it becomes challenging to provide timely and objective evaluations of teaching effectiveness and student learning. While this traditional method may streamline teaching for instructors, it fails to accommodate the diverse backgrounds and academic capabilities of students. In particular, students with weaker foundations or learning difficulties may not receive adequate support. To enhance teaching quality, educators must tailor their instructional approaches to suit individual student needs. For college students, traditional educational practices often fall short in evaluating the latent potential cultivated during middle and high school education, thus failing to meet the objective of precise instruction.

With the continuous advancement of artificial intelligence (AI) algorithms, it has become increasingly feasible to analyze and utilize large-scale datasets in greater depth (Callaway, 2024), thereby enabling accurate prediction and classification. This technological leap has introduced transformative changes to the field of education (Wang, 2024; Hu et al., 2024). AI not only assists educators in identifying students’ learning patterns but also facilitates the prediction of their future academic and professional potential. Accurately assessing students’ latent abilities developed during secondary and high school education remains a longstanding challenge for traditional educational approaches, while the integration of AI algorithms may offer an effective solution. Consequently, there is an urgent need for reliable and effective AI-based methodologies capable of accurately evaluating students’ fundamental learning potential.

In this study, we constructed a multidimensional feature vector integrating students’ middle school grades and their inclination towards learning. The Support Vector Machine (SVM) algorithm is then employed to evaluate this learning potential metric. Jackknife cross-validation testing is utilized to validate the method’s performance. The results demonstrate that our proposed technique achieves a high success rate in predicting student learning potential.

2 Materials and methods

2.1 Dataset

Between February 13th and February 20th, 2024, we collected data from all regions of China using a random sampling method. Data were gathered through an online survey consisting of self-administered questionnaires. The inclusion criterion was voluntary participation of current and graduated college students in course evaluations. The questionnaire collected the following information from the respondents:

1. General demographic information: including location and gender.

2. Grades in Chinese language, politics, and history courses during middle school and high school.

3. Willingness to participate in college labor education courses: categorized as general (60) or enthusiastic (90).

4. Based on the classification method of GPA for major (degree) courses in higher education institutions of China, college labor education course grades are categorized as follows: A – Excellent (≥91), B – Good (86–90), C – Average (76–85), and D – Fail (≤75).
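The grade bands in item 4 can be written as a small lookup function. This is a minimal sketch, assuming integer scores on a 0 to 100 scale; the function name `grade_category` is hypothetical and is not taken from the study.

```python
def grade_category(score: int) -> str:
    """Map a labor education course score to the class label used in item 4.

    Assumes integer scores, so the bands 76-85, 86-90 and >=91 leave no gaps.
    """
    if score >= 91:
        return "A"  # Excellent
    if score >= 86:
        return "B"  # Good
    if score >= 76:
        return "C"  # Average
    return "D"      # Fail (75 or below)


if __name__ == "__main__":
    for s in (95, 88, 80, 70):
        print(s, grade_category(s))
```

Checking from the highest band downward keeps each comparison to a single threshold.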

We utilized the online survey tool “Questionnaire Star”¹ for data collection due to its efficiency, affordability, and user-friendly interface. This platform has been widely used across various survey-related fields. The data were obtained via self-administered questionnaires completed by participants with informed consent, thereby eliminating concerns regarding ethical violations or infringement of rights (Zhou et al., 2020; Zhu et al., 2020; Wu et al., 2020).

A total of 135 college students and graduates responded to the questionnaire. After filtering out incomplete responses, we obtained 123 valid questionnaires, resulting in a response rate of 91.11% (Table 1). Respondents were distributed across all regions of China (Figure 1A), with females comprising 57% of the sample (Figure 1B). The number of respondents reflects regional differences in educational development and population mobility. A relatively larger proportion of students were sampled from the more developed eastern regions, while fewer were sampled from the less developed western regions. The sampling distribution across other regions remained generally consistent. Furthermore, the gender and academic performance of students within the sample were relatively balanced. These characteristics support the reliability of our sampling methodology.

Table 1. Grade distribution for university students in labor education course.

Figure 1. Geographical distribution and sex ratio of the participants. (A) Map of China showing the number of respondents per region: Northwest China (3), North China (17), Northeast China (9), Southwest China (16), Central China (19), Eastern China (43), and South China (16). (B) Pie chart showing the proportions of men and women.

The grades were categorized into four classes: A, B, C, and D. Therefore, the dataset can be formalized as follows:

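\Omega = S^{A} \cup S^{B} \cup S^{C} \cup S^{D}

where $S^{A}$, $S^{B}$, $S^{C}$, and $S^{D}$ are the subsets of respondents whose labor education course grades fall in classes A, B, C, and D, respectively, and $\cup$ denotes set union.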

2.2 Creating vectors with students’ grades and ordinal information

Based on the features that may be associated with students’ learning potential, as outlined in Section 2.1, we construct the following vector:

R_{k} = (r_{1}, r_{2}, r_{3}, r_{4}, r_{5}, r_{6}, r_{7}) .

Instead of considering the elements within a vector as separate units, we incorporated the order relationship among students’ secondary school results and their willingness. By sorting these results in ascending order, we created a new vector denoted as $R_{s}$ . This process ensures that the order relationships among the data are preserved.

R_{s} = (r_{i_{1}}, r_{i_{2}}, r_{i_{3}}, r_{i_{4}}, r_{i_{5}}, r_{i_{6}}, r_{i_{7}}) .

When two $r_{i}$ are equal, they are sorted by lexicographic order of the words, ensuring that each element’s ordinal number is unique. The sorted elements therefore satisfy

r_{i_{1}} \leq r_{i_{2}} \leq r_{i_{3}} \leq r_{i_{4}} \leq r_{i_{5}} \leq r_{i_{6}} \leq r_{i_{7}} .

Each $r_{i}$ has a unique “position” in $R_{s}$, denoted by $g_{i}$. The student’s secondary school performance and willingness are then represented by a 7-dimensional vector V, which combines the position information with the original values:

V = (g_{1} \times r_{1}, g_{2} \times r_{2}, g_{3} \times r_{3}, g_{4} \times r_{4}, g_{5} \times r_{5}, g_{6} \times r_{6}, g_{7} \times r_{7}) .

For instance, suppose

R_{k} = (75, 101, 90, 86, 133, 82, 100),

and we obtain $R_{s}$ where the values are ranked from 1 to 7:

R_{s} = (75, 82, 86, 90, 100, 101, 133) .

Therefore,

\begin{array}{l} V = (1 \times 75, 6 \times 101, 4 \times 90, 3 \times 86, 7 \times 133, 2 \times 82, 5 \times 100) \\ = (75, 606, 360, 258, 931, 164, 500) . \end{array}
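The construction of V can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `ordinal_feature_vector` is hypothetical, and ties are broken by original index, which is an assumption standing in for the lexicographic rule described above.

```python
from typing import List, Sequence


def ordinal_feature_vector(r: Sequence[float]) -> List[float]:
    """Multiply each value by its 1-based rank in the ascending ordering.

    Ties are broken by original index (an assumption), so every rank is unique.
    """
    # Indices of r sorted by value, then by position to resolve ties.
    order = sorted(range(len(r)), key=lambda i: (r[i], i))
    ranks = [0] * len(r)
    for position, i in enumerate(order, start=1):
        ranks[i] = position  # g_i: position of r_i in the sorted vector R_s
    return [g * x for g, x in zip(ranks, r)]


if __name__ == "__main__":
    R_k = [75, 101, 90, 86, 133, 82, 100]
    print(ordinal_feature_vector(R_k))  # [75, 606, 360, 258, 931, 164, 500]
```

Running it on the example vector reproduces the values of V given above.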

2.3 Support vector machine

Support Vector Machine (SVM) is a widely used supervised learning model in machine learning. SVMs offer a principled approach to classification by explicitly optimizing the decision boundary, aiming to identify the hyperplane that maximizes the margin between different classes. The method analyzes data to identify patterns and has been applied in various fields. The core principle of SVM involves representing input samples as points in a high-dimensional space and classifying them via an optimal separating hyperplane. SVM training aims for globally optimized solutions, which mitigate overfitting and enable handling a large number of features effectively. More detailed descriptions of the SVM method can be found in the literature (Yang et al., 2022; Vapnik, 1998; Steinwart and Christmann, 2008; Chang and Lin, 2011).

The LIBSVM package v3.17 (Yang et al., 2022; Chang and Lin, 2011), an implementation of the SVM classifier, was used in this study. The Radial Basis Function (RBF) kernel, defined below, was selected as the kernel function for our model:

k(x, y) = \exp(-\gamma \| x - y \|^{2}) .

The RBF kernel maps the input space into an infinite-dimensional feature space, enabling it to capture highly complex nonlinear relationships. As a local kernel, RBF assigns greater influence to similar samples, while the impact of dissimilar ones approaches zero. Its inherent smoothness prevents overly fluctuating decision boundaries, thereby contributing to strong generalization capabilities. These characteristics make the RBF kernel particularly well-suited for modeling many types of natural data.
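The locality of the RBF kernel can be checked numerically. The snippet below is a minimal sketch of the kernel formula only, not of LIBSVM's internals; the function name `rbf_kernel` and the sample points are illustrative assumptions.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Compute k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    gamma = 0.5  # illustrative value, not the one tuned in the study
    print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma))  # identical points
    print(rbf_kernel([1.0, 2.0], [1.5, 2.5], gamma))  # nearby points
    print(rbf_kernel([1.0, 2.0], [9.0, 9.0], gamma))  # distant points
```

Identical points give a kernel value of 1, and the value decays toward zero as the distance grows, which is the local behavior described above.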

Two parameters, the penalty parameter C and the kernel width parameter γ, were determined via an optimization procedure using a grid search strategy provided by LIBSVM.
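A grid search of this kind can be sketched as an exhaustive loop over powers of two. The exponent ranges below follow what we understand to be the defaults of LIBSVM's `grid.py` tool (log2 C from -5 to 15 in steps of 2, log2 γ from 3 to -15 in steps of -2); whether the study used these defaults is an assumption. The function `grid_search` and the scoring function `toy_score` are hypothetical stand-ins: in practice the score would be a cross-validation accuracy.

```python
import math
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

LOG2_C = range(-5, 16, 2)       # assumed exponent grid for C
LOG2_GAMMA = range(3, -16, -2)  # assumed exponent grid for gamma


def grid_search(
    score: Callable[[float, float], float],
    log2_c: Iterable[int],
    log2_gamma: Iterable[int],
) -> Tuple[Optional[float], Optional[float], float]:
    """Return the (C, gamma, score) triple with the highest score on the grid."""
    best_c, best_gamma, best_score = None, None, -math.inf
    for exp_c, exp_g in product(log2_c, log2_gamma):
        c, gamma = 2.0 ** exp_c, 2.0 ** exp_g
        s = score(c, gamma)
        if s > best_score:
            best_c, best_gamma, best_score = c, gamma, s
    return best_c, best_gamma, best_score


def toy_score(c: float, gamma: float) -> float:
    """Placeholder score that peaks at C = 2^1, gamma = 2^-15 (illustration only)."""
    return -abs(math.log2(c) - 1) - abs(math.log2(gamma) + 15)


if __name__ == "__main__":
    best_c, best_gamma, _ = grid_search(toy_score, LOG2_C, LOG2_GAMMA)
    print(best_c, best_gamma)
```

Searching on a logarithmic grid covers several orders of magnitude with few evaluations; both C = 2 and γ = 2^-15 (about 3.05e-05) are points on this assumed grid.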

2.4 Assessment of the prediction performance

In statistical prediction, researchers commonly use three cross-validation methods to assess the effectiveness of a predictor in real-world scenarios: independent dataset test, subsampling test, and jackknife test. Among these, the jackknife method evaluates model performance by iteratively leaving one sample out as the test set while using the remaining samples for training. This process is repeated for every sample in the dataset, ensuring that all data points are used for both training and evaluation. The method maximizes data utilization and provides a nearly unbiased estimate of model performance (Qiu et al., 2014; Li et al., 2022; Zhao et al., 2021). Therefore, we employed the jackknife test in our study to evaluate the anticipated success rates of our predictor. This test involves leaving out one sample at a time from the dataset Ω and evaluating it using the predictor trained on the remaining samples.
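The jackknife procedure described above amounts to a leave-one-out loop. The sketch below is illustrative only: `jackknife_predictions` is a hypothetical helper, and a one-nearest-neighbour rule stands in for the SVM so that the example needs no external library.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_neighbour(train: List[Sample], x: Sequence[float]) -> str:
    """Stand-in classifier: return the label of the closest training sample."""
    closest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return closest[1]


def jackknife_predictions(
    data: List[Sample],
    fit_predict: Callable[[List[Sample], Sequence[float]], str],
) -> List[str]:
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i, (x, _) in enumerate(data):
        train = data[:i] + data[i + 1:]  # leave sample i out
        predictions.append(fit_predict(train, x))
    return predictions


if __name__ == "__main__":
    toy = [([0.0], "A"), ([0.1], "A"), ([1.0], "B"), ([1.1], "B")]
    print(jackknife_predictions(toy, nearest_neighbour))  # ['A', 'A', 'B', 'B']
```

Because each sample is predicted by a model that never saw it, comparing the predictions with the true labels gives the jackknife success rate.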

To assess the accuracy of our model, we adopted sensitivity ($S_{n}$), specificity ($S_{P}$), accuracy ($Acc$), and the Matthews correlation coefficient ($MCC$), which are widely used for measuring the quality of binary classifications. They are defined as follows:

S_{n} = \frac{TP}{TP + FN}

S_{P} = \frac{TN}{TN + FP}

Acc = \frac{(TP + TN)}{TP + TN + FP + FN}

MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}}

where $TP$ , $TN$ , $FP$ , and $FN$ stand for the number of true positive, true negative, false positive, and false negative obtained from the prediction, respectively.
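For a four-class problem, these binary measures can be computed per class by treating one class as positive and the rest as negative. The sketch below assumes this one-vs-rest reading, which the text does not state explicitly; the function name `one_vs_rest_metrics` and the toy labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """Compute Sn, Sp, Acc and MCC with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": tp / (tp + fn) if tp + fn else 0.0,
        "Sp": tn / (tn + fp) if tn + fp else 0.0,
        "Acc": (tp + tn) / len(pairs) if pairs else 0.0,
        # MCC is set to 0 when a marginal count is zero (a common convention).
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    truth = ["A", "A", "B", "B", "C"]
    guess = ["A", "B", "B", "B", "C"]
    print(one_vs_rest_metrics(truth, guess, positive="A"))
```

In the toy example, class A has TP = 1, FN = 1, FP = 0, and TN = 3, giving a sensitivity of 0.5 and a specificity of 1.0.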

3 Results

The SVM model with the RBF kernel has two hyperparameters: the regularization parameter C and the kernel parameter γ, which controls the influence of each training example on the decision boundary. Because the optimal pair of C and γ is not known in advance for a specific problem, a hyperparameter tuning process is required. In this study, we employed the grid search approach from the LIBSVM package to determine these parameters. By testing various values of C and γ (Yang et al., 2022; Chang and Lin, 2011), we found the optimal values to be C = 2 and γ = 3.0517578123e-05. We first converted the prior grades and class attendance willingness into a 7-dimensional feature vector for each of the 123 respondents, forming the LE123 dataset. These feature vectors were then scaled and input into the Support Vector Machine (SVM). Through the jackknife test, we calculated the sensitivity, specificity, overall accuracy, and Matthews correlation coefficient, presented in Table 2. The method achieved an overall accuracy of 97.75%, with only four incorrect predictions. Notably, the sensitivity and specificity of each category exceeded 90%, averaging over 93%. Furthermore, the Matthews correlation coefficient (MCC) reached a value of 0.9573, indicating a high level of agreement between the predicted and actual classifications. These results indicate the potential effectiveness of our approach in predicting students’ interest in university labor courses.
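The scaling step is not specified in detail. A common choice, and the default of LIBSVM's `svm-scale` tool, is to map each feature linearly to the range [-1, 1]. The sketch below assumes that convention; the function name `scale_columns` and the handling of constant columns are illustrative choices rather than details from the study.

```python
from typing import List, Sequence


def scale_columns(
    rows: Sequence[Sequence[float]], lower: float = -1.0, upper: float = 1.0
) -> List[List[float]]:
    """Linearly rescale every feature column to the interval [lower, upper].

    Constant columns carry no information and are mapped to 0.0 (an assumption).
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            lower + (upper - lower) * (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


if __name__ == "__main__":
    vectors = [[75, 606], [150, 303], [225, 909]]
    for row in scale_columns(vectors):
        print(row)
```

Scaling matters for the ordinal features because the rank-weighted values span a much wider range than raw grades, and the RBF kernel is sensitive to feature magnitude.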

Table 2. Prediction results on the dataset LE123 in jackknife test.

4 Discussion

The ongoing research on the application of artificial intelligence technology has garnered significant attention from individuals across diverse sectors. This has led to the production and deployment of smart robots and similar technologies. Scholars worldwide have conducted research on artificial intelligence technology in healthcare, computer information technology, education, biological sciences and other fields (Mavrych et al., 2025; Choudhary et al., 2023; Aljuaid, 2024; Rajabi et al., 2024; Baig and Yadegaridehkordi, 2024; Farrokhnia et al., 2024; Jemmy et al., 2024). This technological innovation has presented both opportunities and challenges for labor education, particularly for college students, manifesting in issues related to the recognition of labor values, the transformation of labor content, and the evolution of labor methodologies. There is an urgent need to explore and promote the innovative and robust development of labor education within the framework of AI technology.

This study proposes a novel machine learning model based on the academic performance of high school students and their interest in labor education courses, utilizing Support Vector Machine (SVM) for predictive classification. The model yielded promising outcomes and demonstrated the feasibility of applying machine learning methods to the assessment of labor education.

Compared with most previous labor education studies that focused on qualitative analysis or relied on teachers’ subjective scoring methods (Black and Wiliam, 1998; Jesse et al., 2025), this study employed SVM to facilitate an objective and data-driven evaluation of students’ performance in labor education. This approach not only improves the efficiency and consistency of scoring but also mitigates cognitive bias, addressing recent concerns about “technical trustworthiness” and “fairness” in educational assessment (Williamson and Piattoeva, 2019; Selbst et al., 2019). Compared with earlier studies that utilized linear regression or other simple statistical models (Romero and Ventura, 2010), SVM demonstrates greater robustness in handling small-sample, non-linear classification problems. Its application in this study has shown its potential in the design of intelligent education systems.

This method could be integrated into college course management systems to automatically analyze students’ classroom participation, homework completion, and related behavioral data, thereby assisting instructors in conducting comprehensive assessments in labor education. In addition, the model could be used for the early identification of students encountering difficulties in labor education, enabling timely intervention to improve the effectiveness of teaching.

Although the model has achieved promising improvement in educational assessment, the relatively small sample size in this study may limit its generalizability. In future research, we plan to expand the dataset by incorporating more schools and students from diverse disciplinary backgrounds. Multimodal data and longitudinal tracking will also be integrated to enhance the reliability and external validity of the predictive model. We will also integrate AI ethics and educational equity principles to ensure the interpretability, fairness, and transparency of the intelligent scoring system.

5 Conclusion

In this study, we constructed a multidimensional feature vector that integrates students’ secondary school academic performance and their learning inclination. We then employed the SVM algorithm to assess students’ learning potential with this feature set. The performance of the method was validated using jackknife cross-validation, and the results showed that sensitivity, specificity, and overall accuracy all exceeded 90% and that the MCC exceeded 0.90, indicating that the proposed technique is effective in predicting students’ learning potential.

Through our study, we aim to foster innovative perspectives on labor education reform, leveraging the application of artificial intelligence to support its evolution and progress. Additionally, our research introduces a novel approach by integrating data mining techniques into the teaching methodologies of college labor education courses, offering valuable insights for educational management entities seeking to tailor targeted teaching strategies for this demographic within the contemporary landscape of educational reform and technological advancement. The dataset in our current study is relatively limited. In future work, we plan to collect a larger and more diverse dataset in order to improve model accuracy and generalizability.

Data availability statement

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author contributions

LY: Software, Writing – original draft. YY: Resources, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. https://www.wjx.cn/

References

Aldowah, H., Rehman, S. U., Ghazal, S., and Umar, I. N. (2017). Internet of things in higher education: a study on future learning. J. Phys. Conf. Ser. 892:012017. doi: 10.1088/1742-6596/892/1/012017