ORIGINAL RESEARCH article
Front. Sports Act. Living
Sec. Sports Science, Technology and Engineering
This article is part of the Research Topic: Emerging technologies in sports performance: data acquisition and analysis
Key Factors and Tactical Variations in Chinese National Women’s Softball Games: Identification Using Random Forest and Explainable AI
Provisionally accepted
- 1School of Physical Education, Nanjing Tech University, Nanjing, China
- 2Beijing Sport University, Beijing, China
Objective: This study uses machine learning to analyze data from Chinese women's softball games and identify the key factors that determine game outcomes. It also explores how different teams develop winning strategies.

Method: We analyzed data from 81 of 296 games played between 2023 and 2024, using game outcome (win=1, loss=0) as the target variable and 98 features as inputs. Four machine learning models, Random Forest (RF), XGBoost, KNN, and SVM, were implemented in Python and trained on a 7:3 train-test split. Model performance was evaluated using AUC, F1-score, accuracy, precision, and recall to identify the best-performing model. SHAP and PDP were then used to evaluate each feature's contribution to the predicted game outcome.

Results: The RF model performed best on the test set, with an AUC of 0.977 (95% CI: 0.938, 0.993). We identified the ten features with the greatest impact on game results, including P-ER, OBP, RBI, and AVG. PDP analysis further showed that increases in P-ER and P-H raised the probability of losing, whereas improvements in OBP and AVG raised the probability of winning. Teams differed in their decisive factors: Team SC relied heavily on pitching performance, while SH, LN, and JS prioritized batting.

Conclusion: Feature importance analysis from the RF model indicates that P-ER and key batting metrics (e.g., OBP, AVG) are strongly associated with predicted game outcomes. These findings highlight the importance of these metrics in predictive models, though further research is needed to confirm their practical impact.
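As an illustration of the evaluation metrics named in the Method, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and AUC can be computed from true outcomes (win=1, loss=0) and predicted win probabilities. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy data are assumptions made for this example, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library routine.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random win is scored above a random loss.

    Ties between a win score and a loss score count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one win and one loss")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return correct / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the article
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1 and AUC for binary outcomes."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, y_score),
    }


# Toy data for illustration only (not from the study).
toy_outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
toy_win_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
toy_metrics = classification_metrics(toy_outcomes, toy_win_prob)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank wins relative to losses, which is one reason it is commonly reported alongside the threshold-based metrics.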
Keywords: softball, prediction of victory or defeat, key factors, machine learning, athletic performance analysis
Received: 08 Sep 2025; Accepted: 31 Oct 2025.
Copyright: © 2025 Li, Cheng and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: 
Hanyao Li, wjjdfq@njtech.edu.cn
Tianfeng Zhang, zhangtf@njtech.edu.cn
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
