
ORIGINAL RESEARCH article

Front. Sports Act. Living

Sec. Sports Science, Technology and Engineering

This article is part of the Research Topic “Emerging technologies in sports performance: data acquisition and analysis”.

Key Factors and Tactical Variations in Chinese National Women’s Softball Games: Identification Using Random Forest and Explainable AI

Provisionally accepted
Hanyao Li¹*, Gang Cheng¹,², Tianfeng Zhang¹*
  • ¹ School of Physical Education, Nanjing Tech University, Nanjing, China
  • ² Beijing Sport University, Beijing, China

The final, formatted version of the article will be published soon.

Objective: This study uses machine learning to analyze data from Chinese women’s softball games, to identify the key factors that determine game outcomes, and to explore how different teams develop winning strategies.

Method: We analyzed data from 81 of the 296 games played between 2023 and 2024, with game outcome (win = 1, loss = 0) as the target variable and 98 features as inputs. Random Forest (RF), XGBoost, k-nearest neighbors (KNN), and support vector machine (SVM) models were implemented in Python and trained and evaluated using a 7:3 train-test split. Model performance was compared using AUC, F1-score, accuracy, precision, and recall to select the best-performing model. SHapley Additive exPlanations (SHAP) and partial dependence plots (PDP) were then used to evaluate each feature’s contribution to the predicted game outcome.

Results: The RF model achieved the best test-set performance, with an AUC of 0.977 (95% CI: 0.938 to 0.993). We identified the ten features with the greatest influence on game outcomes, including P-ER, on-base percentage (OBP), runs batted in (RBI), and batting average (AVG). PDP analysis further showed that higher P-ER and P-H values markedly increased the predicted probability of losing, whereas higher OBP and AVG substantially increased the predicted probability of winning. The decisive factors differed by team: Team SC relied heavily on pitching performance, whereas Teams SH, LN, and JS prioritized batting.

Conclusion: Feature importance analysis of the RF model indicates that P-ER and key batting metrics (e.g., OBP, AVG) are closely associated with predicted game outcomes. These findings highlight the importance of these metrics in predictive models, although further research is needed to confirm their practical impact.
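The abstract reports AUC, F1-score, accuracy, precision, and recall, plus a 95% CI for the AUC, but does not say how the interval was computed. The Python sketch below shows one common way to obtain these quantities from a model’s test-set win probabilities, together with a one-way partial dependence curve of the kind summarized by PDP. It assumes win (1) is the positive class, a 0.5 decision threshold, and a percentile bootstrap for the CI; these choices, the function names, and the toy model, feature keys (`P_ER`, `OBP`), and data are illustrative assumptions, not the study’s code or results. scikit-learn’s `roc_auc_score`, `f1_score`, and `sklearn.inspection.partial_dependence` offer library equivalents.

```python
import random
from typing import Callable, Dict, List, Mapping, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(a random win scores above a random loss); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one win and one loss")
    hits = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return hits / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with win (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def bootstrap_auc_ci(y_true: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUC (an assumed method), resampling test cases."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # a resample with only wins or only losses has no AUC
            aucs.append(roc_auc(yb, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no usable bootstrap resamples")
    aucs.sort()
    lo = aucs[int(alpha / 2 * len(aucs))]
    hi = aucs[min(len(aucs) - 1, int((1 - alpha / 2) * len(aucs)))]
    return lo, hi


def partial_dependence(predict: Callable[[Mapping[str, float]], float],
                       rows: Sequence[Mapping[str, float]],
                       feature: str, grid: Sequence[float]) -> List[float]:
    """One-way PDP: average prediction after forcing `feature` to each grid value."""
    curve = []
    for v in grid:
        preds = [predict({**row, feature: v}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


if __name__ == "__main__":
    # Hypothetical test-set labels and predicted win probabilities (not study data).
    y_test = [1, 0, 1, 1, 0, 0, 1, 0]
    p_win = [0.91, 0.20, 0.75, 0.40, 0.35, 0.05, 0.88, 0.60]
    y_hat = [1 if p >= 0.5 else 0 for p in p_win]
    print(roc_auc(y_test, p_win))                 # 0.9375
    print(classification_metrics(y_test, y_hat))  # every metric is 0.75 here
    print(bootstrap_auc_ci(y_test, p_win))

    # Toy linear "model" standing in for a fitted RF; P_ER and OBP are illustrative keys.
    def toy_model(r: Mapping[str, float]) -> float:
        return min(1.0, max(0.0, 0.8 - 0.1 * r["P_ER"] + 0.5 * (r["OBP"] - 0.3)))

    games = [{"P_ER": 1.0, "OBP": 0.35}, {"P_ER": 4.0, "OBP": 0.28}]
    print(partial_dependence(toy_model, games, "P_ER", [0.0, 2.0, 4.0, 6.0]))
```

Because AUC is computed from the ranking of predicted probabilities, it does not depend on the 0.5 threshold, whereas accuracy, precision, recall, and F1 all change if the threshold is moved.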

Keywords: softball, prediction of victory or defeat, key factors, machine learning, athletic performance analysis

Received: 08 Sep 2025; Accepted: 31 Oct 2025.

Copyright: © 2025 Li, Cheng and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Hanyao Li, wjjdfq@njtech.edu.cn
Tianfeng Zhang, zhangtf@njtech.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.