ORIGINAL RESEARCH article

Front. Med.

Sec. Translational Medicine

A clinical prediction model for schizophrenia based on machine learning algorithms

Provisionally accepted
Weifeng Jin, Shuzi Chen, Qiong Gao, Dan Li, Wei Lu, Mengxia Wang, Qing Chen, Ping Lin*
  • Shanghai Mental Health Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China

The final, formatted version of the article will be published soon.

OBJECTIVE: To develop an auxiliary diagnostic tool for schizophrenia based on multiple test variables using different machine learning algorithms. METHODS: This retrospective study used routinely collected peripheral blood biochemical indicators, along with demographic data, to develop a diagnostic model for first-episode schizophrenia. A total of 180 patients with first-episode schizophrenia (January to August 2024) and 214 healthy controls, drawn from a population undergoing routine medical examinations during the same period, were included. Data on age, gender, and various blood test results were collected. The dataset was divided into a training set (70%; n=275) and an internal validation set (30%; n=119). First, univariate logistic regression was used to screen for significant indicators (p<0.1), and feature selection was subsequently performed using the Boruta and LASSO algorithms. Models were then developed using seven machine learning algorithms, and the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall, and F1 score of each model were evaluated. Finally, we constructed an easily interpretable prediction tool based on a multivariate logistic regression model. After model construction, we validated the model using an external validation set and a differential diagnosis set. A nomogram of the model outcomes was constructed, and its discrimination, calibration, and clinical decision curves were evaluated. RESULTS: Arg, TP, ALP, HDL, UA, and LDL were ultimately identified as significant predictors through univariate logistic regression combined with the Boruta and LASSO algorithms. The Random Forest algorithm outperformed the other machine learning models, achieving an AUC of 1.00 on the training set and 0.877 on the validation set. However, because of the risk of overfitting, we ultimately selected the multivariate logistic regression model as the final model for our study and constructed nomograms. CONCLUSION: In this study, an auxiliary diagnostic tool for schizophrenia was established using machine learning algorithms combined with routine blood indicators. The logistic regression model demonstrated good performance and can serve as a diagnostic aid for schizophrenia.
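
The evaluation metrics named in the abstract can all be derived from a confusion matrix plus a ranking of predicted scores. The following is a minimal sketch of those definitions, not the authors' code: the function names, the 0.5 decision threshold, the label coding (1 = patient, 0 = control), and the toy data are all assumptions made for illustration. It also makes explicit that precision is the same quantity as PPV, and recall the same as sensitivity.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class BinaryMetrics:
    sensitivity: float  # also called recall: TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # also called precision: TP / (TP + FP)
    npv: float          # TN / (TN + FN)
    f1: float           # harmonic mean of precision and recall


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics for labels coded 1 (case) and 0 (control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return BinaryMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


def auc(y_true, scores):
    """AUC as the share of (case, control) pairs in which the case scores higher.

    Tied pairs count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    probabilities = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    predictions = [1 if p >= 0.5 else 0 for p in probabilities]  # assumed threshold
    print(binary_metrics(labels, predictions))
    print("AUC:", auc(labels, probabilities))
```

Because AUC depends only on how the scores rank cases against controls, it is unaffected by the choice of threshold, whereas the other metrics change whenever the threshold does.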

Keywords: Arg, TP, ALP, HDL, UA, LDL

Received: 17 Oct 2025; Accepted: 18 Nov 2025.

Copyright: © 2025 Jin, Chen, Gao, Li, Lu, Wang, Chen and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Ping Lin, linpingsun20000@aliyun.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.