
ORIGINAL RESEARCH article

Front. Psychiatry

Sec. Molecular Psychiatry

Volume 16 - 2025 | doi: 10.3389/fpsyt.2025.1621219

Integrated Bioinformatics and Machine Learning Identify S100A9 and VGLL1 as Hub Genes for Schizophrenia

Provisionally accepted
Jian kang Lv1, Xue ru Wang2, Wei Qin3*
  • 1Shaoxing Seventh People's Hospital, Shaoxing, China
  • 2Shan Dong Daizhuang Hospital, Jining, China
  • 3Shandong Mental Health Center Affiliated to Shandong University, Jinan, China

The final, formatted version of the article will be published soon.

Background: Schizophrenia (SCZ) is a debilitating neuropsychiatric disorder of unclear etiology that involves complex interactions between genetic and environmental factors. Current diagnostic methods rely on subjective clinical assessments, and existing treatments often fail to adequately address cognitive and negative symptoms. Identifying key biomarkers for SCZ is crucial for improving diagnosis and developing targeted therapies.

Methods: This study integrated bioinformatics analysis and machine learning to identify potential biomarkers for SCZ. Transcriptomic data from five independent cohorts were obtained from the Gene Expression Omnibus (GEO) database. Differential expression analysis and Robust Rank Aggregation (RRA) were used to identify significant differentially expressed genes (DEGs). Protein-protein interaction (PPI) network analysis, Least Absolute Shrinkage and Selection Operator (Lasso) regression, and Random Forest (RF) were employed to screen for hub genes. A diagnostic model was constructed using logistic regression. Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic accuracy of the model, and nomograms and calibration curves were used to assess its clinical applicability. Functional enrichment analyses and single-sample Gene Set Enrichment Analysis (ssGSEA) were conducted to explore the mechanisms underlying the identified hub genes.

Results: S100A9 and VGLL1 were identified as potential diagnostic biomarkers for SCZ. The diagnostic model demonstrated robust performance in the training cohorts (area under the curve, AUC = 0.806) and in the external validation cohorts (AUC = 0.702, 0.666, and 0.739). Functional enrichment analyses revealed that DEGs related to VGLL1 and S100A9 were primarily involved in immune system regulation and in signaling pathways such as the PI3K-Akt signaling pathway. ssGSEA showed significantly increased infiltration of five immune cell types (CD56bright natural killer cells, myeloid-derived suppressor cells (MDSCs), mast cells, natural killer cells, and plasmacytoid dendritic cells) in SCZ patients, with strong positive correlations between S100A9 and the infiltration of these cell types.

Conclusion: Our study identified S100A9 and VGLL1 as potential biomarkers for SCZ and highlighted their roles in immune regulation. These findings provide new insights into the pathogenesis of SCZ and suggest potential diagnostic targets.
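The final modelling step described in the Methods (a logistic regression diagnostic model scored by ROC AUC on training and held-out data) can be illustrated with a minimal sketch. This is not the authors' code: the data are simulated, the effect sizes and directions are arbitrary, the two features are only stand-ins for S100A9 and VGLL1, and all function names (`fit_logistic`, `roc_auc`, `simulate`) are hypothetical helpers written with the Python standard library alone.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predicted probability of being a case for each sample."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(rng, n_per_group):
    """Simulated two-gene expression data (assumption): 1 = case, 0 = control."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            gene_a = rng.gauss(1.0 * label, 1.0)   # stand-in for S100A9
            gene_b = rng.gauss(-0.8 * label, 1.0)  # stand-in for VGLL1
            X.append([gene_a, gene_b])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(rng, 70)  # plays the role of a training cohort
X_val, y_val = simulate(rng, 30)      # plays the role of an external cohort

w, b = fit_logistic(X_train, y_train)
train_auc = roc_auc(y_train, predict(X_train, w, b))
val_auc = roc_auc(y_val, predict(X_val, w, b))
print(f"training AUC = {train_auc:.3f}, validation AUC = {val_auc:.3f}")
```

The model is fitted on one simulated cohort and scored on a separate one, mirroring the split between training and external validation AUCs reported in the Results. The numbers printed here come from simulated data and say nothing about the study's actual performance.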

Keywords: Schizophrenia, S100A9, VGLL1, Bioinformatics analysis, machine learning

Received: 30 Apr 2025; Accepted: 21 Aug 2025.

Copyright: © 2025 Lv, Wang and Qin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Wei Qin, Shandong Mental Health Center Affiliated to Shandong University, Jinan, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.