ORIGINAL RESEARCH article

Front. Mol. Biosci.

Sec. Molecular Diagnostics and Therapeutics

Integrated bioinformatics and Mendelian randomization reveal a six-gene diagnostic signature and key role of CYP26B1 in sarcopenia

Provisionally accepted
  • 1 Guangzhou University of Chinese Medicine, Guangzhou, China
  • 2 Guangzhou University of Traditional Chinese Medicine First Affiliated Hospital, Guangzhou, China

The final, formatted version of the article will be published soon.

Background: The pathogenesis of sarcopenia involves complex molecular mechanisms, treatment remains challenging, and reliable diagnostic biomarkers are lacking. This study aimed to identify biomarkers that may be linked to sarcopenia, examine how they correlate with immune cell infiltration, and investigate which genes show a causal relationship with sarcopenia.
Methods: Four transcriptomic datasets were integrated to identify candidate biomarkers. Genes from the MEBrown module identified by weighted gene co-expression network analysis (WGCNA) were cross-referenced with differentially expressed genes (DEGs). A diagnostic model was built using 113 machine learning algorithms, followed by protein-protein interaction (PPI) network analysis and SHapley Additive exPlanations (SHAP) evaluation. Immune cells were quantified and correlated with sarcopenia-related genes using CIBERSORT, and gene expression data were integrated with genome-wide association study (GWAS) statistics and expression quantitative trait loci (eQTL) data. In vitro validation was carried out in C2C12 cells using quantitative polymerase chain reaction (qPCR).
Results: We identified 318 DEGs. Intersecting the WGCNA module genes with these DEGs yielded 109 candidate biomarkers, which are related to immune regulation, muscle cytoskeleton regulation, and retinol metabolism. A six-gene diagnostic signature (FOXO1, ZBTB16, HOXB2, LYVE1, MGP, and CYP26B1) was developed using machine learning and PPI network analysis, achieving high predictive accuracy (AUC > 0.80), with HOXB2 identified as the top predictor by SHAP analysis. CIBERSORT analysis showed associations between these genes and immune cell subsets, while Mendelian randomization (MR) analysis confirmed a causal relationship between CYP26B1 expression and the risk of sarcopenia. The qPCR results were consistent with the mRNA expression observed in the Gene Expression Omnibus (GEO) data.
Conclusion: This study identified a highly reliable six-gene diagnostic signature for sarcopenia. Mendelian randomization identified CYP26B1 as the sole causal gene, linking retinoic acid metabolism to disease etiology. Together, these results provide a robust diagnostic model and a prioritized therapeutic target, and they shed light on the immune-metabolic mechanisms of sarcopenia. These findings offer new avenues for early diagnosis and metabolism-based precision therapy.
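
As an illustration of two steps summarized in the abstract, the sketch below intersects a co-expression module with a DEG list and computes an AUC for a diagnostic score. It is a minimal, dependency-free example and not the authors' pipeline: the function names, the placeholder genes GENE_A and GENE_B, and the toy labels and scores are all assumptions made for demonstration.

```python
from typing import Iterable, List, Sequence


def intersect_candidates(module_genes: Iterable[str], degs: Iterable[str]) -> List[str]:
    """Return genes found in both the module and the DEG list, sorted by name."""
    return sorted(set(module_genes) & set(degs))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case scores above a random control.

    Ties between a case and a control count as half a win, which matches the
    rank-based (Mann-Whitney) definition of the area under the ROC curve.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical inputs: GENE_A and GENE_B are placeholders, not study genes.
module_genes = ["FOXO1", "CYP26B1", "GENE_A"]
degs = ["CYP26B1", "FOXO1", "GENE_B"]
candidates = intersect_candidates(module_genes, degs)

# Toy diagnostic scores (1 = sarcopenia, 0 = control); not data from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this toy example eight of the nine case-control pairs are ranked correctly, so the AUC is 8/9; a real analysis would estimate it on held-out samples rather than on the data used to fit the model.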

Keywords: bioinformatics, diagnostic biomarker, machine learning, Mendelian randomization, sarcopenia

Received: 04 Dec 2025; Accepted: 09 Feb 2026.

Copyright: © 2026 Wu, Cai, Fan, Zhao, Jiao, Chen, Liu and Song. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yafang Song

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.