AUTHOR=Zhang Xintong, Wang Xiangyu, Wang Shuwei, Zhang Yingjie, Wang Zeyu, Yang Qingyan, Wang Song, Cao Risheng, Yu Binbin, Zheng Yu, Dang Yini TITLE=Machine learning algorithms assisted identification of post-stroke depression associated biological features JOURNAL=Frontiers in Neuroscience VOLUME=Volume 17 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2023.1146620 DOI=10.3389/fnins.2023.1146620 ISSN=1662-453X ABSTRACT=Objectives Post-stroke depression (PSD) is a common psychiatric complication among stroke patients. Stroke is characterized by dynamic changes in metabolism and hemodynamics; however, effective and reliable metabolism-associated diagnostic markers and therapeutic targets for PSD are still lacking. This study aimed to discover metabolism-related diagnostic and therapeutic biomarkers for PSD. Methods Expression profiles of GSE140275, GSE122709, and GSE180470 were obtained from the GEO database. Differentially expressed genes (DEGs) were detected in GSE140275 and GSE122709. Functional enrichment analysis was performed for DEGs in GSE140275. Weighted gene co-expression network analysis (WGCNA) was performed in GSE122709 to identify key module genes. Moreover, correlation analysis was performed to obtain metabolism-related genes. Interaction analysis of key module genes, metabolism-related genes, and DEGs in GSE122709 was performed to obtain candidate hub genes. Two machine learning algorithms, LASSO and random forest, were used to identify signature genes. Expression of the signature genes was validated in GSE140275, GSE122709, and GSE180470. Gene set enrichment analysis (GSEA) was applied to the signature genes. Based on the signature genes, a nomogram model was constructed in our PSD cohort. ROC curves were used to estimate its diagnostic value. Finally, correlation analysis between expression of the signature genes and several clinical traits was performed. Results Functional enrichment analysis indicated that DEGs in GSE140275 were enriched in metabolism-related pathways.
A total of 8,188 metabolism-associated genes were identified by correlation analysis. WGCNA yielded 3,471 key module genes, and interaction analysis identified 557 candidate hub genes. Furthermore, SDHD and FERMT3 were selected as signature genes using LASSO and random forest analysis. GSEA indicated that the two signature genes play major roles in depression. Subsequently, a PSD cohort was collected to construct a PSD diagnostic model. The nomogram model showed good reliability and validity. ROC curves showed that the two signature genes played a significant role in the diagnosis of PSD. Conclusions A total of 557 metabolism-associated candidate hub genes were obtained by interaction analysis of DEGs in GSE122709, key module genes, and metabolism-related genes. Using machine learning algorithms, SDHD and FERMT3 were identified and shown to be valuable therapeutic and diagnostic biomarkers for PSD. Our findings make early diagnosis and prevention of PSD possible.
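
The abstract reports ROC curves as the measure of diagnostic value for the signature genes. As a rough illustration of what that measure captures, the sketch below computes the area under the ROC curve (AUC) as the probability that a randomly chosen case scores higher than a randomly chosen control. The function name `roc_auc` and the toy expression values are assumptions made for illustration only; they are not taken from the study, and this is not the authors' implementation.

```python
def roc_auc(scores, labels):
    """Return the AUC for binary labels (1 = case, 0 = control).

    Computed as the fraction of (case, control) pairs in which the case
    has the higher score; tied pairs count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for one gene, invented for illustration
# (NOT data from the study).
toy_expression = [2.1, 1.8, 2.5, 0.9, 1.2, 1.9]
toy_labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(toy_expression, toy_labels)
print(f"AUC = {auc:.3f}")
```

An AUC near 0.5 would indicate that the marker separates cases from controls no better than chance, while values approaching 1.0 indicate strong separation. The pairwise loop is quadratic in sample size, which is acceptable for small cohorts; a rank-based formulation would be preferable for large ones.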