AUTHOR=Yang Wan-Xia, Wang Fang-Fang, Pan Yun-Yan, Xie Jian-Qin, Lu Ming-Hua, You Chong-Ge TITLE=Comparison of ischemic stroke diagnosis models based on machine learning JOURNAL=Frontiers in Neurology VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2022.1014346 DOI=10.3389/fneur.2022.1014346 ISSN=1664-2295 ABSTRACT=Background: The incidence, prevalence, and mortality of ischemic stroke (IS) continue to rise, resulting in a serious global disease burden. Prediction models have great value for the early prediction and diagnosis of IS. Methods: R software was used to screen differentially expressed genes (DEGs) between IS and control samples in the datasets GSE16561, GSE58294, and GSE37587 and to perform enrichment analysis of the DEGs. The feature genes of IS were obtained by several machine learning algorithms, including Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, Support Vector Machine-Recursive Feature Elimination (SVM-RFE), and Random Forest (RF). IS diagnostic models were constructed from transcriptomic data using machine learning and an artificial neural network (ANN). Results: A total of 69 DEGs were identified, mainly involved in immune and inflammatory responses. The pathways enriched in the IS group were complement and coagulation cascades, lysosome, PPAR signaling pathway, regulation of autophagy, and Toll-like receptor signaling pathway. LASSO selected 17 feature genes, SVM-RFE 10, and RF 12. In the training dataset, GSE22255, and GSE195442, respectively, the area under the curve (AUC) was 0.969, 0.890, and 1.000 for the LASSO model; 0.957, 0.805, and 1.000 for the SVM-RFE model; and 0.947, 0.935, and 1.000 for the RF model. These models have good sensitivity, specificity, and accuracy. In the training dataset, the AUCs of the LASSO+ANN, SVM-RFE+ANN, and RF+ANN models were 1.000, 0.995, and 0.997, respectively.
However, in the GSE22255 dataset, the AUCs of the LASSO+ANN, SVM-RFE+ANN, and RF+ANN models were 0.688, 0.605, and 0.619, respectively. In the GSE195442 dataset, the AUCs of the LASSO+ANN and RF+ANN models were 0.740 and 0.630, respectively. In the training dataset, the sensitivity, specificity, and accuracy were 1.000, 1.000, and 1.000 for the LASSO+ANN model; 0.946, 0.982, and 0.964 for the SVM-RFE+ANN model; and 0.964, 1.000, and 0.982 for the RF+ANN model. In the test datasets, sensitivity was very satisfactory, but specificity and accuracy were poor. Conclusion: The LASSO, SVM-RFE, and RF models have good predictive ability. However, the ANN models classify positive samples well but are unsuitable for classifying negative samples.
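
For readers unfamiliar with the metrics reported in the abstract, the following is a minimal Python sketch of how sensitivity, specificity, accuracy, and AUC are computed from labels and classifier scores. It is not the authors' R workflow: the function names, the 0.5 decision threshold, and the toy data are illustrative assumptions, not values from the study. The toy classifier calls almost every sample positive, which reproduces the pattern described for the ANN models on the test datasets (high sensitivity, low specificity).

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels (1 = IS, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 IS samples followed by 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.85, 0.75, 0.65, 0.2]

# Assumed decision threshold of 0.5 (an illustrative choice).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = diagnostic_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)

print(metrics)  # sensitivity 1.0, specificity 0.25, accuracy 0.625
print(toy_auc)  # 0.625
```

In this toy case every IS sample is detected, yet three of the four controls are also labelled positive, so specificity and accuracy drop even though sensitivity is perfect.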