AUTHOR=Yang Sha , Zeng Lingfeng , Jin Xin , Lin Huapeng , Song Jianning TITLE=Feature Genes in Neuroblastoma Distinguishing High-Risk and Non-high-Risk Neuroblastoma Patients: Development and Validation Combining Random Forest With Artificial Neural Network JOURNAL=Frontiers in Medicine VOLUME=Volume 9 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2022.882348 DOI=10.3389/fmed.2022.882348 ISSN=2296-858X ABSTRACT=Prognosis differs significantly among neuroblastoma risk groups, so correctly identifying a child's risk group is of great clinical significance. Using genomic data from neuroblastoma samples in public databases, with GSE49710 as the training set, we identified the feature genes distinguishing high-risk from non-high-risk samples based on the random forest (RF) and artificial neural network (ANN) algorithms. RF screening identified EPS8L1, PLCD4, CHD5, NTRK1, and SLC22A4 as the feature differentially expressed genes (DEGs) of high-risk neuroblastoma. The prediction model built on these gene expression data showed high overall accuracy and precision in both the training set and the test set (AUC = 0.998 in GSE49710 and AUC = 0.858 in GSE73517). Kaplan-Meier analysis showed that overall survival and progression-free survival of patients in the low-risk subgroup were significantly better than those of patients in the high-risk subgroup [HR: 3.86 (95% CI: 2.44-6.10) and HR: 3.03 (95% CI: 2.03-4.52), respectively]. Our ANN-based model showed better classification performance than the support vector machine (SVM)-based and XGBoost-based models. Nevertheless, more convincing datasets and machine learning algorithms will be needed in the future to build diagnostic models for individual tissue types.