AUTHOR=Chen Qiu, Wang Yu, Liu Yongjun, Xi Bin
TITLE=ESRRG, ATP4A, and ATP4B as Diagnostic Biomarkers for Gastric Cancer: A Bioinformatic Analysis Based on Machine Learning
JOURNAL=Frontiers in Physiology
VOLUME=Volume 13 - 2022
YEAR=2022
URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2022.905523
DOI=10.3389/fphys.2022.905523
ISSN=1664-042X
ABSTRACT=This study combined multiple bioinformatics methods and machine-learning techniques to identify potential hub genes of gastric cancer with diagnostic value. Novel biomarkers were identified using multiple databases of gastric cancer-related genes. Gene expression files were obtained from the NCBI Gene Expression Omnibus (GEO) database. Three hub genes (ESRRG, ATP4A, ATP4B) were detected through a combination of weighted gene co-expression network analysis (WGCNA), gene-gene interaction network analysis, and a supervised feature selection method. GEPIA 2 was used to verify differences in hub-gene expression between normal and cancer tissues at the RNA-Seq level in the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) databases. The reliability of the potential hub genes was also verified using immunohistochemistry data from the Human Protein Atlas (HPA) database and a transcription factor-hub gene regulatory network. A machine learning (ML) workflow comprising data pre-processing, model selection and cross-validation, and performance evaluation was applied to hub-gene expression profiles from five GEO datasets and verified on an independent GEO validation dataset. Six supervised learning models (Support Vector Machine, Random Forest, K-nearest Neighbors, Neural Network, Decision Tree, eXtreme Gradient Boosting) and one semi-supervised learning model (Label Spreading) were established to evaluate the diagnostic value of the biomarkers.
Among the six supervised models, the Support Vector Machine (SVM) algorithm was the most effective according to the calculated performance metrics, including area under the curve (AUC) scores of 0.93 and 0.95 on the test and validation datasets, respectively. Furthermore, the semi-supervised model also learned to predict sample types successfully, achieving an AUC score of 0.986 on the validation dataset even when only 10% of the samples in the training datasets were labeled. In conclusion, three hub genes (ATP4A, ATP4B, ESRRG) closely related to gastric cancer were identified, and ML diagnostic models for gastric cancer were built on them.
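
The evaluation setup described in the abstract (a supervised SVM scored by AUC, and a Label Spreading model trained with only 10% of the labels) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code or data. The three-column matrix is synthetic and only stands in for ESRRG, ATP4A, and ATP4B expression values. The split ratio, kernels, `gamma`, and random seeds are assumptions chosen for illustration, so the scores it prints will not match the reported ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import LabelSpreading
from sklearn.svm import SVC

# Synthetic stand-in for a three-gene expression matrix (NOT real GEO data):
# 3 columns play the role of ESRRG, ATP4A, ATP4B; y is 0 = normal, 1 = tumor.
X, y = make_classification(
    n_samples=400, n_features=3, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Supervised model: scaling + SVM, checked with 5-fold cross-validated AUC
# on the training split, then scored once on the held-out test split.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0))
cv_auc = cross_val_score(svm, X_train, y_train, cv=5, scoring="roc_auc")
svm.fit(X_train, y_train)
svm_auc = roc_auc_score(y_test, svm.decision_function(X_test))

# Semi-supervised model: keep labels for 10% of the training samples
# (stratified so both classes stay represented) and mark the rest as -1,
# which is how scikit-learn encodes "unlabeled".
labeled_idx, _ = train_test_split(
    np.arange(len(y_train)), train_size=0.1, stratify=y_train, random_state=0
)
y_partial = np.full_like(y_train, -1)
y_partial[labeled_idx] = y_train[labeled_idx]

scaler = StandardScaler().fit(X_train)
label_spread = LabelSpreading(kernel="rbf", gamma=2.0)  # gamma is an assumed value
label_spread.fit(scaler.transform(X_train), y_partial)
ls_auc = roc_auc_score(
    y_test, label_spread.predict_proba(scaler.transform(X_test))[:, 1]
)

print(f"SVM cross-validated AUC: {cv_auc.mean():.3f}")
print(f"SVM test AUC: {svm_auc:.3f}")
print(f"Label Spreading test AUC (10% labeled): {ls_auc:.3f}")
```

To mirror the study design more closely, the held-out split would be replaced by an independent dataset, and the same AUC calculation would be repeated for each of the other supervised models.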