AUTHOR=Yang Ruiyuan, Xiong Xingyu, Wang Haoyu, Li Weimin TITLE=Explainable Machine Learning Model to Prediction EGFR Mutation in Lung Cancer JOURNAL=Frontiers in Oncology VOLUME=Volume 12 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.924144 DOI=10.3389/fonc.2022.924144 ISSN=2234-943X ABSTRACT=Objectives: To determine whether clinical features, including blood markers, can be used to establish an explainable machine learning model that predicts EGFR mutation in lung cancer. Methods: We retrospectively analyzed 7413 lung adenocarcinoma (LA) patients diagnosed by gene sequencing at West China Hospital of Sichuan University from April 2015 to June 2019. The machine learning algorithms (MLAs) included logistic regression (LR), random forest (RF), LightGBM, support vector machine (SVC), multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and decision tree (DT). Demographic characteristics, personal history, and blood markers were taken into account. The area under the receiver operating characteristic curve (AUC) and SHapley Additive exPlanation (SHAP) values were used to evaluate and explain the prediction models, respectively. Results: EGFR mutation was identified in 3527 of 7413 LA patients (47.6%). RF achieved the best performance in predicting EGFR mutation (AUC 0.771, 95% confidence interval (CI): 0.770, 0.772), which was similar to that of XGBoost (AUC 0.740, 95% CI: 0.739, 0.741). The 5 most influential features were smoking consumption, sex, cholesterol, age, and albumin-globulin ratio. The SHAP summary plot and dependence plot were used to explain the effect of the 12 features on the model and how a single feature influences the output, respectively. Conclusion: We established EGFR mutation prediction models using MLAs and found that RF performed best (AUC 0.771, 95% CI: 0.770, 0.772), outperforming traditional models. 
Therefore, the artificial intelligence-based MLA prediction model may become a practical tool to guide the diagnosis and therapy of LA.
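
The abstract reports each AUC together with a 95% CI but does not say how the interval was obtained. The following is a minimal sketch of one common approach, assuming a percentile bootstrap over the evaluated samples; that choice is an assumption, not the authors' stated method. The function names (auc_score, bootstrap_auc_ci) and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = mutation present, 0 = absent; scores are model outputs.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

auc = auc_score(labels, scores)  # 14 of 16 positive/negative pairs are ordered correctly: 0.875
low, high = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC = {auc:.3f}, 95% CI: {low:.3f}, {high:.3f}")
```

With only eight toy samples the interval is wide; with thousands of patients, as in the cohort described above, a bootstrap interval would typically be much narrower.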