AUTHOR=Haixian Liu , Shu Pang , Zhao Li , Chunfeng Lu , Lun Li TITLE=Machine learning approaches for EGFR mutation status prediction in NSCLC: an updated systematic review JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1576461 DOI=10.3389/fonc.2025.1576461 ISSN=2234-943X ABSTRACT=BackgroundWith the rapid advances in artificial intelligence—particularly convolutional neural networks—researchers now exploit CT, PET/CT and other imaging modalities to predict epidermal growth factor receptor (EGFR) mutation status in non-small-cell lung cancer (NSCLC) non-invasively, rapidly and repeatably. End-to-end deep-learning models simultaneously perform feature extraction and classification, capturing not only traditional radiomic signatures such as tumour density and texture but also peri-tumoural micro-environmental cues, thereby offering a higher theoretical performance ceiling than hand-crafted radiomics coupled with classical machine learning. Nevertheless, the need for large, well-annotated datasets, the domain shifts introduced by heterogeneous scanning protocols and preprocessing pipelines, and the “black-box” nature of neural networks all hinder clinical adoption. To address fragmented evidence and scarce external validation, we conducted a systematic review to appraise the true performance of deep-learning and radiomics models for EGFR prediction and to identify barriers to clinical translation, thereby establishing a baseline for forthcoming multicentre prospective studies.MethodsFollowing PRISMA 2020, we searched PubMed, Web of Science and IEEE Xplore for studies published between 2018 and 2024. Fifty-nine original articles met the inclusion criteria. QUADAS-2 was applied to the eight studies that developed models using real-world clinical data, and details of external validation strategies and performance metrics were extracted systematically.ResultsThe pooled internal area under the curve (AUC) was 0.78 for radiomics–machine-learning models and 0.84 for deep-learning models. Only 17 studies (29%) reported independent external validation, where the mean AUC fell to 0.77, indicating a marked domain-shift effect. QUADAS-2 showed that 31% of studies had high risk of bias in at least one domain, most frequently in Index Test and Patient Selection.ConclusionAlthough deep-learning models achieved the best internal performance, their reliance on single-centre data, the paucity of external validation and limited code availability preclude their use as stand-alone clinical decision tools. Future work should involve multicentre prospective designs, federated learning, decision-curve analysis and open sharing of models and data to verify generalisability and facilitate clinical integration.