AUTHOR=Wei Hangping, Zhang Xiaowei, Zhou Zhen, Xie Jianbin, Han Weidong, Dong Xiaofang TITLE=Hybrid model for predicting microsatellite instability in colorectal cancer using hematoxylin & eosin-stained images and clinical features JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1580195 DOI=10.3389/fonc.2025.1580195 ISSN=2234-943X ABSTRACT=Background: Microsatellite instability (MSI) is a crucial molecular phenotype in colorectal cancer (CRC) that helps determine treatment strategies and predict prognosis. However, existing prediction methods have limitations and are not universally applicable to all patient populations. We therefore proposed a hybrid prediction model that integrates pathological and clinical features to predict MSI. Materials and methods: This study included two patient cohorts: The Cancer Genome Atlas cohort (TCGA set, n = 559), which was divided into training and internal validation subsets at a 7:3 ratio, and the Dongyang CRC cohort (Dongyang set, n = 123), which served as an external testing cohort. Two deep learning approaches, semi-supervised and fully-supervised, were used to extract features from pathological images. The pathomic signatures derived from these approaches were then integrated with clinical features to develop a hybrid model, which was evaluated on the external testing cohort using the area under the curve (AUC). In addition, to investigate genes associated with MSI, we performed enrichment analysis and constructed a protein-protein interaction (PPI) network using mRNA sequencing data obtained from the TCGA database. Results: The fully-supervised pathological model performed well, achieving an AUC of 0.928 in the internal validation cohort, compared with 0.786 for the semi-supervised pathological model. In the external testing cohort, the fully-supervised model attained an AUC of 0.811. A hybrid model was then established, achieving an AUC of 0.949 in the validation cohort and 0.862 in the test cohort. A nomogram was also developed to enhance its clinical applicability. Gene Ontology (GO) analysis showed that differentially expressed genes (DEGs) associated with MSI status were enriched in the humoral immune response, among other processes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Set Enrichment Analysis (GSEA) revealed enrichment in pathways such as rheumatoid arthritis. PPI network analysis identified key hub genes, including IFNG and CD8A. Conclusion: The fully-supervised model consistently outperformed the semi-supervised model in predicting MSI, and the hybrid model, which combines pathological and clinical features, demonstrated strong predictive ability.