Ensemble learning for predicting microsatellite instability in colorectal cancer using pretreatment colonoscopy images and clinical data

You, Jia; Zhang, Shenghan; Zhang, Jianjie; Chen, Yaru; Zhang, Mengmeng; Zhou, Chungen; Jiang, Bin

doi:10.3389/fonc.2025.1734076

ORIGINAL RESEARCH article

Front. Oncol., 02 January 2026

Sec. Gastrointestinal Cancers: Colorectal Cancer

Volume 15 - 2025 | https://doi.org/10.3389/fonc.2025.1734076


Jia You¹‡

Shenghan Zhang²†‡

Jianjie Zhang¹

Yaru Chen¹

Mengmeng Zhang¹

Chungen Zhou¹

Bin Jiang¹*

¹Nanjing Hospital of Chinese Medicine Affiliated to Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
²Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States

Background: Microsatellite instability (MSI) is an important molecular biomarker in colorectal cancer (CRC), associated with favorable prognosis and response to immune checkpoint inhibitors. Conventional MSI testing, including immunohistochemistry (IHC) and polymerase chain reaction (PCR), is invasive, time-consuming, and resource-dependent, underscoring the need for non-invasive and automated alternatives. This study aimed to develop and evaluate an ensemble learning framework integrating pretreatment colonoscopy images and routine clinical data for non-invasive MSI prediction in CRC.

Methods: In this retrospective study, patients with pathologically confirmed CRC and IHC-determined MSI status were included. Pretreatment colonoscopy images and routine clinical variables were collected. Five deep learning architectures (ResNet-50, EfficientNet, DenseNet, VGG-16, and Vision Transformer) were trained on image data, while four machine learning algorithms (Logistic Regression, Random Forest, Support Vector Machine, and Gradient Boosting) were trained on clinical data. The best-performing models from each modality were combined using a majority-voting ensemble. Model performance was assessed using accuracy, precision, recall, and area under the receiver operating characteristic curve (AUROC). Interpretability was evaluated using Gradient-weighted Class Activation Mapping (Grad-CAM) for image models and SHapley Additive exPlanations (SHAP) for clinical models.

Results: Among 1,844 patients, VGG-16 achieved the best image-based performance (AUROC = 0.896, accuracy = 0.832, recall = 0.708). Logistic Regression outperformed other clinical models (AUROC = 0.898, accuracy = 0.825, recall = 0.828). The ensemble model integrating both modalities achieved AUROC = 0.886, precision = 0.920, and recall = 0.845, outperforming single-modality approaches.

Conclusion: The proposed ensemble learning framework provides a non-invasive, interpretable, and accurate method for MSI prediction, offering potential to improve preoperative precision diagnostics and clinical decision-making in colorectal cancer.

1 Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the second leading cause of cancer-related mortality worldwide, accounting for more than 900,000 deaths annually (1). Microsatellite instability (MSI), resulting from deficiency of the mismatch repair (MMR) system, is a key molecular subtype of CRC with critical clinical implications (2). MSI is associated with a more favorable prognosis in early-stage disease, particularly in stage II CRC (3). In addition, MSI tumors display marked responsiveness to immune checkpoint inhibitors, largely attributable to their high mutational burden and immunogenic microenvironment, making MSI status a crucial biomarker for guiding immunotherapy (4). Moreover, MSI serves as the molecular hallmark of Lynch syndrome, the most common hereditary colorectal cancer syndrome, and its detection is essential for identifying affected patients as well as at-risk family members (5). Consequently, MSI testing has become indispensable for guiding therapeutic decisions, predicting prognosis, and Lynch syndrome screening.

Current MSI testing primarily relies on immunohistochemistry (IHC) for MMR proteins and polymerase chain reaction (PCR)-based assays. Both approaches require tissue samples obtained through colonoscopy biopsy or surgical resection, which are inherently invasive and may lead to complications such as infection or bleeding. Tumor heterogeneity may also result in sampling bias, with MSI status underestimated or overestimated depending on the biopsy site (6). Furthermore, conventional testing methods often require several days to generate results and depend on specialized laboratory infrastructure, trained pathologists, and quality-controlled reagents, which are not universally available, particularly in resource-limited settings (7). These limitations underscore the urgent need for non-invasive, real-time, and cost-effective alternatives to support precision oncology.

Artificial intelligence (AI) has achieved rapid progress in medical image analysis (8, 9), thereby providing promising opportunities for MSI prediction. Pathology-based models applying deep learning to hematoxylin and eosin (H&E) slides have achieved high predictive accuracy by identifying morphological patterns associated with MSI (10, 11). Likewise, radiology-based approaches using CT and MRI have shown promise, either through end-to-end deep learning models applied directly to imaging data (12) or through radiomics workflows in which high-dimensional quantitative features are extracted and subsequently modeled using machine learning algorithms (13–15). However, pathology-based methods remain invasive, and radiology-based approaches often require labor-intensive manual tumor segmentation and rely on imaging features that may not fully capture biologically relevant tissue characteristics.

Colonoscopy offers a compelling alternative. It is routinely performed for CRC screening, localization, and treatment (16, 17), enabling the acquisition of high-quality, pretreatment images that reflect mucosal morphology, vascular patterns, and surface texture in real time. Compared with radiology, colonoscopy provides richer visual information at lower cost and without radiation exposure. Recent studies have demonstrated the feasibility of using colonoscopy images for MSI prediction (18, 19). For instance, Lo et al. (19) developed a Vision Transformer (ViT) model that achieved an AUC of 0.86 in MSI detection, while Cai et al. (18) trained a convolutional model achieving AUROCs of 0.948 (internal) and 0.807 (external). Despite these advances, most image-based approaches remain unimodal, relying solely on visual information, which may limit their robustness and interpretability.

Recent evidence suggests that integrating multiple data modalities can significantly enhance model performance. Multimodal AI frameworks have shown an average AUC improvement of approximately six percentage points over unimodal models across medical domains (20). For MSI prediction, combining clinical features with pathology (21) or radiology data (13, 14, 22) has been shown to improve accuracy. Within colonoscopy, Lo et al. (23) proposed a multimodal ViT model that concatenated colonoscopy image features with clinical data to predict colorectal cancer prognosis, achieving an AUC of 0.93 compared with 0.77 for colonoscopy images alone and 0.59 for clinical features alone. However, direct feature concatenation between heterogeneous data types may not optimally capture cross-modal relationships, emphasizing the need for more effective integration strategies.

Ensemble learning offers a practical and generalizable solution by combining outputs from multiple models to improve predictive stability and generalization. For instance, Cui et al. (24) applied a multimodal AI framework to the diagnosis of solid pancreatic lesions by integrating an endoscopic ultrasound imaging model with a clinical data model, achieving superior diagnostic accuracy compared with unimodal approaches. Likewise, recent work in bone tumor classification demonstrated that combining radiological imaging with clinical data outperformed image-only strategies (25). Building on these findings, integrating colonoscopy images with clinical data via ensemble learning may provide a non-invasive, interpretable, and clinically scalable approach for MSI prediction in CRC.

In this work, we developed and evaluated five deep learning architectures for colonoscopy image analysis and four machine learning classifiers for clinical data. Based on performance and balance across evaluation metrics, we selected representative models from each modality and integrated them using a majority-voting strategy to construct a multimodal ensemble framework. To enhance interpretability and facilitate clinical translation, we further applied Grad-CAM to visualize model attention in image-based predictions and SHAP to identify key feature contributions in the clinical models.

2 Materials and methods

2.1 Study design

We retrospectively identified patients with CRC treated at Nanjing Hospital of Chinese Medicine between 2019 and 2024. Eligible patients met the following inclusion criteria: (1) pathologically confirmed CRC; (2) available MSI status determined by IHC. Patients were excluded if they had (1) received radiotherapy, chemotherapy, immunotherapy, or surgical resection prior to MSI testing or (2) synchronous colorectal tumors. MSI was defined as loss of expression of at least one MMR protein, while preserved expression of all four proteins was classified as microsatellite stable (MSS).
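The labeling rule described above can be written as a small function. This is a minimal sketch, not the authors' code: the text refers to four MMR proteins without naming them, so the panel used here (MLH1, MSH2, MSH6, PMS2, the usual IHC panel) and the function name `classify_msi` are assumptions.

```python
# Assumed panel: the text does not list the four MMR proteins by name.
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def classify_msi(ihc_expressed: dict) -> str:
    """Return 'MSI' if at least one MMR protein has lost expression, else 'MSS'.

    ihc_expressed maps protein name -> True (expression preserved) or False (lost).
    """
    missing = [p for p in MMR_PROTEINS if p not in ihc_expressed]
    if missing:
        raise ValueError(f"IHC result missing for: {missing}")
    any_lost = any(not ihc_expressed[p] for p in MMR_PROTEINS)
    return "MSI" if any_lost else "MSS"
```

A patient with an incomplete panel raises an error rather than being silently labeled, which mirrors the inclusion criterion that MSI status must be available.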

Colonoscopy images were collected from the hospital’s picture archiving and communication system (PACS). These images were obtained from patients undergoing colonoscopy for cancer screening and tumor localization. Images were obtained using multiple endoscopy platforms, including Olympus Medical Systems (CF-H260AI, CF-H290I, CF-Q260AL, CF-H170I), Fujifilm Medical Systems (EC-530WI, EC-L590ZW, EC-530WM, EC-600WM, EC-760R-V/M, EC-760ZP-V/M), and Pentax Medical Systems (EC-34-i10F, EC-38i10F, EC-3890Fi, EC-3890FK, EC-3870FK). All images were exported in their original resolution (ranging from 764 × 504 to 2220 × 1230 pixels) in JPG or BMP format for subsequent analysis.

Routine clinical data were extracted from electronic medical records. Based on prior literature, clinical expertise, and practical considerations, 50 routine variables were initially selected (see Supplementary Table S1). Variables with more than 20% missing data, including D-dimer, FOBT, CRP, GFR, and HbA1c, were excluded (see Supplementary Figure S1). Consequently, 45 clinical variables were retained for model development.
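The missingness filter can be sketched as follows. This is an illustrative sketch under stated assumptions: records are represented as plain dictionaries with `None` marking a missing value, and the function name `drop_sparse_variables` is hypothetical. Only the 20% threshold comes from the text.

```python
def drop_sparse_variables(records, max_missing=0.20):
    """Return the variable names whose missing fraction is at most max_missing.

    records: list of dicts mapping variable name -> value (None means missing).
    Variables with MORE than max_missing missing values are excluded.
    """
    names = sorted({name for record in records for name in record})
    n = len(records)
    kept = []
    for name in names:
        n_missing = sum(1 for record in records if record.get(name) is None)
        if n_missing / n <= max_missing:
            kept.append(name)
    return kept
```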

In total, 1,855 patients met the inclusion criteria, including 116 MSI and 1,739 MSS cases. Pretreatment colonoscopy images were available for 1,224 patients (10,411 MSS images and 1,096 MSI images), who were included in the image-based analyses. The workflow is shown in Figure 1, and the overall process of patient screening and cohort inclusion is summarized in Figure 2.

Figure 1

Flowchart detailing a medical data processing pipeline. It includes three main sections: data preprocessing, data splitting, and model development. Preprocessing involves clinical and image data desensitization, annotation, and augmentation. Data is split into training (1,485 samples) and test sets (370 samples) using five-fold cross-validation. The model development section combines machine learning with clinical data factors like gender, age, and tumor location, highlighting the use of feature extraction and classification to predict outcomes. The final output is a comprehensive prediction using ensemble methods.

Figure 1. Overview of the study workflow, including data preprocessing, model development, and ensemble integration.

Figure 2

Flowchart depicting patient selection for a colorectal cancer study. Out of 2,822 patients, 967 were excluded due to absence of MSI status (687), prior anti-cancer therapy (270), or synchronous tumors (10). 1,855 patients remained eligible, providing 11,507 colonoscopy images.

Figure 2. Flowchart of patient inclusion and exclusion.

The study protocol was approved by the Institutional Review Board (IRB) of the Nanjing Hospital of Chinese Medicine (Approval No. KY2024090), and the requirement for written informed consent was waived due to the retrospective nature of the study. All patient data were anonymized prior to analysis.

2.2 Data preprocessing

2.2.1 Colonoscopy images

Colonoscopy images went through an initial quality control step to exclude frames that were blurred, overexposed, narrow-band imaging (NBI)-based, or showed inadequate bowel preparation. To eliminate non-informative content, we automatically removed black borders and on-screen text using a pixel-intensity–based cropping algorithm. Each image was first converted to grayscale, and every row and column was scanned to identify the first and last positions where at least 20% of the pixels fell within a valid intensity range (5–250). This threshold reliably distinguished the circular endoscopic field from the surrounding dark margins. The bounding box defined by these boundaries was then applied to the original RGB image to produce a clean, content-focused crop. The resulting images were then resized to 224 × 224 pixels to match the deep learning model’s input requirements.
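The row and column scan described above might look like the sketch below. It is a simplified illustration, not the authors' implementation: the image is assumed to be already converted to grayscale and stored as a nested list, the intensity range is treated as inclusive, and the names `content_bounds` and `crop` are hypothetical. A real pipeline would more likely operate on arrays from an imaging library.

```python
def content_bounds(gray, lo=5, hi=250, min_frac=0.20):
    """Find the bounding box of informative content in a grayscale image.

    gray: 2D list of intensities (rows of equal length).
    A row or column counts as content when at least min_frac of its pixels
    lie in [lo, hi]. Returns inclusive indices (top, bottom, left, right).
    """
    def valid_fraction(values):
        return sum(lo <= v <= hi for v in values) / len(values)

    rows = [i for i, row in enumerate(gray) if valid_fraction(row) >= min_frac]
    cols = [j for j, col in enumerate(zip(*gray)) if valid_fraction(col) >= min_frac]
    if not rows or not cols:
        raise ValueError("no informative content found")
    return rows[0], rows[-1], cols[0], cols[-1]


def crop(image, bounds):
    """Apply inclusive bounds to a 2D image (pixels may be RGB tuples)."""
    top, bottom, left, right = bounds
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

The bounds are computed on the grayscale copy and then applied to the original color image, so the crop itself does not alter pixel values.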

2.2.2 Clinical data

Outliers in continuous variables were identified using interquartile range (IQR) fences, $(Q_1 - 1.5 \times \mathrm{IQR},\ Q_3 + 1.5 \times \mathrm{IQR})$, and capped at the nearest bound prior to imputation. Missing values were imputed using the Multivariate Imputation by Chained Equations (MICE) method. After imputation, categorical variables were transformed using one-hot encoding, and all variables were then standardized using z-score normalization.
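As a rough illustration of the outlier handling and standardization steps, the sketch below assumes that out-of-range values are clipped to the nearest IQR fence and that z-scores use the population standard deviation. The quartile convention (Python's default "exclusive" method) is also an assumption, since the text does not specify one, and the imputation and one-hot encoding steps are omitted.

```python
import statistics


def cap_outliers(values, k=1.5):
    """Clip values to the fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # assumed quartile convention
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]
```

In a train/test setting, the fences, mean, and standard deviation would normally be estimated on the training set only and then reused for the test set.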

2.2.3 Data splitting and augmentation

The dataset was split at the patient level into training (80%) and test (20%) sets with stratified sampling to preserve MSI/MSS ratios. The prevalence of MSI in our dataset was 6.29%, reflecting its relatively low frequency in colorectal cancer. Such class imbalance poses a significant challenge for modeling, as it may bias predictions toward the majority class (MSS) while underrepresenting the minority class (MSI). This imbalance can result in reduced sensitivity for MSI detection, impaired generalization, and misleading performance metrics. Addressing this issue is therefore critical to ensure that models achieve balanced performance across both classes.
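A patient-level stratified split can be sketched as below. This is an assumption-laden illustration: the function name, the fixed seed, and the rounding rule are hypothetical, and only the 80/20 ratio and the stratification by MSI/MSS come from the text. Splitting on patient identifiers means that all images from one patient would fall on the same side of the split.

```python
import random
from collections import defaultdict


def stratified_patient_split(labels, test_frac=0.20, seed=0):
    """Split patient IDs into train and test sets, preserving class ratios.

    labels: dict mapping patient_id -> class label (e.g. 'MSI' or 'MSS').
    Returns (train_ids, test_ids).
    """
    by_class = defaultdict(list)
    for patient_id, label in labels.items():
        by_class[label].append(patient_id)

    rng = random.Random(seed)
    train_ids, test_ids = [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_frac)
        test_ids.extend(ids[:n_test])
        train_ids.extend(ids[n_test:])
    return train_ids, test_ids
```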

To mitigate the impact of class imbalance and improve model robustness, data augmentation strategies were applied selectively to the minority class. For colonoscopy images, augmentation was performed exclusively on MSI samples during training to avoid further widening the gap between class sizes. The augmentation pipeline included random resized cropping, horizontal and vertical flipping, small rotations (±15°), color jittering, perspective distortion, Gaussian blur, and random erasing, thereby enhancing resilience to variability introduced by different imaging devices and acquisition conditions. For the clinical data, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training set to generate synthetic MSI cases by interpolating between existing minority-class samples. This approach increased representation of MSI without duplicating data and preserved the underlying feature distribution, thus improving the model’s ability to recognize minority-class patterns.
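The interpolation idea behind SMOTE can be illustrated with a short standard-library sketch. It is not the implementation used in the study (which is not specified); the neighbor count `k`, the seed, and the function name `smote_sketch` are assumptions chosen for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic point lies on the segment between a randomly chosen sample
    and one of its k nearest minority-class neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        neighbor = rng.choice(neighbors)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbor)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, oversampling should be applied to the training set only, after the train/test split, so that no test information leaks into the synthetic cases.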

2.3 Model development

2.3.1 Image-based model

For image-based MSI status prediction, we implemented a flexible deep learning framework that supports a range of backbone architectures, including ResNet, EfficientNet, ViT, DenseNet, and VGG. All models were initialized with ImageNet-pretrained weights, with the final classification layer modified to match the binary output space. Images were resized to 224×224 pixels and normalized by the ImageNet-specific preprocessing (26). Optimization was conducted using Adam, AdamW, or stochastic gradient descent (SGD) with momentum, with the default learning rate set to $1 \times 10^{-3}$.

To address the substantial class imbalance between MSI and MSS images, we initially evaluated several imbalance-aware loss functions, including focal loss (27), Tversky loss (28), focal Tversky loss (29), soft $F_\beta$ loss (30), and soft precision–recall penalties (31). These losses enhance minority-class learning by down-weighting easy majority-class samples, penalizing false negatives more strongly, or directly optimizing recall-oriented objectives. Although these approaches resulted in moderate performance gains, particularly in recall, the most substantial improvements were observed after applying targeted data augmentation to increase the diversity of MSI samples. After augmentation, standard binary cross-entropy provided the most stable and robust performance; therefore, all final models in this study were trained using binary cross-entropy.
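To make the down-weighting mechanism concrete, the sketch below compares per-sample binary cross-entropy with focal loss. The values `gamma=2.0` and `alpha=0.25` are the commonly cited defaults from the focal loss literature, not settings reported in this study, and the function names are illustrative.

```python
import math


def binary_cross_entropy(p, y):
    """Per-sample BCE; p is the predicted probability of class 1, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Per-sample focal loss: BCE scaled by alpha_t * (1 - p_t) ** gamma.

    gamma and alpha are assumed defaults, not values from the study.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

For a confidently correct sample (`p_t` close to 1) the factor `(1 - p_t) ** gamma` is close to zero, so easy samples contribute little to the total loss, while hard samples keep most of their weight.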

2.3.2 Clinical data–based machine learning model

For the tabular clinical features, we implemented a modular machine learning classification framework that supports four widely used algorithms: logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient boosting classifier (GBC). All models were designed to output probabilistic predictions, thereby enabling downstream ensemble learning and calibration for clinical interpretability.

To address the class imbalance between MSI and MSS samples, we initially experimented with a range of class-weight configurations across all algorithms (for example, assigning higher penalties to the minority MSI class). However, empirical evaluation showed that class weighting did not improve model performance and, in some cases, slightly reduced it. We therefore adopted a data-level strategy and applied SMOTE to generate synthetic MSI samples during training. After SMOTE, unweighted versions of all classifiers yielded the most stable and robust performance; accordingly, all final clinical models in this study were trained using SMOTE-augmented data with unweighted class settings.

2.3.3 Training detail

Model training was conducted in two stages. First, stratified five-fold cross-validation was used to select key hyperparameters, including learning rate, number of training epochs, optimizer choice, and convergence settings, balancing predictive performance against generalizability. All image-based and clinical models were trained independently during this stage, without any parameter sharing or joint optimization.

For image-based models, the final configuration adopted for training employed the stochastic gradient descent (SGD) optimizer (learning rate = $1 \times 10^{-3}$, batch size = 128), binary cross-entropy loss, and early stopping with a patience of three epochs based on validation loss. All image models were implemented in PyTorch (v2.7.1) and trained on an NVIDIA GeForce RTX 4070 Ti Super GPU.
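The early-stopping rule can be illustrated without any deep learning framework. The sketch below takes a sequence of validation losses and reports the epoch at which training would stop; treating "no improvement" as "not strictly lower than the best loss so far" is an assumption, as the text gives only the patience value.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training stops.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float("inf")
    stale_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch
    return len(val_losses) - 1
```

In practice the weights from the best epoch, rather than the stopping epoch, are usually the ones retained.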

For clinical tabular features, LR and SVM models were optimized with a stringent convergence tolerance ($1 \times 10^{-5}$), and the maximum number of iterations for LR was set to $1 \times 10^{6}$ to ensure convergence stability. RF and GBC classifiers were trained with 200 estimators to achieve a balance between computational efficiency and predictive accuracy.

The optimal hyperparameters identified through cross-validation were used to retrain each model on the full training set. All models were trained independently, with no joint optimization or shared fine-tuning, and the resulting classifiers were then integrated using an ensemble learning strategy to generate the final multimodal predictions.

2.4 Multimodal ensemble integration

For ensemble learning, model selection was guided by multiple performance metrics, including accuracy, AUC, precision, and recall (32). For image-based prediction, VGG16, ViT, EfficientNet, and ResNet50 were chosen, as each demonstrated strengths across different evaluation criteria (see Section 3.2 for detailed results). For clinical tabular features, LR, GBC, and RF emerged as the top-performing models (see Section 3.3 for detailed results).

The ensemble was constructed in a post-hoc manner: once all selected models were fully trained, their predicted probabilities were aggregated using a probability-based (soft) majority voting strategy. Unlike hard voting, this approach integrates the calibrated probability outputs of individual models, allowing more nuanced decision-making and reducing the risk of dominance by any single classifier. We also implemented a stacking ensemble using a multi-layer perceptron (MLP) meta-classifier. Stacking was evaluated but not adopted, as it consistently showed substantially lower recall for the minority MSI class. Full stacking performance is reported in Supplementary Table S2.
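The difference between probability-based and hard voting can be shown with a few lines of code. This sketch assumes an unweighted mean of the per-model MSI probabilities and a 0.5 decision threshold; the text does not state the exact aggregation rule or threshold, so both are assumptions, as are the function names.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average per-model MSI probabilities, then threshold the mean.

    Returns (mean_probability, predicted_label).
    """
    mean_p = sum(probabilities) / len(probabilities)
    return mean_p, int(mean_p >= threshold)


def hard_vote(probabilities, threshold=0.5):
    """Threshold each model separately, then take the majority label."""
    votes = sum(p >= threshold for p in probabilities)
    return int(votes > len(probabilities) / 2)
```

With per-model probabilities `[0.95, 0.9, 0.45, 0.4, 0.45]`, only two of five models vote for MSI, so hard voting predicts MSS, whereas the mean probability of 0.63 leads soft voting to predict MSI. Confident models therefore carry more weight under soft voting.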

3 Results

3.1 Study cohort characteristics

Among the 1,844 patients (MSI = 113; MSS = 1,731), MSI cases were significantly younger (median 60 vs. 66 years; p = 0.004) and more frequently located in the right colon (51.3% vs. 14.8%; p < 0.001). MSI patients had a lower prevalence of hypertension (32.7% vs. 44.6%; p = 0.014) and exhibited lower CEA (p < 0.001). Hematologic indices revealed lower HGB, MCV, pLYM, and cLYM, with higher RDW and pNEUT in MSI cases (all p < 0.001). In addition, MSI tumors were associated with lower bilirubin and lipid levels (e.g., LDL; p < 0.05). These findings are consistent with the distinct clinical and biological profile of MSI colorectal cancer (33). Baseline characteristics of the cohort are summarized in Table 1.

Table 1

Table 1. Baseline characteristics of the study cohort.

3.2 Image-based deep learning model evaluation

We trained and evaluated five deep learning architectures on colonoscopy images for MSI prediction: ResNet50, EfficientNet, DenseNet, VGG16, and ViT. As summarized in Table 2, all models demonstrated good discriminative ability, with AUROC values ranging from 0.873 to 0.896. VGG16 achieved the best overall balance, with the highest accuracy (0.832), precision of 0.943, recall of 0.708, and an AUROC of 0.894. ViT showed comparable performance, achieving the highest recall (0.721) and the best AUROC (0.896), although with slightly lower precision (0.911). ResNet50 and EfficientNet reached very high precision (0.955 and 0.963, respectively) but lower recall (approximately 0.68), indicating that their positive predictions were highly reliable but more conservative, potentially missing MSI cases. DenseNet also performed well (accuracy 0.818, AUROC 0.891, precision 0.941, recall 0.678), though it did not outperform VGG16 or ViT.
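For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, and AUROC can be computed from labels and scores. It assumes MSI is coded as the positive class (1) and uses the pairwise-ranking definition of AUROC; it is a didactic sketch and not the evaluation code used in the study, which may have applied a different averaging scheme.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

A model that rarely predicts the positive class can reach high precision with low recall, which is the pattern described above for ResNet50 and EfficientNet.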

Table 2

Table 2. Performance of image-based deep learning classifiers for MSI prediction.

Receiver operating characteristic (ROC) and precision–recall (PR) curves for these models are provided in Figure 3, which further illustrate these trade-offs. Specifically, ResNet50 and EfficientNet emphasize conservative, high-precision predictions, while VGG16 and ViT demonstrate a more favorable balance between sensitivity and specificity. Collectively, these findings confirm that image-based deep learning models can effectively discriminate MSI from MSS, with VGG16 and ViT offering the most clinically relevant performance, and ResNet50 and EfficientNet contributing complementary high-precision predictors.

Figure 3

Two side-by-side graphs compare model performance. The left graph is an ROC curve showing the true positive rate versus false positive rate for ResNet, EfficientNet, DenseNet, VGG, and ViT. VGG performs best with an AUC of 0.8938. The right graph is a Precision-Recall curve showing precision versus recall for the same models, with ViT performing best with an AP of 0.9218. The gray dashed line on the ROC curve represents random performance.

Figure 3. Receiver operating characteristic (ROC) and precision–recall (PR) curves for the five image-based deep learning models. The ROC (left) and PR (right) curves illustrate the discriminative performance of the five deep learning architectures (ResNet-50, EfficientNet, DenseNet, VGG-16, and Vision Transformer).

3.3 Clinical data–based machine learning model evaluation

We compared the performance of four machine learning classifiers trained on routine clinical variables: LR, SVM, RF, and GBC. As summarized in Table 3, LR achieved the most balanced performance with an accuracy of 0.825 and an AUROC of 0.898. Tree-based models (RF and GBC) and SVM demonstrated higher precision (≥ 0.93) but substantially lower recall (0.55–0.60), indicating that they identified fewer true MSI cases despite fewer false positives. In contrast, LR maintained a favorable trade-off between precision (0.823) and recall (0.828), suggesting superior sensitivity for MSI detection.

Table 3

Table 3. Performance of clinical data–based machine learning classifiers for MSI prediction.

ROC and PR curves for these classifiers are provided in Figure 4. These visualizations further illustrate the trade-offs between model sensitivity and specificity, showing that while RF, GBC, and SVM achieved strong discriminative capability, LR provided the most robust and clinically practical performance by balancing precision and recall across the decision threshold.

Figure 4

Side-by-side comparison of ROC and Precision-Recall curves for different models. The ROC curve plot includes models lr (AUC = 0.8982), svm (AUC = 0.9432), rf (AUC = 0.9449), gbc (AUC = 0.9392), with a dashed line for luck. The Precision-Recall curve plot shows models lr (AP = 0.8773), svm (AP = 0.9231), rf (AP = 0.9342), gbc (AP = 0.9355). Each model is represented with distinct colors.

Figure 4. Receiver operating characteristic (ROC) and precision–recall (PR) curves for the four clinical data–based machine learning models. The ROC (left) and PR (right) curves compare four classifiers trained on clinical variables: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC).

3.4 Comparative performance across modalities

To better understand the complementary contributions of colonoscopy image–based deep learning models and clinical data–based machine learning models, we conducted a comparative analysis across the two modalities.

The image-based models demonstrated relatively higher recall for MSI prediction. For example, VGG-16 and ViT achieved recalls of 0.708 and 0.721, respectively, reflecting the ability of deep networks to capture fine-grained morphological patterns such as mucosal irregularity, glandular disruption, and abnormal vascularity. These cues appear particularly informative for detecting MSI tumors. However, the AUROC values of image models (0.873–0.896) were modestly lower than those of the best clinical models, indicating greater variability in overall discrimination.

In contrast, the clinical machine learning models achieved higher AUROC but lower recall. Logistic Regression obtained the highest AUROC (0.898) and showed stable precision, highlighting the structured and biologically informative nature of clinical variables such as biochemical markers, hematologic indices, and patient demographics. However, these models were more conservative in predicting the MSI class, resulting in lower sensitivity. This pattern suggests that clinical variables better support global discrimination but are less effective at identifying minority-class MSI cases.

Taken together, these results demonstrate that each modality captures distinct and complementary aspects of MSI biology. Image-based models excel in sensitivity by detecting subtle morphological alterations, whereas clinical models provide stronger overall discrimination but miss more MSI cases. By integrating high-recall image predictors with high-AUROC clinical predictors, the ensemble achieves improved robustness and sensitivity compared with either modality alone.

3.5 Ensemble model evaluation

Using a majority voting strategy, we evaluated multiple combinations of clinical (LR, GBC, RF) and image-based (VGG16, ViT, ResNet50, EfficientNet, DenseNet) models (Table 4). Overall, ensembles consistently outperformed most single models in terms of balanced accuracy and recall. The best-performing ensembles included both clinical and image models, particularly LR + RF + ResNet + ViT + VGG + EfficientNet, which achieved the highest accuracy (0.886), recall (0.845), and AUROC (0.886) while maintaining high precision (0.920). Similar performance was observed when DenseNet was additionally included, suggesting that adding further redundant models did not provide incremental benefit. In contrast, smaller ensembles (e.g., LR + GBC + ViT + VGG) achieved lower recall (0.734), underscoring the importance of including diverse architectures.

Table 4

Table 4. Performance of ensemble models integrating clinical and image-based predictors for MSI prediction.

Collectively, these findings indicate that integrating both clinical and image-based models provides more robust and reliable predictions of MSI status than individual models, with majority voting effectively balancing sensitivity and specificity. For completeness, we also evaluated stacking ensembles using the same model combinations. However, stacking yielded lower performance, especially in recall (0.54–0.60), and therefore was not selected as the primary integration strategy. Detailed stacking results are provided in Supplementary Table S2.

3.6 Interpretability analysis

To enhance transparency and clinical relevance, interpretability analyses were conducted for both image and clinical models. For deep learning models, Grad-CAM was applied to visualize salient image regions most influential in MSI predictions, allowing qualitative assessment of whether the model attended to relevant mucosal and vascular patterns. For clinical machine learning models, SHAP was used to quantify the contribution of each variable to predictions. SHAP values provided both global feature importance rankings and local instance-level explanations.

For the image models, Grad-CAM visualization revealed that the networks primarily focused on tumor regions and surrounding mucosal structures when generating predictions. In MSS cases, highlighted areas tended to align with the tumor bulk and adjacent mucosa (Figure 5), whereas in MSI cases, attention maps often emphasized irregular lesion borders and heterogeneous mucosal patterns (Figure 6). These findings suggest that the models leveraged clinically plausible visual cues consistent with endoscopic examination, thereby supporting the biological interpretability of the predictions.
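The core Grad-CAM computation can be sketched independently of any framework. The example assumes that the feature maps of a chosen convolutional layer and the gradients of the class score with respect to those maps have already been extracted; which layer was used in the study is not stated, and the function name and nested-list representation are illustrative.

```python
def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM heatmap.

    activations, gradients: lists of K feature maps, each an H x W nested list.
    gradients[k][i][j] is d(class score) / d(activations[k][i][j]).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of each channel's gradient map.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    # Weighted sum of feature maps, followed by ReLU.
    cam = [
        [max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
         for j in range(width)]
        for i in range(height)
    ]
    peak = max(map(max, cam))
    # Scale to [0, 1] so the map can be overlaid on the image.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam
```

The resulting low-resolution map is typically upsampled to the input size and blended with the original image to produce overlays like those in the figures.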

Figure 5

Three sets of images show colonoscopy results, gradcam heatmaps, and merged images predicting class 1 abnormalities. The left column displays original colon images with lesions, the middle shows corresponding gradcam maps highlighting areas in red and yellow, and the right column merges these, indicating predicted abnormal areas.

Figure 5. Representative Grad-CAM visualizations for MSS colonoscopy images. Each row shows the original image (left), the Grad-CAM activation map (middle), and the merged overlay (right).

Figure 6

Three rows of medical images. The left column shows original endoscopic images of the colon. The middle column displays corresponding Grad-CAM heatmaps, highlighting areas of interest in blue, yellow, and red. The right column merges the original images with Grad-CAM overlays to indicate predicted classifications.

Figure 6. Representative Grad-CAM visualizations for MSI colonoscopy images. Each row shows the original image (left), the Grad-CAM activation map (middle), and the merged overlay (right).

For the clinical models, SHAP analysis identified both demographic and clinical factors as key predictors of MSI. Height, gender, GLB, A/G ratio, weight, and tumor location emerged as the most influential features, with additional contributions from BMI, hypertension, and peripheral blood indices such as pNEUT and cLYM (Figure 7A). The beeswarm plots highlighted patient-level heterogeneity, with certain variables (e.g., tumor location and anthropometric measures) exerting consistent directional effects, whereas others showed more variable impacts (Figure 7B). At the individual patient level, waterfall plots demonstrated how combinations of features synergistically increased or decreased the likelihood of MSI prediction, providing a transparent rationale for model outputs (Figure 7C).
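For a linear model such as logistic regression, SHAP values have a simple closed form when features are treated as independent: in log-odds space, each feature's contribution is its coefficient multiplied by the feature's deviation from its mean. The sketch below illustrates this; the coefficients and feature values in the usage note are made up for illustration and are not taken from the study.

```python
def linear_shap(weights, x, feature_means):
    """SHAP values (log-odds scale) for a linear model with independent features.

    Each contribution is w_i * (x_i - mean_i); the contributions sum to the
    difference between this sample's log-odds and the average log-odds.
    """
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, feature_means)]


def log_odds(weights, bias, x):
    """Linear predictor of a logistic regression model."""
    return bias + sum(w * xi for w, xi in zip(weights, x))
```

For example, with hypothetical coefficients `[0.8, -0.5]`, a sample `[1.0, 2.0]`, and feature means `[0.0, 1.0]`, the contributions are `[0.8, -0.5]`, and their sum (0.3) equals the gap between the sample's log-odds and the log-odds at the mean feature vector. After z-score standardization the training means are approximately zero, so each contribution is close to coefficient times standardized value.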

Figure 7. SHAP-based interpretability analysis of the clinical model for MSI prediction. (A) Global feature importance ranked by mean absolute SHAP value. (B) SHAP beeswarm plot showing the distribution and direction of feature effects across all patients; red represents higher feature values and blue lower values. (C) Example of an individual patient’s SHAP explanation (waterfall plot), illustrating how specific feature values increased (blue) or decreased (red) the predicted probability of MSI.
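To make the quantities in panels A and C concrete, the following sketch computes SHAP values for a linear model on the log-odds scale, assuming independent features, in which case the contribution of feature j for patient i is w_j (x_ij − mean_j). The global importance in panel A then corresponds to the mean absolute contribution per feature. The feature names, weights, and patient rows are hypothetical and are not taken from the study data.

```python
def linear_shap(weights, X):
    """Per-patient SHAP values for a linear model (log-odds scale).

    Assumes independent features: phi_ij = w_j * (x_ij - mean_j).
    """
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(weights))]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in X]


def global_importance(shap_values):
    """Mean absolute SHAP value per feature (bar-plot ranking)."""
    n = len(shap_values)
    return [sum(abs(row[j]) for row in shap_values) / n
            for j in range(len(shap_values[0]))]


# Hypothetical features: standardized height and a rectal-location indicator.
feature_names = ["height_z", "is_rectum"]
weights = [0.5, -1.0]            # made-up logistic regression coefficients
X = [[1.0, 1.0],
     [-1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

phi = linear_shap(weights, X)            # one row per patient (waterfall view)
importance = global_importance(phi)      # one value per feature (bar view)
```

For each patient, the contributions in a row of phi sum to the difference between that patient's log-odds and the cohort average, which is the decomposition a waterfall plot displays.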

Together, the interpretability analyses confirm that both clinical and image-based models captured meaningful features, enhancing trust in the ensemble framework by linking predictive signals to clinically relevant patterns.

4 Discussion

In this study, we developed and evaluated a multimodal ensemble model integrating colonoscopy images and clinical data for MSI prediction in colorectal cancer. Both image-based and clinical models showed strong discriminative performance, with VGG-16 performing best among the image models and LR performing best among the clinical models. Importantly, a majority-voting ensemble combining image and clinical data achieved higher precision and recall than the single-modality models.

Prior work has shown that MSI can be predicted from pathology and radiology images, including H&E whole-slide models such as WiseMSI (34) and CT/MRI radiomics approaches (35, 36). While these imaging modalities have demonstrated strong performance, they typically require tissue sampling, specialized scanners, or labor-intensive tumor segmentation, which may limit scalability in routine clinical practice. In contrast, colonoscopy is widely available, non-invasive, and performed before treatment in nearly all patients, making it a practical platform for real-time MSI risk stratification.

Colonoscopy has been explored as a promising modality for MSI prediction in colorectal cancer. Lo et al. (19) applied a Vision Transformer to data from 441 patients (34 MSI, 407 MSS), achieving an AUC of 0.86, with a sensitivity of 0.47 and a specificity of 0.94. Cai et al. (18) subsequently developed MMR-Scopy, a ResNet50-based model trained on 5,226 colonoscopy images, which achieved an AUROC of 0.948 in the internal test set and 0.807 in external validation, with a sensitivity of 0.796 and a specificity of 0.670. In contrast to these unimodal approaches, our study leveraged a substantially larger cohort of 1,844 patients (113 MSI, 1,731 MSS) and 11,507 colonoscopy images, providing greater statistical power and model robustness. Furthermore, we extended the framework beyond image-only prediction to multimodal integration. By combining image-based deep learning with clinical data–based machine learning, our ensemble model outperformed individual modalities, achieving higher precision (0.920) and recall (0.845) and thereby improving the identification of MSI cases.

Although some single models (such as ViT and LR) achieved slightly higher AUROC scores than the ensemble, this difference is partly attributable to class imbalance. AUROC is relatively insensitive to minority-class performance (37), and conservative models such as LR or models that capture strong morphological cues such as ViT may appear to perform better under AUROC even though they miss a greater number of MSI cases. In contrast, the ensemble consistently achieved substantially higher recall, reflecting its ability to integrate complementary strengths from both image-based and clinical models. This trade-off is expected because the ensemble focuses on balanced performance rather than maximizing a single discrimination metric. Recall is especially important in MSI screening, because missed MSI cases can delay immunotherapy eligibility or reduce the likelihood of identifying patients with Lynch syndrome. Therefore, the ensemble’s improved sensitivity is more aligned with real-world clinical priorities.
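The distinction between ranking quality and thresholded sensitivity can be shown with a small synthetic example. The sketch below computes AUROC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, and recall at a fixed probability threshold. The labels and scores are invented to mimic a conservative model on an imbalanced cohort and do not reproduce any result from this study.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at(labels, scores, threshold=0.5):
    """Fraction of positive cases whose score reaches the threshold."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    return true_pos / sum(labels)


# Synthetic cohort: 4 MSI (label 1) and 16 MSS (label 0) cases.
labels = [1] * 4 + [0] * 16
# A conservative model: positives are ranked above negatives,
# but only one of them crosses the 0.5 decision threshold.
scores = [0.60, 0.45, 0.40, 0.35] + [0.10] * 16

ranking_quality = auroc(labels, scores)      # 1.0
sensitivity = recall_at(labels, scores)      # 0.25
```

Here the ranking is perfect, yet three of the four positive cases fall below the decision threshold, which is the pattern described above for conservative single models.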

We also evaluated more complex integration strategies, including stacking with an MLP meta-classifier. However, stacking consistently resulted in substantially lower recall. This likely occurs because the meta-model inherits the class imbalance present in the training data, which biases predictions toward the majority MSS class. In contrast, probability-based majority voting introduces no additional trainable parameters and therefore avoids amplifying imbalance. Its transparent combination of model probabilities further enhances interpretability in clinical settings. For completeness, the stacking results are reported in Supplementary Table S2.
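A probability-based voting rule of this kind can be sketched in a few lines. The example below averages the predicted MSI probabilities of an image model and a clinical model and applies a decision threshold; it has no trainable parameters. The equal weights, the 0.5 threshold, the use of exactly two base models, and the probability values are assumptions made for illustration and may differ from the configuration used in this study.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Fuse per-model MSI probabilities by (weighted) averaging.

    prob_lists: one list of probabilities per model, aligned by patient.
    Returns the fused probabilities and the thresholded labels.
    """
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models
    fused = [sum(w * p for w, p in zip(weights, probs))
             for probs in zip(*prob_lists)]
    labels = [int(p >= threshold) for p in fused]
    return fused, labels


# Hypothetical outputs for three patients.
image_probs = [0.75, 0.25, 0.5]       # e.g. from the image model
clinical_probs = [0.75, 0.0, 0.25]    # e.g. from the clinical model

fused, labels = soft_vote([image_probs, clinical_probs])
# fused  -> [0.75, 0.125, 0.375]
# labels -> [1, 0, 0]
```

Because the fused score is a plain average, each model's contribution to a given prediction can be read directly from its probability, which is what makes the rule easy to audit.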

The strong performance of VGG-16 among image-based models and LR among clinical models can be explained by the interplay between model architecture and data characteristics. VGG-16 performs well on colonoscopy images because the visual patterns relevant to MSI, including coarse mucosal textures, vascular irregularities, and tumor-surface morphology, are effectively captured by stacked 3×3 convolutions without the need for deep residual blocks. This architectural simplicity helps reduce overfitting and supports stable training on a dataset of moderate size. In contrast, the structured clinical variables exhibit largely linear or monotonic relationships with MSI status (38), which makes LR particularly appropriate. Tree-based models and SVM tended to overfit the minority MSI class or required more extensive hyperparameter tuning. For these reasons, the observed performance reflects the compatibility between each model and the underlying data rather than differences in model complexity.
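The way stacked 3×3 convolutions build up spatial context can be made explicit by computing the receptive field layer by layer. The sketch below applies the standard recurrence, in which each layer adds (kernel − 1) times the current stride product, to the first two convolutional blocks of a VGG-style network. The layer list is a simplified assumption about the standard VGG-16 layout and is given for illustration only.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) after a stack of layers.

    layers: list of (kernel_size, stride) tuples applied in order.
    """
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump   # growth scales with accumulated stride
        jump *= stride
    return field


conv3 = (3, 1)   # 3x3 convolution, stride 1
pool2 = (2, 2)   # 2x2 max pooling, stride 2

two_convs = receptive_field([conv3, conv3])                         # 5
three_convs = receptive_field([conv3] * 3)                          # 7
two_blocks = receptive_field([conv3, conv3, pool2, conv3, conv3])   # 14
```

Two stacked 3×3 layers therefore cover the same 5×5 window as a single larger kernel, and pooling between blocks lets later layers respond to progressively coarser structures.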

A critical barrier to clinical adoption of AI models is their interpretability. We therefore conducted model explainability analyses using Grad-CAM for the image-based models and SHAP for the clinical models. The Grad-CAM visualizations of the image-based models showed that the networks attended to clinically relevant tumor regions and mucosal abnormalities in colonoscopy images, rather than being driven by irrelevant background structures. For MSI tumors, Grad-CAM maps highlighted irregular vascular patterns and mucosal disruptions. In MSS tumors, attention was more diffusely distributed but still focused on tumor mass regions. Together, these interpretability analyses not only enhance clinician trust in model predictions but also provide insight into potential endoscopic correlates of MSI biology. Histopathologically, MSI tumors are characterized by heterogeneous glandular, mucinous, and solid components, along with increased microvascular density (39). These features likely translate endoscopically into tumors with more prominent mucosal secretions and irregular, enlarged vascular patterns, which were consistently emphasized by the Grad-CAM outputs.

Our SHAP analysis revealed key determinants underlying MSI prediction. Tumor location was among the most influential factors, with rectal tumors contributing negatively, consistent with the predominance of MSI in right-sided colon cancers (40). Anthropometric features such as height, weight, and BMI were also important, aligning with evidence that obesity is more strongly linked to MSS colorectal cancer (41). Gender showed a moderate effect, supporting findings that MSI-H tumors are more frequent in men, while estrogen may exert a protective role in women (42). Immune and hematologic indices (pNEUT, cLYM) reflected the immune-rich and inflammatory microenvironment of MSI tumors (43), whereas liver function markers (ALB, GLB, A/G ratio) suggested potential metabolic associations. Although hypertension also contributed, its biological relevance remains uncertain. Together, these findings confirm that the model captured clinically plausible and biologically meaningful predictors, reinforcing its interpretability and translational potential.

Our results have several important clinical implications. MSI is a critical biomarker in CRC, with relevance for both prognosis and therapy selection, particularly response to immune checkpoint inhibitors. Conventional MSI testing relies on IHC or PCR, which are invasive, time-consuming, and costly. By leveraging colonoscopy images and routine clinical data, our approach offers a non-invasive, rapid, and cost-effective alternative for MSI pre-screening. In practice, such a system could be deployed at the time of colonoscopy, providing immediate stratification and guiding subsequent confirmatory testing. For example, patients predicted as MSS with high confidence could bypass unnecessary molecular testing, reducing diagnostic burden and cost, while MSI-positive predictions could be prioritized for confirmatory IHC or PCR. This workflow has the potential to accelerate treatment decision-making, improve resource allocation, and reduce the workload of pathologists and laboratory personnel.

Several limitations should be acknowledged. First, the retrospective single-center design may introduce selection bias and limit the generalizability of our findings. External validation using independent multi-center datasets is essential to assess generalizability and clinical applicability. Second, although the low MSI prevalence reflects real-world epidemiology, it may reduce sensitivity for minority-class detection despite augmentation and ensemble strategies. This may be because some patients in our hospital did not undergo MSI testing due to cost or other factors. Nonetheless, this further strengthens the potential of our model to pre-screen MSI status and guide decisions regarding the necessity of IHC or PCR testing. Finally, although the ensemble improves predictive balance, it introduces additional computational overhead that may affect real-time deployment during colonoscopy. Future work will explore lightweight architectures, model distillation, or on-device optimization to improve efficiency. Notably, our soft probability–based voting module itself requires minimal computation, which helps mitigate deployment challenges.

In conclusion, we demonstrate that combining colonoscopy image–based deep learning and clinical machine learning models through ensemble learning enables accurate, interpretable, and non-invasive MSI prediction. Grad-CAM and SHAP analyses enhance transparency and clinical trust by linking predictions to biologically meaningful patterns. This ensemble framework holds promise as a practical adjunct to molecular assays, with potential to streamline diagnostics, reduce testing burden, and support personalized treatment strategies in colorectal cancer.

Data availability statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Nanjing Hospital of Chinese Medicine. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

JY: Data curation, Writing – original draft, Methodology, Funding acquisition, Investigation, Writing – review & editing, Formal analysis, Conceptualization. SZ: Visualization, Data curation, Methodology, Formal analysis, Writing – review & editing, Software, Writing – original draft. JZ: Investigation, Writing – original draft. YC: Writing – original draft, Investigation. MZ: Investigation, Writing – original draft. CZ: Conceptualization, Writing – review & editing, Funding acquisition. BJ: Funding acquisition, Supervision, Conceptualization, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. 1. General Program of Basic Research of Jiangsu Provincial Science and Technology Department (Grant No. BK20241749); 2. Nanjing Municipal Science and Technology Bureau (Grant No. YKK21218); 3. 2023 Basic Research Project of Nanjing Hospital of Chinese Medicine Affiliated to Nanjing University of Chinese Medicine (Grant No. YJJC202301); 5. Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. SJCX25_0889).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1734076/full#supplementary-material

References

1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834

2. Amato M, Franco R, Facchini G, Addeo R, Ciardiello F, Berretta M, et al. Microsatellite instability: from the implementation of the detection to a prognostic and predictive role in cancers. Int J Mol Sci. (2022) 23:8726. doi: 10.3390/ijms23158726

3. Petrelli F, Ghidini M, Cabiddu M, Pezzica E, Corti D, Turati L, et al. Microsatellite instability and survival in stage ii colorectal cancer: a systematic review and meta-analysis. Anticancer Res. (2019) 39:6431–41. doi: 10.21873/anticanres.13857

4. Lenz HJ, Van Cutsem E, Luisa Limon M, Wong KYM, Hendlisz A, Aglietta M, et al. First-line nivolumab plus low-dose ipilimumab for microsatellite instability-high/mismatch repair-deficient metastatic colorectal cancer: the phase ii checkmate 142 study. J Clin Oncol. (2022) 40:161–70. doi: 10.1200/JCO.21.01015

5. Dabir PD, Bruggeling CE, van der Post RS, Dutilh BE, Hoogerbrugge N, Ligtenberg MJ, et al. Microsatellite instability screening in colorectal adenomas to detect lynch syndrome patients? a systematic review and meta-analysis. Eur J Hum Genet. (2020) 28:277–86. doi: 10.1038/s41431-019-0538-7

6. Lebedeva A, Taraskina A, Grigoreva T, Belova E, Kuznetsova O, Ivanilova D, et al. The role of msi testing methodology and its heterogeneity in predicting colorectal cancer immunotherapy response. Int J Mol Sci. (2025) 26:3420. doi: 10.3390/ijms26073420

7. Yakushina V, Kavun A, Veselovsky E, Grigoreva T, Belova E, Lebedeva A, et al. Microsatellite instability detection: the current standards, limitations, and misinterpretations. JCO Precis Oncol. (2023) 7:e2300010. doi: 10.1200/PO.23.00010

8. Yin Y, Zhang R, Liu P, Deng W, Hu D, He S, et al. Artificial neural networks for finger vein recognition: a survey. Eng Appl Artif Intell. (2025) 150:110586. doi: 10.1016/j.engappai.2025.110586

9. Jiang H, Yin Y, Zhang J, Deng W, and Li C. Deep learning for liver cancer histopathology image analysis: A comprehensive survey. Eng Appl Artif Intell. (2024) 133:108436. doi: 10.1016/j.engappai.2024.108436

10. Gerwert K, Schörner S, Großerueschkamp F, Kraeft AL, Schuhmacher D, Sternemann C, et al. Fast and label-free automated detection of microsatellite status in early colon cancer using artificial intelligence integrated infrared imaging. Eur J Cancer. (2023) 182:122–31. doi: 10.1016/j.ejca.2022.12.026

11. Wagner SJ, Reisenbüchler D, West NP, Niehues JM, Zhu J, Foersch S, et al. Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer Cell. (2023) 41:1650–61. doi: 10.1016/j.ccell.2023.08.002

12. Chen W, Zheng K, Yuan W, Jia Z, Wu Y, Duan X, et al. A ct-based deep learning for segmenting tumors and predicting microsatellite instability in patients with colorectal cancers: a multicenter cohort study. La radiologia Med. (2025) 130:214–25. doi: 10.1007/s11547-024-01909-5

13. Chen S, Du W, Cao Y, Kong J, Wang X, Wang Y, et al. Preoperative contrast-enhanced ct imaging and clinicopathological characteristics analysis of mismatch repair-deficient colorectal cancer. Cancer Imaging. (2023) 23:97. doi: 10.1186/s40644-023-00591-6

14. Li Z, Zhong Q, Zhang L, Wang M, Xiao W, Cui F, et al. Computed tomography-based radiomics model to preoperatively predict microsatellite instability status in colorectal cancer: A multicenter study. Front Oncol. (2021) 11:666786. doi: 10.3389/fonc.2021.666786

15. Zhang W, Huang Z, Zhao J, He D, Li M, Yin H, et al. Development and validation of magnetic resonance imaging-based radiomics models for preoperative prediction of microsatellite instability in rectal cancer. Ann Trans Med. (2021) 9:134. doi: 10.21037/atm-20-7673

16. Sung JJ, Chiu HM, Lieberman D, Kuipers EJ, Rutter MD, Macrae F, et al. Third asia-pacific consensus recommendations on colorectal cancer screening and postpolypectomy surveillance. Gut. (2022) 71:2152–66. doi: 10.1136/gutjnl-2022-327377

17. Davidson KW, Barry MJ, Mangione CM, Cabana M, Caughey AB, Davis EM, et al. Screening for colorectal cancer: Us preventive services task force recommendation statement. Jama. (2021) 325:1965–77. doi: 10.1001/jama.2021.6238

18. Cai Y, Chen X, Chen J, Liao J, Han M, Lin D, et al. Deep learning-assisted colonoscopy images for prediction of mismatch repair deficiency in colorectal cancer. Surg Endoscopy. (2025) 39:859–67. doi: 10.1007/s00464-024-11426-1

19. Lo CM, Jiang JK, and Lin CC. Detecting microsatellite instability in colorectal cancer using transformer based colonoscopy image classification and retrieval. PLoS One. (2024) 19:e0292277. doi: 10.1371/journal.pone.0292277

20. Schouten D, Nicoletti G, Dille B, Chia C, Vendittelli P, Schuurmans M, et al. Navigating the landscape of multimodal ai in medicine: a scoping review on technical challenges and clinical applications. Med Image Anal. (2025) 105:103621. doi: 10.1016/j.media.2025.103621

21. Hezi H, Gelber M, Balabanov A, Maruvka YE, and Freiman M. Cimil-crc: A clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from h&e stained images. Comput Methods Programs Biomedicine. (2025) 259:108513. doi: 10.1016/j.cmpb.2024.108513

22. Li Z, Zhang J, Zhong Q, Feng Z, Shi Y, Xu L, et al. Development and external validation of a multiparametric mri-based radiomics model for preoperative prediction of microsatellite instability status in rectal cancer: a retrospective multicenter study. Eur Radiol. (2023) 33:1835–43. doi: 10.1007/s00330-022-09160-0

23. Lo CM, Yang YW, Lin JK, Lin TC, Chen WS, Yang SH, et al. Modeling the survival of colorectal cancer patients based on colonoscopic features in a feature ensemble vision transformer. Computerized Med Imaging Graphics. (2023) 166:102242. doi: 10.1016/j.compmedimag.2023.102242

24. Cui H, Zhao Y, Xiong S, Feng Y, Li P, Lv Y, et al. Diagnosing solid lesions in the pancreas with multimodal artificial intelligence: a randomized crossover trial. JAMA Network Open. (2024) 7:e2422454–e2422454. doi: 10.1001/jamanetworkopen.2024.22454

25. Wang H, He Y, Wan L, Li C, Li Z, Li Z, et al. Deep learning models in classifying primary bone tumors and bone infections based on radiographs. NPJ Precis Oncol. (2025) 9:72. doi: 10.1038/s41698-025-00855-3

26. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. Imagenet large scale visual recognition challenge. Int J Comput Vision. (2015) 115:211–52. doi: 10.1007/s11263-015-0816-y

27. Lin TY, Goyal P, Girshick R, He K, and Dollár P. (2017). Focal loss for dense object detection, in: Proceedings of the IEEE international conference on computer vision. Venice, Italy: IEEE. pp. 2980–8.

28. Salehi SSM, Erdogmus D, and Gholipour A. Tversky loss function for image segmentation using 3d fully convolutional deep networks. In: International workshop on machine learning in medical imaging. Cham: Springer (2017). p. 379–87.

29. Abraham N and Khan NM. A novel focal tversky loss function with improved attention u-net for lesion segmentation. In: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019). Venice, Italy: IEEE (2019). p. 683–7.

30. Lee N, Yang H, and Yoo H. A surrogate loss function for optimization of f_β score in binary classification with imbalanced data. arXiv preprint arXiv:2104.01459. (2021) abs/2104.01459.

31. Fränti P and Mariescu-Istodor R. Soft precision and recall. Pattern Recognition Lett. (2023) 167:115–21. doi: 10.1016/j.patrec.2023.02.005

32. Mursil M, Rashwan HA, Khalid A, Cavallé-Busquets P, Santos-Calderon L, Murphy MM, et al. Interpretable deep neural networks for advancing early neonatal birth weight prediction using multimodal maternal factors. J Biomed Inf. (2025), 104838. doi: 10.1016/j.jbi.2025.104838

33. Mao J, He Y, Chu J, Hu B, Yao Y, Yan Q, et al. Analysis of clinical characteristics of mismatch repair status in colorectal cancer: a multicenter retrospective study. Int J Colorectal Dis. (2024) 39:100. doi: 10.1007/s00384-024-04674-z

34. Chang X, Wang J, Zhang G, Yang M, Xi Y, Xi C, et al. Predicting colorectal cancer microsatellite instability with a self-attention-enabled convolutional neural network. Cell Rep Med. (2023) 4. doi: 10.1016/j.xcrm.2022.100914

35. Bodalal Z, Hong EK, Trebeschi S, Kurilova I, Landolfi F, Bogveradze N, et al. Non-invasive ct radiomic biomarkers predict microsatellite stability status in colorectal cancer: a multicenter validation study. Eur Radiol Exp. (2024) 8:98. doi: 10.1186/s41747-024-00484-8

36. Wang Y, Xie B, Wang K, Zou W, Liu A, Xue Z, et al. Multi-parametric mri habitat radiomics based on interpretable machine learning for preoperative assessment of microsatellite instability in rectal cancer. Acad Radiol. (2025) 32:3975–3988. doi: 10.1016/j.acra.2025.02.009

37. McDermott M, Zhang H, Hansen L, Angelotti G, and Gallifant J. A closer look at auroc and auprc under class imbalance. Adv Neural Inf Process Syst. (2024) 37:44102–63. doi: 10.52202/079017-1400

38. Pei Q, Yi X, Chen C, Pang P, Fu Y, Lei G, et al. Pre-treatment ct-based radiomics nomogram for predicting microsatellite instability status in colorectal cancer. Eur Radiol. (2022) 32:714–24. doi: 10.1007/s00330-021-08167-3

39. Ying M, Pan J, Lu G, Zhou S, Fu J, Wang Q, et al. Development and validation of a radiomics-based nomogram for the preoperative prediction of microsatellite instability in colorectal cancer. BMC Cancer. (2022) 22:524. doi: 10.1186/s12885-022-09584-3

40. Song Y, Wang L, Ran W, Li G, Xiao Y, Wang X, et al. Effect of tumor location on clinicopathological and molecular markers in colorectal cancer in eastern China patients: an analysis of 2,356 cases. Front Genet. (2020) 11:96. doi: 10.3389/fgene.2020.00096

41. Hoffmeister M, Bläker H, Kloor M, Roth W, Toth C, Herpel E, et al. Body mass index and microsatellite instability in colorectal cancer: a population-based study. Cancer epidemiology Biomarkers Prev. (2013) 22:2303–11. doi: 10.1158/1055-9965.EPI-13-0239

42. Jin P, Lu XJ, Sheng JQ, Fu L, Meng XM, Wang X, et al. Estrogen stimulates the expression of mismatch repair gene hmlh1 in colonic epithelial cells. Cancer Prev Res. (2010) 3:910–6. doi: 10.1158/1940-6207.CAPR-09-0228

43. Sui Q, Zhang X, Chen C, Tang J, Yu J, Li W, et al. Inflammation promotes resistance to immune checkpoint inhibitors in high microsatellite instability colorectal cancer. Nat Commun. (2022) 13:7316. doi: 10.1038/s41467-022-35096-6

Keywords: artificial intelligence, colonoscopy, colorectal cancer, deep learning, diagnostic model, ensemble learning, machine learning, microsatellite instability (MSI)

Citation: You J, Zhang S, Zhang J, Chen Y, Zhang M, Zhou C and Jiang B (2026) Ensemble learning for predicting microsatellite instability in colorectal cancer using pretreatment colonoscopy images and clinical data. Front. Oncol. 15:1734076. doi: 10.3389/fonc.2025.1734076

Received: 28 October 2025; Revised: 29 November 2025; Accepted: 04 December 2025; Published: 02 January 2026.

Edited by:

Gavino Faa, University of Cagliari, Italy

Reviewed by:

Jinghua Zhang, Hohai University, China
Zhicheng Du, Tsinghua University, China

Copyright © 2026 You, Zhang, Zhang, Chen, Zhang, Zhou and Jiang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bin Jiang

^†Present address: Shenghan Zhang, Department of Technology, PharMolix Inc., Shanghai, China

^‡These authors contributed equally to this work
