
ORIGINAL RESEARCH article

Front. Oncol., 10 December 2025

Sec. Thoracic Oncology

Volume 15 - 2025 | https://doi.org/10.3389/fonc.2025.1655714

This article is part of the Research Topic Radiomics and Artificial Intelligence in Oncology Imaging

Predicting Ki-67 expression levels in non-small cell lung cancer using an explainable CT-based deep learning radiomics model

Shize Qin1†, Qing Jia2†, Chunmei Zhang3, Man Li4, Xiufu Zhang1, Xue Zhou1, Dan Su1, Yongying Liu1, Jun Zhou1*
  • 1Department of Radiology, Jiangjin Central Hospital of Chongqing, Chongqing, China
  • 2Chongqing General Hospital, Chongqing University, Chongqing, China
  • 3Department of Pathology, Jiangjin Central Hospital of Chongqing, Chongqing, China
  • 4Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China

Objective: To predict Ki-67 expression levels in non-small cell lung cancer (NSCLC) using an interpretable model combining clinical-radiological, radiomic, and deep learning features.

Methods: This retrospective study included 259 NSCLC patients from Center 1 (training/validation sets) and 112 from Center 2 (independent test set). Patients were grouped by a 40% Ki-67 cutoff. Radiomic and deep learning features were extracted from CT images, with the deep learning features obtained via a deep residual network (ResNet18). The least absolute shrinkage and selection operator (LASSO) was used to select optimal features and compute the radiomics score (rad-score) and deep learning score (deep-score). Univariate and multivariate logistic regression were used to identify independent clinical-radiological predictors of Ki-67. Four support vector machine models were developed: a clinical-radiological model (based on independent clinical-radiological features), a radiomic model (using the rad-score), a deep learning model (using the deep-score), and a combined model (integrating all the above features). SHapley Additive exPlanations (SHAP) analysis was used to visualize feature contributions. Model performance was assessed using receiver operating characteristic (ROC) curves and the integrated discrimination improvement (IDI) index.

Results: High Ki-67 expression occurred in 76 (42.0%), 38 (48.7%), and 33 (29.5%) patients in the training, validation, and independent test sets, respectively. In the independent test set, the combined model achieved the highest predictive performance, with an AUC of 0.892 (95% CI: 0.828–0.956). This improvement over the clinical-radiological (0.820, 95% CI: 0.721–0.918), radiomics (0.750, 95% CI: 0.655–0.844), and deep learning (0.817, 95% CI: 0.732–0.902) models was statistically significant (all p<0.05), as supported by IDI values of 0.115, 0.288, and 0.095, respectively. SHAP analysis identified the deep-score, histological type, and rad-score as key predictors.

Conclusion: The interpretable combined model can predict Ki-67 expression in NSCLC patients. This approach may provide imaging evidence to assist clinicians in optimizing personalized therapeutic strategies.

1 Introduction

Non-small cell lung cancer (NSCLC) is the most common pathological type of lung cancer and is associated with a relatively low average five-year survival rate (1–4). This clinical reality highlights the urgent need for improved prognostic assessment, and Ki-67, a validated indicator of tumor cell proliferation, has accordingly gained significant attention (5). Its expression level is not only closely linked to the aggressive behavior of NSCLC (6, 7) but also serves as a critical prognostic determinant: high Ki-67 expression is associated with markedly shorter progression-free and overall survival (8, 9). Furthermore, research has shown that the Ki-67 index may be clinically useful for predicting the effectiveness of neoadjuvant chemotherapy in stage I–IIIA NSCLC and the response to chemotherapy in advanced NSCLC (10, 11). Currently, Ki-67 expression is typically assessed by immunohistochemistry (IHC); however, this method faces two key challenges: it is invasive (12), and, given the marked heterogeneity of NSCLC, a localized sample may not represent the entire tumor (13, 14). A non-invasive method that captures tumor heterogeneity is therefore needed for the accurate evaluation of Ki-67.

Radiomics addresses this need by quantifying image-based features that correlate with histopathology, and it has shown potential for Ki-67 prediction (15, 16). However, such predefined features are intrinsically limited in capturing the complex NSCLC microenvironment (17, 18). In contrast, deep learning can automatically extract high-level features directly from images, capturing complex information inaccessible to handcrafted radiomics (19, 20). Previous studies further indicate that integrating radiomic and deep learning features can improve the classification of lung cancer subtypes and prognostic prediction while achieving multi-modal information fusion (21, 22).

Therefore, this study aims to develop a model combining clinical-radiological, radiomic, and deep learning features to predict Ki-67 expression levels. We further use the Shapley Additive Explanations (SHAP) technique to interpret model outputs, with the goal of providing imaging evidence for personalized treatment planning.

2 Methods

2.1 Patient population

This retrospective study was conducted in accordance with ethical standards and was approved by the institutional review boards of Jiangjin Central Hospital of Chongqing (Center 1) (Approval No: KY20241204-001) and Chongqing General Hospital (Center 2) (Approval No: KY S2024-058-01), with a waiver of informed consent. It analyzed patient data from Center 1 and Center 2 collected between June 2022 and September 2024.

Inclusion criteria: (1) pathologically confirmed NSCLC by surgical resection or biopsy, with documented Ki-67 IHC results; (2) chest CT scans obtained before biopsy or surgery. Exclusion criteria: (1) poor CT image quality or incomplete clinical/imaging data; (2) unclear delineation between lesions and adjacent obstructive pneumonia or atelectatic tissue; (3) any treatment or invasive examination before the CT examination.

A total of 259 patients were enrolled from Center 1 (163 males, 96 females, mean age 65.4 ± 9.7 years), randomly divided into a training set (N = 181) and a validation set (N = 78) at a ratio of 7:3. At Center 2, an independent test set was established with 112 patients (60 males, 52 females, mean age 65.5 ± 9.6 years). The overall study design and analytical pipeline are summarized in Figure 1.
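The 7:3 split of the Center 1 cohort can be sketched with scikit-learn's `train_test_split`. This is an illustration only: the random seed is a hypothetical choice, and because the text says only that patients were randomly divided, no stratification by Ki-67 class is applied here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in identifiers for the 259 Center 1 patients.
patient_ids = np.arange(259)

# test_size=0.3 gives the 7:3 ratio; scikit-learn rounds the held-out share up
# (ceil(0.3 * 259) = 78), leaving 181 patients for training.
train_ids, val_ids = train_test_split(patient_ids, test_size=0.3, random_state=42)
print(len(train_ids), len(val_ids))  # 181 78
```

Passing `stratify=labels` would keep the high/low Ki-67 proportions similar in both subsets; whether the authors did so is not stated.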

Figure 1
Flowchart illustrating a comprehensive model for analyzing medical data. It consists of seven main steps: 1) Data Collection, containing CT and clinical features. 2) Clinical-radiological feature extraction through univariate and multivariate analyses. 3) Radiomics feature extraction, using segmentation and feature selection for intensity, shape, and texture. 4) Deep-learning feature extraction, employing processed images through a ResNet model. 5) Combined features, integrating clinical-radiological features, Rad-Score, and Deep-Score. 6) Modeling with SVM. 7) Model evaluation shown through charts and plots. Each step is interconnected with directional arrows.

Figure 1. Technical roadmap.

2.2 Pathological assessment

IHC analysis was used to assess Ki-67 expression, defined as the percentage of positively stained tumor cells. Under high magnification (×400), Ki-67-positive tumor cells were counted in five different fields, and the index of each field was calculated as (number of positive tumor cells in the field / total number of tumor cells in the field) × 100%; the five values were then averaged. Previous studies have reported significant prognostic differences in NSCLC patients when a 40% cutoff value for Ki-67 expression is used (23–26). Furthermore, several radiomics studies aiming to predict Ki-67 levels have also adopted this 40% threshold (27, 28). Thus, this study defined high Ki-67 expression as ≥40% and low expression as <40%.
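The scoring rule can be written as a short function. A minimal sketch, assuming raw per-field counts are available; the counts below are invented for illustration, and averaging the per-field percentages (rather than pooling cells across fields) follows the wording of the text.

```python
import numpy as np


def ki67_index(positive_counts, total_counts):
    """Mean Ki-67 index (%) over several high-power fields.

    Each field's index is positive / total x 100; the per-field values are then averaged.
    """
    positive = np.asarray(positive_counts, dtype=float)
    total = np.asarray(total_counts, dtype=float)
    if positive.shape != total.shape or np.any(total <= 0):
        raise ValueError("need matching, non-zero tumor-cell counts for every field")
    return float(np.mean(positive / total * 100.0))


def ki67_group(index_percent, cutoff=40.0):
    """High expression is >= 40%, low expression is < 40%."""
    return "high" if index_percent >= cutoff else "low"


# Hypothetical counts from five fields at x400 magnification.
idx = ki67_index([45, 60, 38, 52, 50], [100, 120, 95, 110, 100])
print(round(idx, 1), ki67_group(idx))  # 46.5 high
```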

2.3 CT examination

CT examination was performed using two scanners: a UCT530 40-row helical CT scanner (United Imaging Healthcare, Shanghai, China) at Center 1, and a Philips IQon Spectral 64-row helical CT scanner (Philips Medical Systems, Best, Netherlands) at Center 2. The scanning parameters were configured as follows: Center 1: Tube voltage of 120 kV, automatic tube current modulation (reference: 80 mAs), pitch factor of 1.075, rotation time of 0.6 s/rev, collimation width of 22 mm, acquisition matrix of 512×512, reconstruction slice thickness/interval of 1 mm, and lung window settings of window width 1200 HU/window center -600 HU; Center 2: Tube voltage of 120 kV, automatic tube current modulation (reference: 129 mAs), pitch factor of 1.000, rotation time of 0.5 s/rev, maintaining identical collimation width (22 mm), acquisition matrix (512×512), reconstruction parameters (1 mm thickness/interval), and lung window settings (window width 1200 HU/window center -600 HU).

2.4 Clinical data collection and CT radiological feature evaluation

Baseline clinical data of patients were systematically documented, encompassing demographic characteristics (gender, age, alcohol consumption, smoking history), tumor profiles (clinical stages, histological type), clinical symptoms (cough, sputum production, fever, breathing difficulty, chest tightness, headache, lymphadenopathy), and laboratory data (white blood cell count, lymphocyte count, platelet count, hemoglobin level, red blood cell count, creatinine level).

Two radiologists independently assessed the CT radiological features, defined as expert-based descriptions of lesion morphology, in a double-blind manner. The assessed features included tumor location, shape, maximum cross-sectional diameter, bronchial cutoff sign, lobulation, air bronchogram, spiculation, pleural tag, vacuole (or cystic component), and tumor density. Any disagreements were resolved by consensus.

2.5 Image segmentation

The uAI Research Portal (29) (version: 20240730, https://urp.united-imaging.com/; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China), a licensed commercial platform, was employed for image segmentation and all subsequent classification modeling. This one-stop analysis system for multimodal clinical research images was developed in Python (version 3.7.3) and incorporates the widely used PyRadiomics package. Access to the platform was granted under our institutional license agreement with the vendor. Using the platform’s integrated VB-Net model, regions of interest (ROIs) were automatically segmented and reconstructed into 3D volumes of interest (VOIs). This model has previously been validated for pulmonary nodules, with an average Dice similarity coefficient (DSC) of 0.915 (30). Additionally, a three-tier arbitration mechanism was established: two radiologists with 5 and 8 years of experience, respectively, independently performed slice-by-slice corrections of the ROIs, with an inter-observer DSC of 0.887. Where inter-observer agreement was high (DSC ≥ 0.700), the contours of the more experienced radiologist (8 years) were used. Where the observers disagreed (DSC < 0.700), a senior radiologist with over 15 years of experience arbitrated and finalized the contours.
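The Dice similarity coefficient and the contour-selection rule described above can be sketched as follows. The masks are toy examples, and `arbitrate` is a hypothetical placeholder for the senior radiologist's manual review.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A n B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def final_contour(mask_8yr, mask_5yr, arbitrate, threshold=0.700):
    """Keep the 8-year reader's contour when DSC >= threshold; otherwise
    hand both contours to a senior arbiter (a caller-supplied function)."""
    if dice(mask_8yr, mask_5yr) >= threshold:
        return mask_8yr
    return arbitrate(mask_8yr, mask_5yr)


# Toy 3D masks: reader B's contour extends one voxel further along the last axis.
reader_a = np.zeros((4, 4, 4), dtype=bool)
reader_a[1:3, 1:3, 1:3] = True  # 8 voxels
reader_b = np.zeros((4, 4, 4), dtype=bool)
reader_b[1:3, 1:3, 1:4] = True  # 12 voxels, 8 of them shared with reader A
print(dice(reader_a, reader_b))  # 0.8
```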

2.6 Feature extraction and selection

2.6.1 Radiomic feature extraction and selection

Before feature extraction, all CT images underwent a standardized preprocessing pipeline, which included resampling to a median voxel size of 0.7×0.7×1 mm³, gray-level discretization with a fixed bin width of 25, and setting the window width and level to 1500 HU and -600 HU, respectively. Radiomic features were extracted from the segmented VOIs in compliance with the Image Biomarker Standardization Initiative (IBSI) guidelines. A total of 2264 features were extracted, falling into three groups: 104 shape-based features, 432 first-order statistical features, and 1728 second-order texture features.
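The windowing and fixed-bin-width discretization steps can be illustrated in NumPy. A sketch under assumptions: windowing is applied as clipping before discretization, the lower window bound (-1350 HU) serves as the bin origin, and the IBSI fixed-bin-size formula floor((x - x_min)/w) + 1 is used. PyRadiomics anchors its bin edges to multiples of the bin width, so its bin indices may differ slightly when the minimum is not such a multiple. Resampling is omitted.

```python
import numpy as np


def window_hu(hu, width=1500.0, level=-600.0):
    """Clip HU values to [level - width/2, level + width/2]; return clipped values and lower bound."""
    lower, upper = level - width / 2.0, level + width / 2.0
    return np.clip(np.asarray(hu, dtype=float), lower, upper), lower


def discretize_fixed_bin_width(values, bin_width=25.0, x_min=None):
    """IBSI-style fixed bin size: bin = floor((x - x_min) / bin_width) + 1."""
    values = np.asarray(values, dtype=float)
    origin = values.min() if x_min is None else x_min
    return np.floor((values - origin) / bin_width).astype(int) + 1


hu = np.array([-2000.0, -1350.0, -600.0, -20.0, 150.0, 400.0])
windowed, lower = window_hu(hu)  # window is [-1350, 150] HU
bins = discretize_fixed_bin_width(windowed, 25.0, x_min=lower)
print(bins.tolist())  # [1, 1, 31, 54, 61, 61]
```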

Feature selection was conducted exclusively on the training set to avoid data leakage and overfitting, following a multi-step process: (1) Z-score normalization of all features; (2) removal of features with near-zero variance or high inter-correlation (Pearson’s r > 0.8); (3) selection of outcome-associated features via the SelectKBest method (F-statistic); and (4) identification of the most robust, non-redundant feature set for modeling using the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation.
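The four steps can be sketched with scikit-learn, fitting every step on the training data only and reusing the fitted steps on held-out data. This is not the platform's implementation: the number of features kept by `SelectKBest` (`k`) is hypothetical, LASSO is fitted as a regression on the 0/1 labels (an L1-penalized logistic regression would be an alternative), and because Z-scoring gives every non-constant column unit variance, the variance filter here only removes constant columns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def drop_correlated(Z, threshold=0.8):
    """Greedily keep columns whose |Pearson r| with every kept column is <= threshold."""
    corr = np.abs(np.corrcoef(Z, rowvar=False))
    keep = []
    for j in range(Z.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return np.array(keep, dtype=int)


def select_features(X_train, y_train, k=30, seed=0):
    """Fit every step on the training set only; return a transform for new data."""
    scaler = StandardScaler().fit(X_train)                 # (1) Z-score
    Z = scaler.transform(X_train)
    vt = VarianceThreshold(threshold=1e-8).fit(Z)          # (2a) constant columns
    Z = vt.transform(Z)
    keep = drop_correlated(Z, threshold=0.8)               # (2b) |r| > 0.8
    Z = Z[:, keep]
    kbest = SelectKBest(f_classif, k=min(k, Z.shape[1])).fit(Z, y_train)  # (3) F-test
    Z = kbest.transform(Z)
    lasso = LassoCV(cv=10, random_state=seed).fit(Z, y_train)  # (4) LASSO, 10-fold CV
    selected = np.flatnonzero(lasso.coef_)

    def transform(X):
        Zx = vt.transform(scaler.transform(X))[:, keep]
        return kbest.transform(Zx)[:, selected]

    return transform, lasso, selected


# Synthetic stand-in for the 2264-dimensional radiomic matrix.
X, y = make_classification(n_samples=259, n_features=120, n_informative=6, random_state=0)
X_train, y_train, X_val = X[:181], y[:181], X[181:]
transform, lasso, selected = select_features(X_train, y_train)
print(len(selected), transform(X_val).shape)
```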

The radiomics score (rad-score) was computed as a linear combination of the LASSO-selected features, weighted by their coefficients. The formula for the rad-score is as follows (Equation 1):

$$\text{Rad-score} = \sum_{i=1}^{n} \left(\text{Coefficient}_i \times \text{Feature}_i\right) + b \qquad (1)$$

where $n$ is the number of selected features, $\text{Feature}_i$ and $\text{Coefficient}_i$ are the Z-score standardized value and the corresponding LASSO regression coefficient of the $i$-th feature, respectively, and $b$ is the model intercept.
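Equations 1 and 3 are ordinary linear predictors, so both scores can be computed with one helper. A minimal sketch with invented coefficients; for a fitted scikit-learn `Lasso` or `LassoCV` model, the non-zero entries of `coef_` together with `intercept_` would supply the weights, and the result equals the model's `predict` output on the same standardized features.

```python
import numpy as np


def linear_score(z_features, coefficients, intercept):
    """Rad-score / deep-score: sum_i coefficient_i * feature_i + b.

    z_features: (n_patients, n_selected) array of Z-score standardized values.
    """
    z = np.atleast_2d(np.asarray(z_features, dtype=float))
    return z @ np.asarray(coefficients, dtype=float) + intercept


# Invented coefficients and intercept for four selected features.
coef = np.array([0.12, -0.08, 0.20, 0.05])
b = 0.42
patients = np.array([[1.0, 0.5, -0.3, 2.0],
                     [-0.2, 1.1, 0.8, 0.0]])
scores = linear_score(patients, coef, b)
print(np.round(scores, 3).tolist())  # [0.54, 0.468]
```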

2.6.2 Deep learning feature extraction and selection

A deep residual network (ResNet18) was implemented for feature extraction and trained from scratch, without pre-trained weights or transfer learning. The original CT images underwent multi-stage preprocessing (see Supplementary Text 1 for the specific procedures). The preprocessed images were fed into ResNet18, which was optimized with a hybrid loss function combining Focal Loss and Dice Loss. Focal Loss served as the primary classification objective, while Dice Loss, computed between class activation maps (CAM) and ground-truth lesion regions, guided the network to focus on lesion areas, thereby improving both interpretability and classification accuracy. The total loss is defined as (Equation 2):

$$L_{\text{loss}} = L_{\text{focal}} + \alpha L_{\text{dice}} \qquad (2)$$

where α is an empirically determined weighting coefficient (default α=0.05) that controls the relative contribution of the Dice loss.
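The loss terms can be illustrated with NumPy. This is a sketch of the arithmetic only, not the authors' training code: in practice the same expressions would be written in a deep learning framework so that gradients flow, and the CAM would be upsampled to the mask resolution. The focal-loss settings (gamma = 2, class weight 0.25) are hypothetical, commonly used defaults; only alpha = 0.05 comes from the text.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """CAM: class-weighted sum of last-layer feature maps, clipped at 0 and scaled to [0, 1].

    feature_maps: (C, *spatial) activations; class_weights: (C,) weights of the
    target class in the final fully connected layer.
    """
    cam = np.tensordot(class_weights, feature_maps, axes=(0, 0))
    cam = np.maximum(cam, 0.0)  # keep only evidence for the target class
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def focal_loss(p, y, gamma=2.0, focal_alpha=0.25, eps=1e-7):
    """Mean binary focal loss; p is the predicted probability of the positive class."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1 - p)
    a_t = np.where(y == 1, focal_alpha, 1 - focal_alpha)
    return float(np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t)))


def soft_dice_loss(cam, mask, eps=1e-7):
    """1 - soft Dice between a [0, 1] CAM and a binary lesion mask of the same shape."""
    inter = np.sum(cam * mask)
    return float(1 - (2 * inter + eps) / (np.sum(cam) + np.sum(mask) + eps))


def hybrid_loss(p, y, cam, mask, alpha=0.05):
    """Equation 2: L_loss = L_focal + alpha * L_dice, with alpha = 0.05 by default."""
    return focal_loss(p, y) + alpha * soft_dice_loss(cam, mask)


# Toy example: 2 channels on a 1x2x2 grid, weights favouring channel 0.
fm = np.array([[[[1.0, 2.0], [3.0, 4.0]]], [[[0.0, 1.0], [1.0, 0.0]]]])
cam = class_activation_map(fm, np.array([1.0, -1.0]))
mask = np.array([[[0.0, 0.0], [1.0, 1.0]]])
print(hybrid_loss([0.9], [1], cam, mask))
```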

The network was trained using the AdamW optimizer (β1=0.9, β2=0.999, weight decay=0.01) combined with a step-based learning rate scheduler (step size: 1000 epochs, decay factor: 0.1) for dynamic learning rate adjustment. Early stopping was employed to prevent overfitting. After training, the model with the highest validation AUC was selected as the final model for both end-to-end classification performance evaluation and deep feature extraction.
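The learning-rate schedule and checkpoint selection can be sketched in plain Python. If the platform uses PyTorch (not stated in the text), the corresponding built-ins would be `torch.optim.AdamW(params, betas=(0.9, 0.999), weight_decay=0.01)` and `torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.1)`. The initial learning rate and the early-stopping patience below are hypothetical, since neither is reported.

```python
def step_lr(initial_lr, epoch, step_size=1000, gamma=0.1):
    """Step decay: lr = initial_lr * gamma ** (epoch // step_size)."""
    return initial_lr * gamma ** (epoch // step_size)


class BestAUCEarlyStopper:
    """Remember the epoch with the highest validation AUC and signal a stop
    after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_auc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, val_auc):
        """Record one epoch's validation AUC; return True when training should stop."""
        if val_auc > self.best_auc + self.min_delta:
            self.best_auc = val_auc
            self.best_epoch = epoch  # a real loop would save a weight checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run with invented validation AUCs and a hypothetical initial LR of 1e-3.
stopper = BestAUCEarlyStopper(patience=3)
for epoch, auc in enumerate([0.70, 0.75, 0.74, 0.76, 0.73, 0.72, 0.71]):
    lr = step_lr(1e-3, epoch)  # would be handed to the optimizer each epoch
    if stopper.update(epoch, auc):
        break
print(stopper.best_epoch, stopper.best_auc)  # 3 0.76
```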

For deep feature extraction, the global average pooling layer was selected as the feature representation layer. Input CT images were preprocessed and fed into the finalized model (with frozen parameters), and the activations from the global average pooling layer were extracted as 256-dimensional feature vectors. Throughout this process, all model weights remained fixed without fine-tuning.
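Reading out the global average pooling layer amounts to averaging each channel's activation map over its spatial positions. A minimal NumPy sketch, assuming a 3D feature map (the text does not say whether the network is 2D or 3D) on a hypothetical 4×4×4 grid; only the 256-channel width is taken from the text.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each channel over all spatial positions: (C, *spatial) -> (C,)."""
    fm = np.asarray(feature_maps, dtype=float)
    return fm.reshape(fm.shape[0], -1).mean(axis=1)


rng = np.random.default_rng(0)
# Hypothetical final-stage activations for one lesion: 256 channels on a 4x4x4 grid.
activations = rng.normal(size=(256, 4, 4, 4))
deep_features = global_average_pool(activations)
print(deep_features.shape)  # (256,)
```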

Deep learning feature selection comprised the following steps: (1) Z-score normalization of all features; (2) application of the Relief algorithm to evaluate feature relevance; and (3) final selection and weighting of the most discriminative features using LASSO regression.
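Relief, as originally proposed by Kira and Rendell, scores each feature by how much it differs between a sample and its nearest neighbour of the other class (nearest miss) versus its nearest neighbour of the same class (nearest hit). A minimal NumPy sketch follows; the Manhattan neighbour distance, the min-max scaling of differences and the number of sampled instances are assumptions, and the platform's implementation (possibly a variant such as ReliefF) may differ.

```python
import numpy as np


def relief(X, y, n_iter=None, seed=0):
    """Basic Relief weights for a binary outcome; higher weight = more relevant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span          # per-feature differences now lie in [0, 1]
    n, p = Xs.shape
    m = n if n_iter is None else n_iter
    rng = np.random.default_rng(seed)
    w = np.zeros(p)
    for i in rng.choice(n, size=m, replace=(m > n)):
        diffs = np.abs(Xs - Xs[i])            # per-feature distance to every sample
        dist = diffs.sum(axis=1)              # Manhattan distance for neighbour search
        dist[i] = np.inf                      # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (diffs[miss] - diffs[hit]) / m   # reward separation from the miss
    return w


rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=100)
X = rng.normal(size=(100, 5))
X[:, 0] += 4 * y  # only feature 0 carries class information
weights = relief(X, y)
print(np.round(weights, 3))
```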

The deep learning score (deep-score) was then computed as a linear combination of these selected features, weighted by their LASSO coefficients, following the formula (Equation 3):

$$\text{Deep-score} = \sum_{i=1}^{n} \left(\text{Coefficient}_i \times \text{Feature}_i\right) + b \qquad (3)$$

where $n$ is the number of selected features, $\text{Feature}_i$ and $\text{Coefficient}_i$ are the Z-score standardized value and the corresponding LASSO regression coefficient of the $i$-th feature, respectively, and $b$ is the model intercept.

2.7 Model development and validation

All modeling was conducted within the uAI Research Portal using the open-source scikit-learn library (version 0.23.2). To ensure a fair comparison of predictive performance across feature modalities, the clinical-radiological (based on independent clinical-radiological features), radiomics (using the rad-score), and deep learning (using the deep-score) models were all developed with the same support vector machine (SVM) classifier. SVMs are well suited to high-dimensional data and potential non-linear relationships. For the combined model integrating clinical-radiological features, rad-score, and deep-score, the SVM classifier was compared against four other algorithms: Decision Tree, Random Forest, XGBoost, and Logistic Regression. The SVM performed best on the validation set (Supplementary Table S1) and was consequently selected. Hyperparameters of all models were optimized via grid search; the specific parameters are detailed in Supplementary Table S2.
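The grid search can be sketched with scikit-learn's `GridSearchCV` wrapped around an `SVC`. The parameter grid, the 5-fold stratified inner cross-validation, the feature-scaling step and the synthetic six-column input (standing in for the four clinical-radiological predictors plus rad-score and deep-score) are all hypothetical; the actual search space is given in Supplementary Table S2.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 181 training patients, six predictors.
X, y = make_classification(n_samples=181, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
param_grid = {  # illustrative grid only
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": ["scale", 0.01, 0.1],  # ignored by the linear kernel
}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.predict_proba(X_new)[:, 1]` would give the predicted probability of high Ki-67 expression used for the ROC, calibration and decision curves.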

Receiver operating characteristic (ROC) curves were plotted to calculate the area under the curve (AUC), specificity, sensitivity, and accuracy. Calibration curves, along with the Brier score, assessed the agreement between model predictions and pathologic outcomes. Decision curve analysis (DCA) was applied to quantify the clinical net benefit across a range of threshold probabilities from 0% to 100%.
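The threshold-dependent metrics, the Brier score and the decision-curve net benefit can be computed as below. This is a sketch: the 0.5 classification threshold is an assumption (the text does not state the operating point), and the net-benefit formula is the standard one from decision curve analysis, net benefit = TP/n − FP/n × pt/(1 − pt). Because the odds term is undefined at pt = 1, a DCA over 0% to 100% is in practice evaluated on a grid strictly inside that interval.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, confusion_matrix, roc_auc_score


def binary_metrics(y_true, prob, threshold=0.5):
    """AUC, sensitivity, specificity, accuracy and Brier score."""
    y_true = np.asarray(y_true)
    prob = np.asarray(prob, dtype=float)
    pred = (prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y_true, prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "brier": brier_score_loss(y_true, prob),
    }


def net_benefit(y_true, prob, pt):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    y_true = np.asarray(y_true)
    treat = np.asarray(prob, dtype=float) >= pt
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * pt / (1 - pt)


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(binary_metrics(y, p))
print([round(net_benefit(y, p, t), 3) for t in (0.1, 0.3, 0.5)])
```

The "treat all" reference curve is `net_benefit(y, np.ones(len(y)), pt)`, and "treat none" is zero at every threshold.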

2.8 Model interpretation

SHAP is an open-source Python library (RRID: SCR_021362) for machine learning model interpretation. It quantifies the contribution of each feature to model outputs by calculating Shapley values. In this study, feature importance (summary) plots, beeswarm plots, decision plots, and force plots were generated to provide global and local interpretations.
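Shapley values have a combinatorial definition that can be evaluated exactly when there are only a handful of features, as in the six-input combined model. The brute-force function below is a hypothetical illustration of that definition, not the SHAP library's algorithm: for a kernel SVM the library would typically use `shap.KernelExplainer`, which estimates these values against a background dataset rather than the single baseline vector assumed here. The efficiency property, f(x) = base value + Σφᵢ, is what the force and decision plots visualize.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to `baseline` (e.g. training-set means);
    the cost grows as 2 ** n_features, so this only suits a few features.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        if idx:
            z[idx] = x[idx]
        return float(predict(z[None, :])[0])

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                phi[i] += weight * (value(s + (i,)) - value(s))
    return phi


w, b = np.array([2.0, -1.0, 0.5]), 0.3


def linear_model(Z):
    return Z @ w + b


phi = exact_shapley(linear_model, [1.0, 2.0, 3.0], baseline=[0.5, 0.5, 0.5])
print(phi)  # equals w * (x - baseline), i.e. 1.0, -1.5, 1.25
```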

2.9 Statistical analysis

The Kolmogorov–Smirnov test was applied to assess data distribution normality. Normally distributed data were presented as mean ± standard deviation (x¯ ± s), whereas non-normally distributed data were reported as median with interquartile range [M (Q25, Q75)].

Univariate analysis was performed in the training set to assess the association of each clinical and radiological feature with the outcome. Continuous variables were compared using Student’s t-test or the Mann–Whitney U test, and categorical variables using the chi-square test or Fisher’s exact test. Features with p < 0.05 in the univariate analysis were entered into multivariate logistic regression analysis to identify clinical and radiological features independently associated with Ki-67 expression level. The combined model was compared with each individual model using the integrated discrimination improvement (IDI) index, which quantifies the net improvement in predicted probabilities. Statistical analyses were performed using the uAI Research Portal and SPSS statistical software (version 26.0, https://www.ibm.com). A p-value < 0.05 was considered statistically significant.
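The test-selection rules and the IDI can be sketched with SciPy and NumPy. Assumptions: normality is checked per group with a one-sample Kolmogorov–Smirnov test against a normal distribution with the estimated mean and SD (strictly, estimating the parameters calls for the Lilliefors correction), and Fisher's exact test replaces the chi-square test for 2×2 tables with any expected count below 5, a common convention the text does not spell out. The IDI is the difference in discrimination slopes (mean predicted probability in events minus non-events) between the new and old models. The multivariate logistic regression step is not shown.

```python
import numpy as np
from scipy import stats


def continuous_p(a, b, alpha=0.05):
    """Student's t-test if both groups pass a KS normality check, else Mann-Whitney U."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)

    def looks_normal(x):
        return stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue > alpha

    if looks_normal(a) and looks_normal(b):
        return stats.ttest_ind(a, b).pvalue
    return stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def categorical_p(table):
    """Chi-square test, or Fisher's exact test for a 2x2 table with any expected count < 5."""
    table = np.asarray(table)
    chi2, p, dof, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        odds_ratio, p_fisher = stats.fisher_exact(table)
        return p_fisher
    return p


def idi(y_true, p_new, p_old):
    """Integrated discrimination improvement of p_new over p_old."""
    y = np.asarray(y_true).astype(bool)
    p_new, p_old = np.asarray(p_new, dtype=float), np.asarray(p_old, dtype=float)
    slope_new = p_new[y].mean() - p_new[~y].mean()
    slope_old = p_old[y].mean() - p_old[~y].mean()
    return slope_new - slope_old


rng = np.random.default_rng(0)
print(continuous_p(rng.normal(0, 1, 40), rng.normal(0.8, 1, 40)))
print(categorical_p([[3, 1], [1, 3]]))  # expected counts are 2, so Fisher's test is used
print(idi([1, 1, 0, 0], [0.9, 0.7, 0.2, 0.1], [0.6, 0.5, 0.4, 0.3]))
```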

3 Results

3.1 Clinical-radiological feature and clinical-radiological model

The clinical-radiological features of the three datasets are summarized in Table 1. IHC stratification showed that high Ki-67 expression (≥40%) was present in 42.0% (76/181), 48.7% (38/78), and 29.5% (33/112) of cases in the training, validation, and independent test sets, respectively. Multivariate analysis identified histologic type (p<0.001), diameter (p=0.026), density (p=0.009), and gender (p=0.041) as independent clinical-radiological predictors of Ki-67 expression level (Table 2). The clinical-radiological model achieved AUC values of 0.763 (training set), 0.803 (validation set), and 0.820 (independent test set), as detailed in Table 3.


Table 1. Clinical-radiological features of the training, validation, and independent test sets.


Table 2. Univariate and multivariate analysis results of clinical-radiological features.


Table 3. Performance of each model for predicting Ki-67 expression level.

3.2 Prediction performance of the radiomics model

Following feature selection, four robust radiomics features were retained to compute the rad-score (Supplementary Table S3). The high-expression group exhibited a significantly greater rad-score compared to the low-expression group across all three cohorts (training, validation, and independent test; all p<0.001), as shown in Figures 2A–C. The radiomics model achieved AUC values of 0.817 in the training set, 0.784 in the validation set, and 0.750 in the independent test set (Table 3).

Figure 2
Six box plots labeled A to F show Rad_Score and Dep_Score distributions. Each plot compares two groups, with significant differences indicated by Wilcoxon Rank Sum Test p-values less than 0.001. Each plot is titled and has outliers marked.

Figure 2. (A–C) show the rad-score and (D–F) the deep-score in the training, validation, and independent test sets, respectively. Label 1 and Label 2 indicate the low- and high-expression groups, respectively.

3.3 Prediction performance of the deep learning model

The CAM module was used to visualize the regions the deep learning network attended to when making classification decisions (Figure 3; a detailed explanation is provided in Supplementary Text 2). Four discriminative deep learning features were retained to compute the deep-score (Supplementary Table S4). The high-expression group exhibited a significantly higher deep-score than the low-expression group across all three cohorts (all p<0.001), as shown in Figures 2D–F. The deep learning model achieved AUCs of 0.912, 0.800, and 0.817 in the training, validation, and independent test sets, respectively (Table 3).

Figure 3
CT scans of lungs show lesions highlighted in red. Image A has a lesion magnified twice, first in detail, then with a heatmap overlay in red and yellow. Image B displays a similar pattern, with a magnified lesion shown first in a close-up view and then with a heatmap overlay.

Figure 3. Heatmaps of the deep learning network for two patients. (A) Patient 1; (B) Patient 2.

3.4 Prediction performance of the combined model

The combined model achieved AUC values of 0.929 in the training set, 0.825 in the validation set, and 0.892 in the independent test set (Table 3, Figures 4A–C). In the independent test set, the combined model demonstrated significant improvements in overall predictive performance, with IDI values of 0.115, 0.288, and 0.095 over the clinical-radiological, radiomics, and deep learning models, respectively (Table 4). The DCA demonstrated that the combined model provided a greater clinical net benefit than all the other models (Figures 4D–F). The calibration curves showed good agreement between the combined model’s predictions and the pathology results (Figures 4G–I), with Brier scores of 0.101 (training set), 0.187 (validation set), and 0.125 (independent test set), indicating good probabilistic calibration. Among all the features, the deep-score and rad-score contributed the most to the prediction (Figures 5A–C).

Figure 4
Nine-panel graphic showing ROC, calibration, and decision curves for training, validation, and independent test sets. ROC curves show the diagnostic ability of models with AUC values, calibration curves depict predicted vs. actual probabilities with Brier scores, and decision curves illustrate net benefit across threshold probabilities. Each analysis compares clinical-radiological, radiomics, deep learning, and combined approaches, indicating their performance metrics.

Figure 4. Performance evaluation of the models across different datasets. ROC (A–C), calibration (D–F), and decision curve analysis (G–I) curves for the training, validation, and independent test sets, respectively.


Table 4. Comparison of predictive performance among all models.

Figure 5
Three pie charts labeled A, B, and C show data set compositions. Chart A represents the Training set, with “DepScore(Lasso)” and “RadScore(Lasso)” each at thirty-two percent, followed by “Diameter” (sixteen point two percent), “Histological type” (nine point seven percent), “Density” (five point five percent), and “Gender” (four point seven percent). Chart B shows the Validation set, similar in distribution, with “DepScore(Lasso)” (thirty-two point two percent) and “RadScore(Lasso)” (thirty-two point two percent) being prominent. Chart C illustrates the Independent test set, with “DepScore(Lasso)” (twenty-eight point eight percent) and “RadScore(Lasso)” (twenty-eight point eight percent) as the largest categories, followed by “Diameter” (twenty-five point five percent) and others.

Figure 5. The relative contribution of features in the training (A), validation (B), and independent test (C) sets.

3.5 Model visualization and interpretation

For global feature analysis, the importance plot (Figure 6A) showed the relative ranking of features, with the deep-score being the most influential; the beeswarm plot (Figure 6B) displayed the SHAP values of each sample across features (positive values increased the probability of high Ki-67 expression, while negative values reduced it); and the decision plot (Figure 6C) illustrated the cumulative impact of features on model predictions, with gray lines denoting baseline values, red trajectories tracing positive samples (high-expression group), and blue trajectories tracing negative samples (low-expression group).

Figure 6
Chart A shows the mean SHAP values for various features, with DepScore(Lasso) having the highest impact at 0.38. Chart B is a SHAP value scatter plot, indicating the impact and feature value color gradient. Chart C is a parallel coordinates plot showing model output values for the same features with a gradient from low to high impact.

Figure 6. Global visualization plots. (A–C) represent the feature importance plot, beeswarm plot, and decision plot, respectively.

For individual visualization, force plots (Figures 7A, B) illustrated the model’s decision-making process for both high- and low-expression cases. Score calculation started from the baseline value, with each feature represented as a directional force via SHAP values. Feature contributions were quantified by colored bars: bar length corresponded proportionally to the feature’s influence magnitude on the final prediction value f(x). Specifically, red bars indicated increased probability of high expression, while blue bars denoted decreased probability. These directional forces were then cumulatively aggregated to yield the overall effect.

Figure 7
Two bar charts labeled A and B depict contributions to a model's output value. In chart A, contributions include Gender (-0.813), RadScore(Lasso) (-1.353), DepScore(Lasso) (-1.272), Density (3.05), and Histological type (-0.626) with a model output of 0.50. In chart B, contributions include Gender (-0.813), Density (-0.512), DepScore(Lasso) (0.263), Histological type (1.597), and RadScore(Lasso) (0.571) with a model output of 0.73. Red sections indicate positive contributions and blue indicate negative contributions.

Figure 7. Force plots illustrating individual predictions. Shown are two representative samples from the low-expression group (A) and the high-expression group (B).

4 Discussion

In this study, we developed a combined model integrating clinical-radiological, radiomics, and deep learning features to non-invasively characterize intratumoral heterogeneity and predict Ki-67 expression level in NSCLC. The combined model outperformed the single-modality models and generalized well, with an AUC of 0.892 in the independent test set. SHAP analysis elucidated the combined model’s decision-making process, improving its transparency and clinical credibility. This model may enable clinicians to evaluate tumor invasiveness preoperatively and optimize personalized therapeutic strategies.

NSCLC with different Ki-67 expression levels exhibits distinct biological behaviors and gene expression patterns. These visually imperceptible differences can be captured by medical imaging and quantified through high-throughput feature extraction, thereby clarifying the association between imaging features and the underlying pathophysiological processes (31, 32). Previous studies have explored the value of CT-derived radiomics features for predicting Ki-67 expression levels, with test set AUCs ranging from 0.77 to 0.84 (15, 27, 33). However, conventional radiomics features rely predominantly on predefined mathematical formulations and are susceptible to technical variability, such as image noise, manual segmentation variability, and CT scan parameters (34). In contrast, deep learning algorithms excel at autonomously learning high-level abstract representations from images, overcoming many limitations of handcrafted features; this advantage has driven their widespread use in recent medical research (20, 35). Building on these foundations, our study used the ResNet18 convolutional neural network to extract deep learning features from CT images for Ki-67 prediction. More importantly, we integrated diverse feature types through machine learning to achieve multi-modal information fusion. The combined model demonstrated significantly enhanced predictive performance, underscoring the complementary value of multi-modal features. Liu et al. (36) fused radiomics and deep learning features from DCE-MRI images and similarly found that fusion improved the accuracy of preoperative lymph node metastasis prediction in breast cancer. Kim et al. (37) combined deep learning and radiomics features extracted from CT images to predict epidermal growth factor receptor mutation status in NSCLC patients and showed that this combination was feasible.

In clinical practice, lung cancer treatment decisions should rely on robust evidence rather than on algorithm-generated probabilities alone. This underscores the need to address the interpretability challenges inherent in machine learning’s “black-box” modeling. Our study used SHAP to enhance the combined model’s interpretability, providing transparency into feature contributions and the model’s decision-making process. Both the deep-score and the rad-score were significantly higher in the high-Ki-67 expression group than in the low-expression group, consistent with previous research (27). Deep-scores and rad-scores are typically positively correlated with tumor heterogeneity (38). NSCLC with high Ki-67 expression exhibits stronger proliferative, infiltrative, and invasive abilities, resulting in more complex intratumoral texture and greater heterogeneity (9, 39). The high-Ki-67 expression group predominantly comprised males, squamous cell carcinomas, and solid tumors with larger diameters, consistent with previous research (28, 40). Warth et al. (41) evaluated Ki-67 expression in three large independent NSCLC cohorts (total n=1,065) and found that the mean expression level in squamous cell carcinoma (52.8%) was twice that in adenocarcinoma (25.8%). A meta-analysis further confirmed higher Ki-67 expression in squamous cell carcinoma than in adenocarcinoma (7). Additionally, a larger solid component and a larger diameter indicate more aggressive tumor biology, and hence higher Ki-67 expression, than non-solid and smaller lesions (27, 42). Previous studies reported that CT radiological features (e.g., lobulation, vacuole or cystic component) differed significantly across Ki-67 expression levels, but the findings were inconsistent (15, 43). In the present study, no such differences were observed between the high- and low-expression groups.
This discrepancy may stem from two key limitations. First, visual inspection of CT radiological features yields only limited information. Second, the evaluation of these features often relies on physicians’ subjective judgment and clinical experience, leading to poor consistency and reproducibility that can bias research findings.

This study had several limitations. First, its retrospective design carries an inherent risk of selection bias, and the sample size remains limited, although the model was validated on an independent external test set. The findings should therefore be interpreted as proof-of-concept, and prospective validation in a larger, multi-center cohort is warranted before clinical application. Second, our analysis focused exclusively on intra-tumoral features; the predictive value of peritumoral regions warrants future investigation. Third, contrast-enhanced CT data were not included, because previous studies have suggested that high-density contrast in enhanced images may mask the original textural features of lesion tissue (44), and contrast-enhanced scanning also carries risks related to iodinated contrast use and renal burden. However, radiomic features derived from dual-phase contrast-enhanced CT have also been reported to predict Ki-67 expression levels (27), so the value of contrast-enhanced data needs further exploration. Finally, the models did not incorporate multi-modal data (e.g., pathomics, transcriptomics); future work will aim to include such data to improve their biological interpretability.

5 Conclusion

This study developed and validated an interpretable model combining clinical-radiological, radiomic, and deep learning features for the non-invasive prediction of Ki-67 expression levels. This model may help clinicians assess tumor proliferative activity early in patients with NSCLC and provide complementary information to inform personalized treatment strategies.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Medical Ethics Committee of Jiangjin Central Hospital of Chongqing. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because this was a retrospective study using medical records obtained during previous clinical diagnosis and treatment.

Author contributions

SQ: Conceptualization, Data curation, Methodology, Project administration, Writing – original draft. QJ: Methodology, Software, Writing – original draft. CZ: Data curation, Writing – original draft. ML: Writing – review & editing, Data curation, Software. XFZ: Data curation, Investigation, Writing – original draft. XZ: Investigation, Writing – original draft. DS: Data curation, Investigation, Writing – original draft. YL: Data curation, Formal analysis, Investigation, Writing – original draft. JZ: Conceptualization, Funding acquisition, Supervision, Writing – review & editing.

Funding

The author(s) declared financial support was received for this work and/or its publication. This study was funded by the Bureau of Science and Technology of Jiangjin District, Chongqing Municipality (Grant No. Y2023012), and Beijing Medical Award Foundation (Grant No. YXJL-2025-0483-0265).

Conflict of interest

Author ML was employed by the company Shanghai United Imaging Intelligence Co., Ltd.

The remaining authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1655714/full#supplementary-material

References

1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834

2. Duma N, Santana-Davila R, and Molina JR. Non-small cell lung cancer: epidemiology, screening, diagnosis, and treatment. Mayo Clin Proc. (2019) 94:1623–40. doi: 10.1016/j.mayocp.2019.01.013

3. Wao H, Mhaskar R, Kumar A, Miladinovic B, and Djulbegovic B. Survival of patients with non-small cell lung cancer without treatment: a systematic review and meta-analysis. Syst Rev. (2013) 2:10. doi: 10.1186/2046-4053-2-10

4. Ganti AK, Klein AB, Cotarla I, Seal B, and Chou E. Update of incidence, prevalence, survival, and initial treatment in patients with non-small cell lung cancer in the US. JAMA Oncol. (2021) 7:1824–32. doi: 10.1001/jamaoncol.2021.4932

5. Gerdes J, Lemke H, Baisch H, Wacker HH, Schwab U, and Stein H. Cell cycle analysis of a cell proliferation-associated human nuclear antigen defined by the monoclonal antibody Ki-67. J Immunol. (1984) 133:1710–5. doi: 10.4049/jimmunol.133.4.1710

6. Spiliotaki M, Neophytou CM, Vogazianos P, Stylianou I, Gregoriou G, Constantinou AI, et al. Dynamic monitoring of PD-L1 and Ki67 in circulating tumor cells of metastatic non-small cell lung cancer patients treated with pembrolizumab. Mol Oncol. (2023) 17:792–809. doi: 10.1002/1878-0261.13317

7. Wei D, Chen W, Meng R, Zhao N, Zhang X, Liao D, et al. Augmented expression of Ki-67 is correlated with clinicopathological characteristics and prognosis for lung cancer patients: an up-dated systematic review and meta-analysis with 108 studies and 14,732 patients. Respir Res. (2018) 19:150. doi: 10.1186/s12931-018-0843-7

8. Zhao Y, Shi F, Zhou Q, Li Y, Wu J, Wang R, et al. Prognostic significance of PD-L1 in advanced non-small cell lung carcinoma. Med (Baltimore). (2020) 99:e23172. doi: 10.1097/MD.0000000000023172

9. Tabata K, Tanaka T, Hayashi T, Hori T, Nunomura S, Yonezawa S, et al. Ki-67 is a strong prognostic marker of non-small cell lung cancer when tissue heterogeneity is considered. BMC Clin Pathol. (2014) 14:23. doi: 10.1186/1472-6890-14-23

10. Hong X, Yang Z, Wang M, Wang L, and Xu Q. Reduced decorin expression in the tumor stroma correlates with tumor proliferation and predicts poor prognosis in patients with I-IIIA non-small cell lung cancer. Tumour Biol. (2016) 37:16029–38. doi: 10.1007/s13277-016-5431-1

11. Wang D, Chen D, Zhang C, Chai M, Guan M, Wang Z, et al. Analysis of the relationship between Ki-67 expression and chemotherapy and prognosis in advanced non-small cell lung cancer. Transl Cancer Res. (2020) 9:3491–8. doi: 10.21037/tcr.2020.03.72

12. Wiener RS, Schwartz LM, Woloshin S, and Welch HG. Population-based risk for complications after transthoracic needle lung biopsy of a pulmonary nodule: an analysis of discharge records. Ann Intern Med. (2011) 155:137–44. doi: 10.7326/0003-4819-155-3-201108020-00003

13. Koo JM, Kim J, Lee J, Hwang S, Shim HS, Hong TH, et al. Deciphering the intratumoral histologic heterogeneity of lung adenocarcinoma using radiomics. Eur Radiol. (2025) 35:4861–72. doi: 10.1007/s00330-025-11397-4

14. Li J, Qiu Z, Zhang C, Chen S, Wang M, Meng Q, et al. ITHscore: comprehensive quantification of intra-tumor heterogeneity in NSCLC by multi-scale radiomic features. Eur Radiol. (2023) 33:893–903. doi: 10.1007/s00330-022-09055-0

15. Bao J, Liu Y, Ping X, Zha X, Hu S, and Hu C. Preoperative Ki-67 proliferation index prediction with a radiomics nomogram in stage T1a-b lung adenocarcinoma. Eur J Radiol. (2022) 155:110437. doi: 10.1016/j.ejrad.2022.110437

16. Dong Y, Jiang Z, Li C, Dong S, Zhang S, Lv Y, et al. Development and validation of novel radiomics-based nomograms for the prediction of EGFR mutations and Ki-67 proliferation index in non-small cell lung cancer. Quant Imaging Med Surg. (2022) 12:2658–71. doi: 10.21037/qims-21-980

17. Aerts HJ. The potential of radiomic-based phenotyping in precision medicine: a review. JAMA Oncol. (2016) 2:1636–42. doi: 10.1001/jamaoncol.2016.2631

18. Manafi-Farid R, Askari E, Shiri I, Pirich C, Asadi M, Khateri M, et al. [18F]FDG-PET/CT radiomics and artificial intelligence in lung cancer: technical aspects and potential clinical applications. Semin Nucl Med. (2022) 52:759–80. doi: 10.1053/j.semnuclmed.2022.04.004

19. Wang J, Yang Y, Xie Z, Mao G, Gao C, Niu Z, et al. Predicting lymphovascular invasion in non-small cell lung cancer using deep convolutional neural networks on preoperative chest CT. Acad Radiol. (2024) 31:5237–47. doi: 10.1016/j.acra.2024.05.010

20. Tao J, Liang C, Yin K, Fang J, Chen B, Wang Z, et al. 3D convolutional neural network model from contrast-enhanced CT to predict spread through air spaces in non-small cell lung cancer. Diagn Interv Imaging. (2022) 103:535–44. doi: 10.1016/j.diii.2022.06.002

21. Zhang K, Zhao G, Liu Y, Huang Y, Long J, Li N, et al. Clinic, CT radiomics, and deep learning combined model for the prediction of invasive pulmonary aspergillosis. BMC Med Imaging. (2024) 24:264. doi: 10.1186/s12880-024-01442-x

22. Chu X, Niu L, Yang X, He S, Li A, Chen L, et al. Radiomics and deep learning models to differentiate lung adenosquamous carcinoma: a multicenter trial. iScience. (2023) 26:107634. doi: 10.1016/j.isci.2023.107634

23. Ahn HK, Jung M, Ha S, Lee J, Park I, Kim YS, et al. Clinical significance of Ki-67 and p53 expression in curatively resected non-small cell lung cancer. Tumour Biol. (2014) 35:5735–40. doi: 10.1007/s13277-014-1760-0

24. Berghoff AS, Ilhan-Mutlu A, Wöhrer A, Hackl M, Widhalm G, Hainfellner JA, et al. Prognostic significance of Ki67 proliferation index, HIF1 alpha index and microvascular density in patients with non-small cell lung cancer brain metastases. Strahlenther Onkol. (2014) 190:676–85. doi: 10.1007/s00066-014-0639-8

25. Vigouroux C, Casse JM, Battaglia-Hsu SF, Brochin L, Luc A, Paris C, et al. Methyl(R217)HuR and MCM6 are inversely correlated and are prognostic markers in non small cell lung carcinoma. Lung Cancer. (2015) 89:189–96. doi: 10.1016/j.lungcan.2015.05.008

26. Zhu WY, Hu XF, Fang KX, Kong QQ, Cui R, Li HF, et al. Prognostic value of mutant p53, Ki-67, and TTF-1 and their correlation with EGFR mutation in patients with non-small cell lung cancer. Histol Histopathol. (2019) 34:1269–78. doi: 10.14670/HH-18-124

27. Sun H, Zhou P, Chen G, Dai Z, Song P, and Yao J. Radiomics nomogram for the prediction of Ki-67 index in advanced non-small cell lung cancer based on dual-phase enhanced computed tomography. J Cancer Res Clin Oncol. (2023) 149:9301–15. doi: 10.1007/s00432-023-04856-2

28. Fu Q, Liu SL, Hao DP, Hu YB, Liu XJ, Zhang Z, et al. CT radiomics model for predicting the Ki-67 index of lung cancer: an exploratory study. Front Oncol. (2021) 11:743490. doi: 10.3389/fonc.2021.743490

29. Wu J, Xia Y, Wang X, Wei Y, Liu A, Innanje A, et al. uRP: an integrated research platform for one-stop analysis of medical images. Front Radiol. (2023) 3:1153784. doi: 10.3389/fradi.2023.1153784

30. Chen L, Gu D, Chen Y, Shao Y, Cao X, Liu G, et al. An artificial-intelligence lung imaging analysis system (ALIAS) for population-based nodule computing in CT scans. Comput Med Imaging Graph. (2021) 89:101899. doi: 10.1016/j.compmedimag.2021.101899

31. Chen M, Copley SJ, Viola P, Lu H, and Aboagye EO. Radiomics and artificial intelligence for precision medicine in lung cancer treatment. Semin Cancer Biol. (2023) 93:97–113. doi: 10.1016/j.semcancer.2023.05.004

32. Bi WL, Hosny A, Schabath MB, Giger ML, Birkbak NJ, Mehrtash A, et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin. (2019) 69:127–57. doi: 10.3322/caac.21552

33. Yan J, Xue X, Gao C, Guo Y, Wu L, Zhou C, et al. Predicting the Ki-67 proliferation index in pulmonary adenocarcinoma patients presenting with subsolid nodules: construction of a nomogram based on CT images. Quant Imaging Med Surg. (2022) 12:642–52. doi: 10.21037/qims-20-1385

34. Zhao W, Yang J, Ni B, Bi D, Sun Y, Xu M, et al. Toward automatic prediction of EGFR mutation status in pulmonary adenocarcinoma with 3D deep learning. Cancer Med. (2019) 8:3532–43. doi: 10.1002/cam4.2233

35. Caii W, Wu X, Guo K, Chen Y, Shi Y, and Chen J. Integration of deep learning and habitat radiomics for predicting the response to immunotherapy in NSCLC patients. Cancer Immunol Immunother. (2024) 73:153. doi: 10.1007/s00262-024-03724-3

36. Liu W, Chen W, Xia J, Lu Z, Fu Y, Li Y, et al. Lymph node metastasis prediction and biological pathway associations underlying DCE-MRI deep learning radiomics in invasive breast cancer. BMC Med Imaging. (2024) 24:91. doi: 10.1186/s12880-024-01255-y

37. Kim S, Lim JH, Kim CH, Roh J, You S, Choi JS, et al. Deep learning-radiomics integrated noninvasive detection of epidermal growth factor receptor mutations in non-small cell lung cancer patients. Sci Rep. (2024) 14:922. doi: 10.1038/s41598-024-51630-6

38. Lucia F, Visvikis D, Desseroit MC, Miranda O, Malhaire JP, Robin P, et al. Prediction of outcome using pretreatment 18F-FDG PET/CT and MRI radiomics in locally advanced cervical cancer treated with chemoradiotherapy. Eur J Nucl Med Mol Imaging. (2018) 45:768–86. doi: 10.1007/s00259-017-3898-7

39. Lambin P, Leijenaar R, Deist TM, Peerlings J, de Jong E, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. (2017) 14:749–62. doi: 10.1038/nrclinonc.2017.141

40. Liu F, Li Q, Xiang Z, Li X, Li F, Huang Y, et al. CT radiomics model for predicting the Ki-67 proliferation index of pure-solid non-small cell lung cancer: a multicenter study. Front Oncol. (2023) 13:1175010. doi: 10.3389/fonc.2023.1175010

41. Warth A, Cortis J, Soltermann A, Meister M, Budczies J, Stenzinger A, et al. Tumour cell proliferation (Ki-67) in non-small cell lung cancer: a critical reappraisal of its prognostic role. Br J Cancer. (2014) 111:1222–9. doi: 10.1038/bjc.2014.402

42. Werynska B, Pula B, Muszczynska-Bernhard B, Piotrowska A, Jethon A, Podhorska-Okolow M, et al. Correlation between expression of metallothionein and expression of Ki-67 and MCM-2 proliferation markers in non-small cell lung cancer. Anticancer Res. (2011) 31:2833–9.

43. Ma X, Zhou S, Huang L, Zhao P, Wang Y, Hu Q, et al. Assessment of relationships among clinicopathological characteristics, morphological computer tomography features, and tumor cell proliferation in stage I lung adenocarcinoma. J Thorac Dis. (2021) 13:2844–57. doi: 10.21037/jtd-21-7

44. Yang X, He J, Wang J, Li W, Liu C, Gao D, et al. CT-based radiomics signature for differentiating solitary granulomatous nodules from solid lung adenocarcinoma. Lung Cancer. (2018) 125:109–14. doi: 10.1016/j.lungcan.2018.09.013

Keywords: radiomics, deep learning, interpretability, non-small cell lung cancer, Ki-67 expression levels

Citation: Qin S, Jia Q, Zhang C, Li M, Zhang X, Zhou X, Su D, Liu Y and Zhou J (2025) Predicting Ki-67 expression levels in non-small cell lung cancer using an explainable CT-based deep learning radiomics model. Front. Oncol. 15:1655714. doi: 10.3389/fonc.2025.1655714

Received: 28 June 2025; Revised: 11 November 2025; Accepted: 26 November 2025;
Published: 10 December 2025.

Edited by:

Morgan Michalet, Institut du Cancer de Montpellier (ICM), France

Reviewed by:

Xin Tang, Hangzhou Wuyunshan Hospital, China
Saveria Mazzara, Bocconi University, Italy

Copyright © 2025 Qin, Jia, Zhang, Li, Zhang, Zhou, Su, Liu and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jun Zhou, xray1895@163.com

†These authors have contributed equally to this work and share first authorship
