CT-based radiomics for predicting Ki-67 expression in lung cancer: a systematic review and meta-analysis

Background: Radiomics, an emerging field, presents a promising avenue for the accurate prediction of biomarkers in different solid cancers. Lung cancer remains a significant global health challenge, contributing substantially to cancer-related mortality. Accurate assessment of Ki-67, a marker reflecting cellular proliferation, is crucial for evaluating tumor aggressiveness and treatment responsiveness, particularly in non-small cell lung cancer (NSCLC).

Methods: A systematic review and meta-analysis was conducted following the preferred reporting items for systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA) guidelines. Two authors independently conducted a literature search until September 23, 2023, in PubMed, Embase, and Web of Science. The focus was on identifying radiomics studies that predict Ki-67 expression in lung cancer. We evaluated quality using both the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) and the Radiomics Quality Score (RQS) tools. For statistical analysis in the meta-analysis, we used STATA 14.2 to assess sensitivity, specificity, heterogeneity, and diagnostic values.

Results: Ten retrospective studies were pooled in the meta-analysis. The findings demonstrated that computed tomography (CT)-based radiomics for predicting Ki-67 expression in lung cancer exhibited encouraging diagnostic performance. Pooled sensitivity, specificity, and area under the curve (AUC) in training cohorts were 0.78, 0.81, and 0.85, respectively. In validation cohorts, these values were 0.78, 0.70, and 0.81. Quality assessment using QUADAS-2 and RQS indicated generally acceptable study quality. Heterogeneity was observed in training cohorts and was attributed to factors such as contrast-enhanced CT scans and specific Ki-67 thresholds. Notably, publication bias was detected in the training cohorts, indicating that positive results are more likely to be published than non-significant or negative results. Thus, journals are encouraged to publish negative results as well.

Conclusion: In summary, CT-based radiomics shows promise in predicting Ki-67 expression in lung cancer. While the results suggest potential clinical utility, additional research efforts should concentrate on enhancing diagnostic accuracy. This could pave the way for the integration of radiomics methods as a less invasive alternative to current procedures like biopsy and surgery in the assessment of Ki-67 expression.


Introduction
Lung cancer is a major global health challenge: it is the leading cause of cancer-related death and poses a significant threat to public health. Despite notable progress in diagnosis and therapy, it remains a persistent global health burden (1, 2). Non-small cell lung cancer (NSCLC) predominates, constituting 85% of total cases and including adenocarcinoma and squamous cell carcinoma (3-5). Disturbingly, more than two-thirds of NSCLC cases are diagnosed at an advanced stage (6, 7). Therefore, early diagnosis of this cancer is crucial for its management.
Ki-67, a marker reflecting cellular proliferation, provides crucial information about a tumor's aggressiveness and its potential responsiveness to treatment (8). Its significance in lung cancer lies in assessing tumor cell proliferation, which aids prognostic evaluation and treatment decisions in NSCLC (9). Ki-67 has emerged as a prognostic marker associated with overall survival (OS) and disease-free survival (DFS) in NSCLC. Higher Ki-67 expression indicates poorer outcomes, suggesting its potential to predict disease aggressiveness and guide personalized treatment approaches (10, 11).
In the realm of lung cancer, the significance of imaging has been revitalized, particularly in the context of baseline staging and response assessment. The emergence of cutting-edge technologies like artificial intelligence (AI) has further elevated the role of imaging, transforming it into a potent biomarker for noninvasive tumor characterization (12, 13). This resurgence underscores the potential of advanced imaging methods, which are empowered by computational advancements, to provide comprehensive insights into Ki-67 levels. Such noninvasive approaches promise to enhance our understanding of tumor characteristics, obviating the necessity for invasive procedures and opening new avenues for precise diagnostic and prognostic assessments (14).
Radiomics is an emerging field within medical imaging that aims to extract extensive quantitative data from routine medical images, such as those obtained from CT, MRI, and positron emission tomography (PET) (15, 16). The process typically involves identifying and segmenting a region of interest (ROI), which can be done manually or using automated algorithms (17). From these segmented regions, high-dimensional features are extracted. These fall into two main categories: semantic features, which describe morphological aspects of lesions, and agnostic features, which are mathematically derived (18, 19). Radiomic signatures serve in diverse capacities, such as tumor classification, survival prediction, and therapy response assessment, and are pivotal in developing imaging biomarkers for personalized therapy (12, 20). The interdisciplinary field of radiogenomics combines imaging with genomics and molecular data. Despite methodological challenges, radiomics continues to hold promise, offering insights beyond those of traditional cancer evaluation methods (21).
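To make the idea of "agnostic" features concrete, the following is a minimal sketch of first-order feature computation over a segmented ROI. It is an illustration only: the tiny 2D image, the binary mask, and the function name `first_order_features` are hypothetical, and the code does not reproduce any radiomics package used in the included studies (real tools typically discretize intensities and compute many more feature classes).

```python
import math
from collections import Counter


def first_order_features(image, mask):
    """Compute simple first-order features over pixels where mask == 1."""
    # Collect the intensities that fall inside the ROI.
    values = [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, inside in zip(image_row, mask_row)
        if inside
    ]
    n = len(values)
    mean = sum(values) / n
    # Population variance of the ROI intensities.
    variance = sum((v - mean) ** 2 for v in values) / n
    # Shannon entropy (bits) of the discrete intensity distribution.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


# Hypothetical 3x3 image and ROI mask (1 = inside the lesion).
image = [[10, 12, 50], [11, 52, 51], [9, 10, 49]]
mask = [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
features = first_order_features(image, mask)
```

Only the four pixels inside the mask contribute, so the features describe the lesion rather than the surrounding tissue, which is why segmentation quality matters for every downstream step.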
A meta-analysis on predicting EGFR mutation in NSCLC revealed that AI-based algorithms utilizing radiomics features serve as valuable and noninvasive tools for predicting EGFR mutation status, with excellent diagnostic accuracy (22). Recently, many meta-analyses of radiomics-based methods have been published to investigate the overall diagnostic performance of the available studies in the field. Such analyses help to establish the current standing of radiomics methods for predicting biomarkers in cancers (23-25). Thus, this study aims to provide, for the first time, a meta-analysis of radiomics studies for predicting Ki-67 expression in lung cancer and to evaluate their quality.

Materials and methods
This systematic review and meta-analysis was conducted according to the preferred reporting items for systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA) guidelines (26). No review protocol was registered.

Literature search
Two authors independently conducted a thorough literature search of the PubMed, Embase, and Web of Science databases to find papers that used radiomics for Ki-67 prediction in lung cancer and that were published up until September 23, 2023. The following terms were used in the search: (Ki-67) AND (Lung Cancer) AND (Radiomics). The retrieved references were exported to the Mendeley Reference Manager. The detailed search method is shown in Supplementary Table S1.

Eligibility criteria
The inclusion criteria, based on the PICO questions (population, intervention, comparison, and outcomes), were as follows: (P) patients with lung cancer, (I) radiomics methods were applied to identify Ki-67 expression in lung cancer, (C) diagnosis was made by histopathological examination (preferably via surgery), and (O) sufficient data were provided for constructing a 2×2 table including true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values for evaluating sensitivity and specificity. The exclusion criteria were as follows: (a) articles not published in the English language, (b) case reports, reviews, letters, meetings, abstracts, comments, and guidelines, (c) articles with insufficient data for constructing 2×2 tables, (d) articles without radiomic analysis, (e) cohort overlaps, and (f) studies in which Ki-67 expression was not predicted. The primary outcome was the prediction of Ki-67 using radiomics, expressed as sensitivity and specificity. Secondary outcome measures included area under the curve (AUC), diagnostic odds ratio (DOR), and positive and negative likelihood ratios (PLR, NLR). After the titles and abstracts were examined by two different reviewers, the full texts were evaluated for eligibility. Disagreements between the reviewers were resolved by discussion or, if required, consultation with a third reviewer.
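The outcome measures named above all follow from the four cells of a 2×2 table. The sketch below shows the standard definitions; the counts are hypothetical and were chosen only so that sensitivity and specificity equal 0.78 and 0.81, and the class and function names are illustrative. Zero cells (which would cause division by zero) are not handled here.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def diagnostic_metrics(c: Confusion) -> dict:
    """Derive standard diagnostic accuracy measures from a 2x2 table."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    plr = sensitivity / (1 - specificity)  # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity  # negative likelihood ratio
    dor = plr / nlr                        # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
    }


# Hypothetical counts for a single cohort (not taken from any included study).
example = Confusion(tp=78, fp=19, fn=22, tn=81)
metrics = diagnostic_metrics(example)
```

Note that the DOR computed as PLR/NLR is algebraically the same as (TP × TN) / (FP × FN), which is why a study must report enough data to reconstruct all four cells.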

Data extraction
The basic data of the included studies were extracted using a data extraction table. The extracted data included: first author name, publication year, study design (retrospective vs. prospective), country, imaging modality (e.g., CT or PET), population (cases and controls), age of patients, cut-off for Ki-67 in immunohistochemistry (IHC) staining, ROI structure (3D vs. 2D), number of features (selected/extracted), name of the software for feature extraction, type of radiomics features, feature reduction algorithm, and algorithm for model construction. For the meta-analysis, TP, FN, TN, and FP values were extracted as well. When multiple algorithms were evaluated on the same sample, the algorithm with the best classification performance was selected.

Quality assessment
Two tools, QUADAS-2 and the RQS scoring system, were used for quality assessment. QUADAS-2 is a tool for assessing the quality of diagnostic accuracy studies in systematic reviews. It provides a structured framework for evaluating the risk of bias and concerns regarding the applicability of diagnostic accuracy studies, focusing on four key domains: patient selection, index test, reference standard, and flow and timing. The tool is widely used in evidence-based medicine to ensure rigorous evaluation of the quality of diagnostic studies included in systematic reviews (27). The RQS is a system for measuring the quality of radiomics studies; it comprises sixteen components with a maximum total of 36 points (28). Two independent reviewers conducted the quality assessment, and any disagreements were resolved by discussion.

Statistical analysis
The meta-analysis was carried out in STATA 14.2 using the "midas" module. A coupled forest plot was generated to depict the pooled sensitivity and specificity of the radiomics studies. Cochran's Q and Higgins' I² were computed to assess the heterogeneity among the studies included in this meta-analysis. I² values of 0 to 25%, 25 to 50%, 50 to 75%, and > 75% indicate very low, low, medium, and high heterogeneity, respectively. Studies were pooled and effect sizes were estimated using a random-effects model, which accounts for heterogeneity when estimating the distribution of true effects across studies. The hierarchical summary receiver operating characteristic (HSROC) model was used to produce the summary receiver operating characteristic (SROC) curve and estimate the pooled AUC. Other diagnostic values, including DOR, PLR, and NLR, were also pooled. Spearman's rank correlation test was used to investigate the threshold effect. Meta-regression was used to investigate possible sources of heterogeneity across subgroups. When publication bias was present, a sensitivity analysis was conducted by excluding each study one by one to evaluate the stability of the pooled values. Leave-one-out analysis was performed using OpenMeta[Analyst] software. A Deeks' funnel plot was generated to assess publication bias. All p-values less than 0.05 were considered significant.
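As an illustration of the heterogeneity statistics described above, the following sketch computes Cochran's Q and Higgins' I² in their generic univariate, inverse-variance form and maps I² onto the four categories used in this review. This is an assumption-laden simplification: the effect sizes and variances are hypothetical, the function names are illustrative, and the bivariate computation performed by the "midas" module may differ in detail.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, I2 in percent) for study effects with known variances."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2 = (Q - df) / Q, truncated at zero and expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def heterogeneity_label(i2):
    """Map I2 (percent) onto the categories used in this review."""
    if i2 <= 25:
        return "very low"
    if i2 <= 50:
        return "low"
    if i2 <= 75:
        return "medium"
    return "high"


# Hypothetical log-scale effect sizes and their variances for three studies.
q, i2 = cochran_q_and_i2([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
label = heterogeneity_label(i2)
```

Because the stated category ranges share their boundary values (25%, 50%, 75%), the sketch assigns each boundary to the lower category; that choice is an assumption, not something the text specifies.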

Literature search
Following the search methodology outlined in the methods section, a total of 1,532 records were identified from the different databases. After the removal of 105 duplicate records, 1,427 titles were evaluated. During the title and abstract assessment, 1,380 citations were excluded because they did not meet the inclusion criteria (e.g., lack of relevance based on title/abstract, meeting reports, reviews, case reports, and articles not in the English language). After full-text review, an additional 37 articles were excluded: 21 studies did not involve radiomics, 11 did not predict Ki-67 expression, and 5 did not provide sufficient data for constructing a 2×2 table. This resulted in the final inclusion of 10 articles in the meta-analysis (29-38). The study flow chart is depicted in Figure 1.
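The selection counts reported above can be checked with simple arithmetic. The short sketch below uses only the numbers given in the text; the variable names are illustrative, and the intermediate count of full-text articles is derived here rather than stated in the source.

```python
# Counts as reported in the text.
identified = 1532
duplicates = 105
excluded_title_abstract = 1380
excluded_full_text = {
    "no radiomics": 21,
    "Ki-67 not predicted": 11,
    "insufficient 2x2 data": 5,
}

screened = identified - duplicates               # titles evaluated
full_text = screened - excluded_title_abstract   # derived, not stated in the text
included = full_text - sum(excluded_full_text.values())
```

The derived values agree with the reported 1,427 screened titles and 10 included articles.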
Logistic regression (LR) was the most frequently used algorithm for building the radiomics signature (n = 8) (29-33, 35-37). The characteristics of the included studies are shown in Table 1.

Quality assessment

QUADAS-2
The quality assessment based on the QUADAS-2 tool showed that, in the patient selection domain, three studies had an unclear risk of bias and unclear applicability concerns because they did not specify inclusion/exclusion criteria (31, 32, 34). For the index test, there was an unclear risk of bias for three studies because they did not use cross-validation methods for modeling (29, 33, 36). However, as they matched the research question, no applicability concern was detected. In the reference standard domain, three studies were judged to have a high risk of bias because specimens were obtained by biopsy (31, 35, 38). Finally, there was an unclear risk of bias for five studies because they did not mention the interval between imaging acquisition and [...] ([...], 30, 32-34). Overall, the quality of the included studies was broadly acceptable (Figure 2).

RQS
The ten studies obtained an average RQS score of 10.7, with individual scores ranging from 1 to 16 out of 36 points. The average score was 29%, and the study with the highest rating achieved 44%. Nearly half of the studies fell within the score range of 10 to 16. Only two studies employed imaging at multiple time points (33, 35). None of the studies fulfilled the phantom study, prospective design, cost-effectiveness analysis, comparison to the gold standard, or open science items of the RQS checklist. Two studies received -5 points in the validation item because they did not use validation cohorts (30, 38). Image protocol quality, multiple segmentation, feature reduction, and biological correlation analysis were complete in all studies. Table 2 presents the RQS scores for each study and item.

Validation cohorts
In validation cohorts, only the use of 3D Slicer for feature extraction contributed to inter-study heterogeneity (p-value = 0.04).

Subgroup analysis

Training cohorts
In training cohorts (Table 3), the sensitivity of studies with a sample size smaller than 150, non-contrast-enhanced CT, logistic regression for modeling, and PyRadiomics for feature extraction was significantly higher (p-value < 0.01). In addition, the sensitivity of studies that used biopsy to obtain tissue and LASSO for feature reduction was higher, but the difference was not statistically significant (0.05 < p-value < 0.10). The specificity of studies that had a sample size smaller than 150, used PyRadiomics for feature extraction, or used semiautomatic segmentation was significantly higher (p-value < 0.05).

Summary ROC curve with confidence and prediction regions in training (A) and validation (B) cohorts.

Coupled forest plot of the diagnostic performance in validation cohorts.

Luo et al. 10.3389/fonc.2024.1329801. Frontiers in Oncology, frontiersin.org

Validation cohorts
In validation cohorts (Table 4), the sensitivity and specificity of studies with a sample size smaller than 150 were significantly higher (p-value < 0.01). In addition, the sensitivity of studies that used surgery to obtain tissue and 3D Slicer for feature extraction was higher, but the difference was not statistically significant (0.05 < p-value < 0.10). Similarly, studies that used semiautomatic segmentation had a higher specificity, but the difference was not statistically significant (0.05 < p-value < 0.10).

Publication bias
We found publication bias in training cohorts based on Deeks' asymmetry test (p-value = 0.01). However, in validation cohorts, publication bias was not significant (p-value <0.13). Deeks' funnel plots are shown in Figure 6. Publication bias was observed exclusively in training cohorts, which typically exhibit higher diagnostic performance compared to validation cohorts. This occurrence, where training cohorts tend to produce more positive results than negative ones, is a prevalent cause of potential publication bias. Various factors, including the inclination to publish novel or statistically significant findings, can contribute to the emergence of this bias.
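For readers unfamiliar with Deeks' asymmetry test, the following sketch outlines the regression that underlies it: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size (ESS), weighted by ESS, and a slope that differs from zero suggests funnel plot asymmetry. The 2×2 tables are hypothetical, the function names are illustrative, zero cells are not handled, and the exact procedure implemented in the "midas" module may differ. The sketch stops at the t-statistic of the slope; the p-value would be obtained from a t distribution with n - 2 degrees of freedom.

```python
import math


def effective_sample_size(n_pos, n_neg):
    """ESS = 4 * n1 * n2 / (n1 + n2) for diseased and non-diseased groups."""
    return 4.0 * n_pos * n_neg / (n_pos + n_neg)


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, SE of b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum(
        wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y)
    )
    se_slope = math.sqrt(ssr / (len(x) - 2) / sxx)
    return intercept, slope, se_slope


def deeks_regression(tables):
    """Return (slope, t-statistic) for a list of (tp, fp, fn, tn) tables."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1.0 / math.sqrt(ess))
        y.append(math.log((tp * tn) / (fp * fn)))  # ln(DOR); assumes no zero cells
        w.append(ess)
    _, slope, se = weighted_linear_fit(x, y, w)
    return slope, slope / se


# Hypothetical 2x2 tables (tp, fp, fn, tn); not data from the included studies.
tables = [(40, 10, 10, 40), (30, 5, 10, 45), (80, 25, 20, 95),
          (20, 3, 5, 22), (60, 20, 15, 85)]
slope, t_stat = deeks_regression(tables)
```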

Sensitivity analysis
To find the possible source of publication bias in training cohorts, we eliminated each study from the analysis in turn and pooled the remaining studies to re-evaluate Deeks' p-value (Table 5). After eliminating the results of Sun et al., Zhe et al., or Gu et al., the p-value increased; however, it remained < 0.05. When studies were removed two at a time, the Deeks' p-value of the remaining results increased substantially, and this increase reached its maximum after the removal of all three studies together (p-value = 0.22). Therefore, we concluded that these studies might have contributed to the publication bias. It should be noted that the overall results did not change significantly even after removing these three studies (Table 5), suggesting that the results are consistent.

Leave-one-out analysis
Leave-one-out analysis was performed to investigate the robustness of the results by removing each study one by one. In the included training cohorts, it was observed that upon excluding the cohorts by Sun et al. or Dong et al., there was a slight decrease in the values of specificity, PLR, and DOR. This suggests that these cohorts exhibited slightly higher diagnostic performance compared to other studies (Supplementary Figures S1-3). However, no significant changes were noted in the sensitivity, specificity, PLR, NLR, or DOR values upon the removal of each validation cohort (Supplementary Figures S4-6). Taken together, no concern existed regarding the robustness of the results.
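The leave-one-out procedure can be sketched as follows. For simplicity, this example pools sensitivity by summing counts across cohorts, which is an assumption made for illustration and not the random-effects pooling used in the actual analysis; the cohort labels and counts are hypothetical.

```python
def pooled_sensitivity(tables):
    """Pool sensitivity by summing TP and FN across cohorts (simplified)."""
    tp = sum(t["tp"] for t in tables)
    fn = sum(t["fn"] for t in tables)
    return tp / (tp + fn)


def leave_one_out(cohorts):
    """Recompute the pooled estimate with each cohort removed in turn."""
    results = {}
    for name in cohorts:
        rest = [t for other, t in cohorts.items() if other != name]
        results[name] = pooled_sensitivity(rest)
    return results


# Hypothetical cohorts; labels and counts are illustrative only.
cohorts = {
    "A": {"tp": 40, "fn": 10},
    "B": {"tp": 30, "fn": 10},
    "C": {"tp": 45, "fn": 5},
}
overall = pooled_sensitivity(list(cohorts.values()))
loo = leave_one_out(cohorts)
```

A cohort whose removal lowers the pooled estimate (here, cohort C) performed better than the rest, which mirrors the interpretation given above for the cohorts by Sun et al. and Dong et al.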

Discussion
The present meta-analysis showed that CT-based radiomics has good diagnostic performance for predicting Ki-67 expression, with pooled AUCs > 0.80 in both the training and validation cohorts of ten studies. In addition, the quality of the included studies was evaluated using two tools, QUADAS-2 and RQS, which indicated that the included studies are of acceptable quality. The interpretation of results in a meta-analysis is significantly affected by the quality of the included articles. In radiomics research, the overall reliability of meta-analysis findings relies on the methodological rigor and thoroughness of each individual study. If the included articles have low RQS scores, indicating inadequate methodological quality and reporting, this introduces a risk of bias and reduces confidence in the pooled results. Additionally, variations in study design, reporting, and validation practices among low-quality studies can introduce heterogeneity, complicating the integration of data and potentially leading to less robust conclusions. Thus, a careful evaluation of the quality of individual articles, facilitated by tools like RQS, is crucial to ensure the accuracy and applicability of meta-analytic outcomes in radiomics and beyond. In a recently published meta-analysis on the diagnostic performance of MRI-based radiomics for predicting Ki-67 in breast cancer (24), the mean RQS of the included studies was around 6, considerably lower than in our study. This suggests that the quality of the articles included in our meta-analysis was higher.

FIGURE 6 Deeks' funnel plots for training (A) and validation (B) cohorts.
A meta-analysis encompassing 15 studies and 1,931 patients demonstrated the prognostic significance of Ki-67 in stage I NSCLC. The analysis revealed that a high Ki-67 labeling index (LI) in stage I NSCLC is predictive of poorer OS and DFS. Furthermore, the study explored the prognostic impact specifically in stage I adenocarcinoma, providing novel insights. Despite acknowledged limitations, such as reliance on pooled data and potential publication bias, the meta-analysis recommends Ki-67 as a routine biomarker in stage I NSCLC. It suggests that patients with high Ki-67 LI may benefit from adjuvant therapy (10). These data highlight the prominent role of Ki-67 in the prognosis of lung cancer patients. The limitations of lung cancer biopsy include poor discriminatory capability of imaging, late diagnosis, variability in diagnostic pathways, and potential complications from biopsy procedures (39). Radiomics, a field focused on extracting and analyzing quantitative features from medical images, offers a promising solution to these challenges. By providing a more detailed and comprehensive characterization of lesions, radiomics enhances the discriminatory capability of imaging, aids in early detection, and enables risk stratification. Additionally, it supports personalized medicine by identifying tumor heterogeneity and potential molecular targets, potentially reducing the need for invasive biopsies. Radiomics also facilitates monitoring treatment response, allowing clinicians to assess therapy effectiveness and make timely adjustments to treatment plans (40-42). Integrating radiomics into current prognostic workflows for predicting Ki-67 expression in lung cancer involves incorporating radiomic features extracted from imaging data, such as CT scans, into existing prognostic models. This process requires the development of algorithms that capture intricate patterns related to Ki-67 expression, with subsequent validation against established indicators and clinical outcomes (43, 44). Collaboration between radiologists, oncologists, and data scientists is crucial for optimization, standardization, and the establishment of protocols. Education programs for healthcare professionals ensure proper interpretation of radiomic data, and continuous refinement through research and clinical feedback contributes to the ongoing improvement of these models. The successful integration of radiomics necessitates a multidisciplinary approach, technological standardization, and collaborative efforts across healthcare settings (45).
CT-based radiomics emerges as an efficient tool for predicting Ki-67 expression in lung cancer, particularly in NSCLC, providing several advantages. The ease of acquisition and non-invasiveness of CT scans allow for early Ki-67 expression prediction. The integration of radiomics-based analyses with radiologist assessment proves beneficial in clinical decision-making for NSCLC patients. This approach has already been investigated for predicting EGFR mutation status, facilitating the identification of patients suitable for targeted therapies. CT radiomics-based models present distinct advantages over conventional methods by leveraging existing imaging data, eliminating the need for invasive procedures, and minimizing patient discomfort and risk. Routine CT scans yield readily available data for radiomics-based models, enabling a comprehensive assessment of tumor characteristics throughout the entire tumor volume. Advanced imaging analysis techniques quantify features, enhancing the precision of predicting genetic mutations. Serial CT scans enable longitudinal monitoring of tumor characteristics, allowing for the evaluation of treatment response, disease progression, and monitoring emerging genetic mutations over time. While CT radiomics-based models should not replace confirmatory molecular testing methods, they significantly contribute to noninvasive and comprehensive assessments in lung cancer management (46).
In this meta-analysis, we observed significant heterogeneity in training cohorts, which made meta-regression necessary. First, we observed that contrast-enhanced CT images contribute to inter-study heterogeneity. This finding is plausible, as inconsistencies can arise at different stages of contrast-enhanced CT acquisition for reasons such as imaging timing and lesion enhancement patterns. Second, we found that using PyRadiomics for feature extraction contributed to inter-study heterogeneity. Using diverse software tools for radiomics feature extraction introduces inter-study heterogeneity, primarily stemming from algorithmic dissimilarities, variations in parameter configurations, discrepancies in segmentation methods, differences in image preprocessing approaches, disparities in normalization and scaling procedures, diverse criteria for feature selection, and potential inconsistencies in software updates and versions. These differences in radiomic processes yield distinct sets of features, complicating direct comparisons between studies (47). The superiority of PyRadiomics compared to other feature extraction software has also been mentioned in a previous meta-analysis (48). Lastly, using a Ki-67 cut-off of 14% could also cause inter-study heterogeneity. Different cut-off values for Ki-67 expression were used in the included studies, causing significant heterogeneity.
Our subgroup analyses revealed findings common to the validation and training cohorts. First, studies with sample sizes smaller than 150 had significantly higher diagnostic performance. This could be attributed to various factors, including the sensitivity of smaller samples to outliers, increased heterogeneity in smaller studies, a potential publication bias that favors the reporting of positive results in smaller studies, a more substantial impact of random variation due to limited sample size, specific clinical contexts in which certain diagnostic tests perform better in smaller populations, and interactions between the characteristics of the studied subgroups and the sample size. Each of these factors may contribute to the apparent difference in diagnostic performance observed in the subgroup analysis. Second, we observed that semi-automatic segmentation may increase the specificity of the results. This method allows for accurate delineation of the ROI, enabling correction of errors, customization for complex structures, and adaptability to anatomical variability.
In our meta-analysis, we also observed significant publication bias for training cohorts based on Deeks' test. Publication bias occurs when research findings are selectively published based on their nature and direction: studies with positive or statistically significant results are more likely to be published, while studies with null or negative findings are less likely to be published. We identified three studies as the possible source of publication bias. However, eliminating these one by one or even together did not change the overall diagnostic performance of the radiomics approach for the prediction of Ki-67 in lung cancer, indicating that the results were consistent. To mitigate publication bias, researchers can pre-register studies, encourage the publication of negative results, and promote systematic reviews that include unpublished studies. Journals can adopt policies prioritizing study design over results, and open access to data can facilitate result verification. Transparent reporting guidelines, publication of study protocols, and rigorous editorial and peer review processes further contribute to reducing the impact of publication bias, ensuring a more balanced representation of scientific evidence (49, 50).

Limitations
Several limitations in this meta-analysis warrant consideration:
1. Lack of Validation Cohorts: A notable limitation is the absence of validation cohorts in several studies, necessitating the pooling of training and validation cohorts separately, which may impact the generalizability of the findings.
2. Retrospective Study Design: The retrospective nature of all included studies introduces inherent biases and limits the establishment of causal relationships, emphasizing the need for prospective investigations to validate the observed associations.
3. Geographic Bias: The exclusive focus on studies conducted in China introduces regional bias, potentially limiting the generalizability of findings to a broader, more diverse population.
4. Limited Segmentation Information: While all studies utilized 3D segmentation, the absence of information on 2D segmentation performance for predicting Ki-67 in lung cancer underscores a potential gap in understanding the comparative effectiveness of these methods.
5. Scarcity of Automatic Segmentation: Automatic or semiautomatic segmentation was used in only two studies, which emphasizes the necessity for further research to explore the performance and advantages of automated segmentation methods.
6. Absence of Deep Learning Approaches: None of the included studies used deep learning-based radiomics methods, which underscores a current gap in exploring the potential benefits of advanced machine learning techniques that could enhance predictive accuracy (51).
7. Variability in Ki-67 Expression Cutoffs: The inconsistency in defining lesions as positive for Ki-67 expression due to different cutoff points across studies poses a challenge to standardization, suggesting the need for authors to test multiple cutoff points in their investigations.
8. Publication Bias Concerns: The identification of publication bias raises awareness of a potential inclination toward reporting positive results. Encouraging journals and authors to publish negative results can help address this bias and provide a more comprehensive understanding of the predictive capabilities of CT-based radiomics for Ki-67 expression in lung cancer.

Conclusion
In conclusion, this meta-analysis of 10 retrospective studies investigating CT-based radiomics for predicting Ki-67 expression in lung cancer demonstrates promising diagnostic performance, indicating the potential clinical utility of radiomic features. These findings collectively highlight the potential of radiomics in noninvasive prediction of Ki-67 expression, emphasizing the importance of cautious interpretation and the need for further research to address methodological heterogeneity and potential biases.

FIGURE 1 PRISMA flowchart of the study.

FIGURE 3 Coupled forest plot of the diagnostic performance in training cohorts.

TABLE 1
Characteristics of the included studies.

TABLE 2
RQS quality assessment of the included studies.

TABLE 3
Subgroup analysis and meta-regression results in training cohorts.

TABLE 4
Subgroup analysis and meta-regression results in validation cohorts.

TABLE 5
Results of the sensitivity analysis.