Diagnostic performance of CT scan–based radiomics for prediction of lymph node metastasis in gastric cancer: a systematic review and meta-analysis

Objective The purpose of this study was to evaluate the diagnostic performance of computed tomography (CT) scan–based radiomics in the prediction of lymph node metastasis (LNM) in gastric cancer (GC) patients.
Methods PubMed, Embase, Web of Science, and Cochrane Library databases were searched for original studies published until 10 November 2022, and studies satisfying the inclusion criteria were included. Characteristics of the included studies, the radiomics approach, and data for constructing 2 × 2 tables were extracted. The radiomics quality score (RQS) and Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) were used to assess the quality of the included studies. Overall sensitivity, specificity, diagnostic odds ratio (DOR), and area under the curve (AUC) were calculated to assess diagnostic accuracy. Subgroup analysis and Spearman's correlation coefficient were used to explore sources of heterogeneity.
Results Fifteen studies with 7,010 GC patients were included. We conducted analyses on both radiomics signature models and combined models (based on signature and clinical features). The pooled sensitivity, specificity, DOR, and AUC of radiomics models compared to combined models were 0.75 (95% CI, 0.67–0.82) versus 0.81 (95% CI, 0.75–0.86), 0.80 (95% CI, 0.73–0.86) versus 0.85 (95% CI, 0.79–0.89), 13 (95% CI, 7–23) versus 23 (95% CI, 13–42), and 0.85 (95% CI, 0.81–0.86) versus 0.90 (95% CI, 0.87–0.92), respectively. The meta-analysis indicated significant heterogeneity among studies. The subgroup analysis revealed that arterial phase CT scan, tumoral and nodal regions of interest (ROIs), automatic segmentation, and two-dimensional (2D) ROI could improve diagnostic accuracy compared to venous phase CT scan, tumoral-only ROI, manual segmentation, and 3D ROI, respectively. Overall, the quality of studies was acceptable based on both QUADAS-2 and RQS tools.
Conclusion CT scan–based radiomics approach has a promising potential for the prediction of LNM in GC patients preoperatively as a non-invasive diagnostic tool. Methodological heterogeneity is the main limitation of the included studies. Systematic review registration https://www.crd.york.ac.uk/Prospero/display_record.php?RecordID=287676, identifier CRD42022287676.


Introduction
Despite advancements in identification and treatment, gastric cancer (GC) remains a significant global health challenge, ranking as the fifth most diagnosed cancer globally and the fourth leading cause of cancer-related mortality, with an estimated 769,000 deaths reported in 2020 alone (1). The selection of the optimal treatment strategy for GC is largely based on the tumor-nodal-metastasis (TNM) staging system, which assesses the extent of tumor invasion through the different layers of the stomach (T), lymph node involvement (N), and distant metastasis (M). This staging system is important in determining the most appropriate treatment approach, such as surgery, chemotherapy, and/or radiation therapy, and has been shown to be a reliable predictor of patient outcomes (2). Accurate determination of lymph node metastasis (LNM) status is critical for optimal management of GC. As the main component of TNM staging, LNM status is used to select the appropriate preoperative treatment strategy and is also an important prognostic factor for patient survival and tumor recurrence after surgical resection. Thus, it is essential to accurately determine LNM status (3,4). Current traditional imaging methods for assessing nodal status are based on lymph node (LN) shape, enhancement, and size, which can be normal or enlarged. Most patients may be misclassified for nodal staging in the TNM system. To date, computed tomography (CT) is the most common imaging modality and is widely used for preoperative estimation of nodal status. However, the reported overall accuracy was low and unsatisfactory. Therefore, it is necessary to establish more precise methods to supplement the current methods of assessing LN status (5-7).
Recently, radiomics has attracted increasing attention as a methodology for translating medical images into reproducible, quantitative data for clinical decision support. Radiomics extracts quantitative features, so-called radiomics features, from diagnostic images and uses machine learning or deep learning algorithms to uncover hidden tumor characteristics that are not visible to the naked eye and that help predict the outcome of interest, for example, LNM. In detail, radiomics features are extracted from the region of interest (ROI) or volume of interest (VOI). Once a two-dimensional (2D) ROI or three-dimensional (3D) VOI is delineated by a radiologist, software, or both (image segmentation), different types of radiomics features (e.g., histogram based and texture based) are extracted by mathematical methods. Hundreds of radiomics features may be extracted; however, most of them are redundant and non-informative. Therefore, they have to be transformed or removed (dimensionality reduction), and then the most informative features are selected (feature selection). Finally, a predictive model is established based on the selected features (model construction) to predict the outcome (e.g., LNM prediction) (8,9). Hence, radiomics can capture a large amount of valuable, otherwise invisible information non-invasively and more precisely.
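To make the feature-extraction step more concrete, the following minimal sketch computes a few histogram-based (first-order) features from the intensities inside an ROI. It is an illustration only, not the pipeline used by any included study: the function name `first_order_features`, the bin width of 25, and the sample intensities are all hypothetical.

```python
import math
from collections import Counter


def first_order_features(intensities, bin_width=25):
    """Compute simple histogram-based features from ROI intensities.

    `intensities` is a flat list of pixel/voxel values inside the ROI.
    `bin_width` (hypothetical default) controls intensity discretization.
    """
    n = len(intensities)
    mean = sum(intensities) / n
    variance = sum((x - mean) ** 2 for x in intensities) / n
    std = math.sqrt(variance)
    # Skewness is undefined for a constant ROI; report 0.0 in that case.
    if std > 0:
        skewness = (sum((x - mean) ** 3 for x in intensities) / n) / std ** 3
    else:
        skewness = 0.0
    # Discretize into fixed-width bins, then compute Shannon entropy (bits).
    counts = Counter(math.floor(x / bin_width) for x in intensities)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "entropy": entropy,
    }


# Hypothetical ROI intensities (e.g., Hounsfield units).
roi = [40, 55, 60, 62, 70, 85, 90, 110]
features = first_order_features(roi)
```

In practice, such features would be one small subset of the hundreds extracted before dimensionality reduction and feature selection.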
In this meta-analysis, we collected evidence from previous studies to further investigate the diagnostic accuracy of CT-based radiomics for predicting LNM status in GC patients, in order to support the application of the radiomics approach in clinical practice.

Materials and methods
This systematic review and meta-analysis was conducted according to the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Supplementary Material) (10). The study protocol was prospectively registered on the International Prospective Register of Systematic Reviews (PROSPERO) (registration no. CRD42022287676).

Literature search
A computerized search of PubMed, Embase, Web of Science, and Cochrane Library databases was performed, with no start-date limit, for studies published until 16 August 2022. We searched the databases a second time on 10 November 2022 to identify newly published studies. All related search terms and synonyms were considered in the search strategy as follows: [(GC) OR (gastric tumor) OR (stomach cancer) OR (stomach tumor)] AND [(CT) OR (computed tomography)] AND [(lymph node) OR (lymphatic) OR (lymphovascular)] AND [(radiomic) OR (radiomics) OR (texture)]. We used Mendeley software, version 1.19.8, and Rayyan (11) for managing references. Two observers (Z.H. and P.T.) screened references by title and abstract to determine eligibility. Full texts were then reviewed against the inclusion and exclusion criteria. The reference lists of included studies were also searched manually to find additional eligible studies. We restricted the search to studies published in English. Uncertainties were resolved by consulting the third observer (L.A.M.).

Inclusion criteria
We selected studies satisfying the following PICO criteria: (1) population: patients diagnosed with GC; (2) index test: CT scan used for detection of LNM; (3) comparator test: histopathologic results, considered as the reference standard; and (4) test accuracy or outcome: studies provided the area under the curve (AUC), sensitivity, and specificity data of CT-based radiomics or the corresponding data for constructing a 2 × 2 contingency table.
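As an illustration of the last criterion, a 2 × 2 contingency table can be reconstructed when a study reports sensitivity, specificity, and the numbers of LNM-positive and LNM-negative patients. The sketch below shows one way to do this; the function name `reconstruct_2x2` and the example counts are hypothetical, and rounding to whole patients is an assumption.

```python
def reconstruct_2x2(sensitivity, specificity, n_positive, n_negative):
    """Rebuild (TP, FP, FN, TN) from reported accuracy measures.

    n_positive / n_negative are the numbers of patients with and without
    pathologically confirmed LNM. Counts are rounded to whole patients.
    """
    tp = round(sensitivity * n_positive)
    fn = n_positive - tp
    tn = round(specificity * n_negative)
    fp = n_negative - tn
    return tp, fp, fn, tn


# Hypothetical cohort: 100 LNM-positive and 200 LNM-negative patients.
tp, fp, fn, tn = reconstruct_2x2(0.75, 0.80, 100, 200)
```

Because of rounding, the sensitivity and specificity recomputed from the reconstructed table may differ slightly from the reported values.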

Exclusion criteria
Exclusion criteria were set as follows: (1) studies in the form of conference abstracts, review articles, case reports, editorials, comments, letters, and animal studies; (2) studies not related to the CT scan-based radiomic prediction of LNM or GC; (3) studies in languages other than English; and (4) studies from which a 2 × 2 contingency table could not be constructed.

Data extraction
The following data were extracted, regarding patient, study, and CT-based radiomics characteristics using a standardized

Quality assessment
The methodological quality of included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) (12) tool and radiomics quality score (RQS) (13). Two independent observers (Z.H. and P.T.) conducted data extraction and quality assessment. Any disagreement was resolved by reaching a consensus.

Statistical analysis
This meta-analysis was performed with the MIDAS module in STATA 14.0 (StataCorp, Texas, United States). We quantified predictive accuracy by calculating pooled sensitivity, specificity, diagnostic odds ratio (DOR), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) with 95% confidence intervals (CIs). The summary receiver operating characteristic (SROC) curve was created, and AUCs were used to summarize diagnostic accuracy. I² values were calculated to assess statistical heterogeneity among the included studies. I² values of 0%-25%, 25%-50%, 50%-75%, and > 75% represent very low, low, medium, and high statistical heterogeneity, respectively. Coupled forest plots were created to show pooled sensitivity and specificity. Effect sizes were pooled using a random-effects model, which accounts for between-study heterogeneity when estimating the distribution of true effects. The presence of threshold effects was investigated in MetaDisc 1.4 by computing the Spearman's correlation coefficient (r) between the logit (true positive rate) and logit (false positive rate). Subgroup analysis was performed to investigate the causes of heterogeneity. The following covariates were selected to assess which factors cause heterogeneity: top left method used or not, segmentation dimension, arterial or venous phase of CT scan, tumoral or nodal segmentation, and automatic or manual segmentation. Furthermore, to assess the impact of individual studies on the overall estimate, a sensitivity analysis was performed by eliminating each study in turn. Deeks' funnel plot was created to examine publication bias. Some studies did not report the sensitivity and specificity needed to construct a 2 × 2 table. For these studies, we used the receiver operating characteristic (ROC) curve to calculate sensitivity and specificity using the top left method (14).
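The per-study quantities described above can be sketched in a few lines. The example assumes that the top left method means selecting the ROC point with the smallest Euclidean distance to the upper left corner (false positive rate 0, true positive rate 1); the function names and the sample values are hypothetical, and the pooling itself (done here with MIDAS) is not reproduced.

```python
import math


def top_left_point(roc_points):
    """Return the (fpr, tpr) pair closest to the corner (0, 1).

    Assumption: 'top left method' = minimum Euclidean distance to (0, 1).
    """
    return min(roc_points, key=lambda p: math.hypot(p[0], 1 - p[1]))


def accuracy_measures(tp, fp, fn, tn):
    """Per-study accuracy measures from a 2 x 2 table.

    Assumes no zero cells; a continuity correction would be needed otherwise.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)  # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity  # negative likelihood ratio
    dor = plr / nlr                        # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
    }


# Hypothetical ROC curve read from a published figure.
roc = [(0.0, 0.0), (0.1, 0.6), (0.2, 0.8), (0.5, 0.9), (1.0, 1.0)]
fpr, tpr = top_left_point(roc)  # sensitivity = tpr, specificity = 1 - fpr

# Hypothetical 2 x 2 table.
measures = accuracy_measures(tp=75, fp=40, fn=25, tn=160)
```

The selected point gives the sensitivity and specificity that, together with the cohort sizes, allow a 2 × 2 table to be built for studies that did not report one.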

Literature search
According to the search strategy, 123 citations were identified from databases, of which 58 were duplicates. After screening records by title and abstract, 23 were excluded because they did not meet the inclusion criteria. After a full-text review, 27 were omitted, leaving 13 articles for meta-analysis. The literature search was then repeated, and two eligible articles based on inclusion criteria were included. Finally, 15 eligible articles were selected for final meta-analysis. The detailed literature search flowchart is depicted in Figure 1.

Characteristics of included studies
Characteristics of the included studies and predictive models are shown in Tables 1, 2. We enrolled 15 studies with a total of 7,010 patients. Studies were published from July 2019 to October 2022, of which 46% (seven of 15) were published in 2021 and 2022.
All study populations were from China, and all studies were designed retrospectively. Only one study (25) used a prospective testing set (n = 112). One study (20) included patients with gastric adenocarcinoma, and the remaining studies included patients with GC. The majority of patients were male (4,935 vs. 2,075). The 7,010 patients were divided into a training set (n = 4,136) and a testing set (n = 2,874). Three studies (25,26,28) also used an external testing set. Eleven studies recruited patients from one center, three studies from two centers (25,26,28), and one study (15) from four centers. Pathological confirmation of LNM was the reference standard in all studies. Most of the studies (9/15) used a venous phase CT scan, and five used an arterial phase for lesion segmentation. One study (20) used both venous and arterial phases. Six studies used PyRadiomics for feature extraction from images.
Open-source ITK-SNAP was the most commonly used tool for lesion segmentation (nine of 15). Twelve studies performed segmentation manually, and the other three studies performed it automatically. Most of the studies (nine of 15) delineated 2D regions of interest, and the remaining studies performed 3D segmentation (six of 15). Extracted features ranged from 35 to

FIGURE 1 Study selection flowchart.

RQS
The average RQS score of the included studies was 14.8, accounting for 41% of the total points. The highest RQS score was 24 points (66%), seen in only one study (25), which used a prospective dataset for model evaluation. Almost half of the studies (seven of 15) were credited between 11 and 14 points, corresponding to 30%-40% of total points. The following items were not performed by any study and were therefore assigned 0 points: imaging at multiple time points, cost-effectiveness analysis, and open science and data. Details are shown in Table 3.
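The percentages above follow from dividing each score by the maximum of 36 points. A short sketch of that arithmetic is given below; the helper name `rqs_percent` is hypothetical.

```python
RQS_MAX_POINTS = 36  # maximum achievable RQS


def rqs_percent(score, max_points=RQS_MAX_POINTS):
    """Express an RQS score as a percentage of the maximum points."""
    return 100 * score / max_points


# Scores quoted in the text: mean, highest, and the 11-14 point band.
percentages = {score: rqs_percent(score) for score in (14.8, 24, 11, 14)}
```

Under this calculation, 14.8 points corresponds to about 41%, 24 points to about 67%, and the 11 to 14 point band to about 31% to 39%.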

QUADAS-2
Quality assessment according to QUADAS-2 is illustrated in Figure 2. Overall, quality was acceptable. There was no high risk of bias or high applicability concern. Unclear risk of bias arose from insufficient reporting of the following items: consecutive or random sampling of patients (patient selection domain), interpretation of the index test without knowledge of the reference standard result (index test domain), and an appropriate interval between the index test and the reference standard (flow and timing domain).

Data analysis
Methodologically, the included studies used features extracted from CT scans to build radiomics models with machine learning or deep learning algorithms. Some studies also constructed a combined model incorporating radiomics features and clinical variables (e.g., laboratory tests and CT-reported LN status). Accordingly, we analyzed radiomics models and combined models separately.
In addition, the included studies pooled the enrolled patients into a main dataset and then randomly divided it into a training set and a testing set (internal testing/validation set) by a specific proportion. The training set is used to learn the patterns in the data that predict the expected outcome. A prediction model is then established, and its predictive accuracy is evaluated on the internal testing set. To assess generalizability, some studies also used a dataset other than the main dataset as a testing set (external testing set), on which the predictive accuracy of the trained model was evaluated again. Therefore, in studies with several testing sets, we selected two testing sets (or cohorts) and treated them as separate studies when evaluating predictive accuracy.
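The random division into training and internal testing sets can be sketched as follows. This is a generic illustration, not the procedure of any particular study: the function name `split_dataset`, the 70/30 proportion, and the seed are hypothetical.

```python
import random


def split_dataset(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and testing sets.

    train_fraction and seed are hypothetical; the included studies used
    their own proportions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical main dataset of 100 patients.
train_ids, test_ids = split_dataset(range(100), train_fraction=0.7, seed=42)
```

An external testing set, by contrast, would come from a different institution and would not pass through this split at all.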

Heterogeneity analysis
The I² test showed that sensitivity (I² = 79.23%) and specificity (I² = 86.08%) both have high heterogeneity. For threshold analysis, the Spearman's correlation coefficient was measured as 0.046 with a p-value of 0.875, indicating the absence of a threshold effect.
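For readers unfamiliar with these two checks, the sketch below shows the standard Higgins I² statistic derived from Cochran's Q, and a Spearman correlation between logit(true positive rate) and logit(false positive rate) as a threshold-effect check. It is an assumption that these match the exact computations in MIDAS and MetaDisc; all function names and inputs are hypothetical.

```python
import math


def i_squared(effects, variances):
    """Higgins I² (%) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def logit(p):
    return math.log(p / (1 - p))


def ranks(values):
    """Average ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's r: Pearson correlation of the ranks (needs non-constant input)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def threshold_effect_r(sensitivities, specificities):
    """Correlate logit(TPR) with logit(FPR) across studies."""
    logit_tpr = [logit(s) for s in sensitivities]
    logit_fpr = [logit(1 - s) for s in specificities]
    return spearman(logit_tpr, logit_fpr)
```

A strong positive correlation would suggest that studies trade sensitivity against specificity by using different cut-offs; a value near zero, as reported here, does not.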

Subgroup analysis
Subgroup analysis was performed to explore the causes of heterogeneity (Table 4) by comparing various study variables. Studies whose sensitivity and specificity were extracted by the top left method (n = 6) had a higher sensitivity (0.78 vs. 0.73, p = 0.21) and specificity (0.82 vs. 0.79, p = 0.14) than studies in which the method was not used (n = 8), with a joint analysis p-value of 0.65. Studies that used a 3D VOI (n = 8) had a higher sensitivity (0.78 vs. 0.71, p = 0.27) but a lower specificity (0.74 vs. 0.85, p = 0.00) than studies with a 2D ROI (n = 6), with a joint analysis p-value of 0.15. Arterial phase CT scan (n = 4) had a higher sensitivity (0.84 vs. 0.71, p = 0.65) and specificity (0.91 vs. 0.77, p = 0.90) than venous phase (n = 10), with a joint analysis p-value of 0.01. Studies with tumor and LNs as the ROI (n = 3) had a higher sensitivity (0.81 vs. 0.74, p = 0.06) and specificity (0.86 vs. 0.79, p = 0.98) than studies with only the tumoral ROI (n = 11), with a joint analysis p-value of 0.49. Automatically drawn regions of interest (n = 3) had a higher sensitivity (0.77 vs. 0.75, p = 0.31) and specificity (0.91 vs. 0.76, p = 0.98) than manual segmentation (n = 11), with a joint analysis p-value of 0.31.

Publication bias
No publication bias was found in radiomics model studies based on Deeks' funnel plot (p = 0.23) (Figure 5).

FIGURE 2 Risk of bias (left) and applicability concerns (right) of included studies using the QUADAS-2 checklist.

Heterogeneity analysis
The I² test showed that sensitivity (I² = 78.96%) and specificity (I² = 83.32%) both have high heterogeneity. For threshold analysis, the Spearman's correlation coefficient was measured as −0.081 with a p-value of 0.803, indicating the absence of a threshold effect.

Publication bias
Deeks' funnel plot showed publication bias in combined model studies (p = 0.05) (Figure 8). Therefore, we performed a sensitivity analysis.
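The regression behind Deeks' funnel plot can be sketched as a weighted linear fit of ln(DOR) against 1/√ESS, where ESS is the effective sample size 4·n₁·n₂/(n₁ + n₂) and the weights are ESS. The sketch below returns only the slope; the p-value reported above would additionally require a t-test on that slope, which is omitted. The function name, the sample tables, and the assumption of no zero cells are all hypothetical.

```python
import math


def deeks_slope(tables):
    """Weighted least-squares slope of ln(DOR) on 1/sqrt(ESS).

    tables: list of (tp, fp, fn, tn) tuples, one per study.
    Assumes no zero cells (no continuity correction is applied).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_diseased = tp + fn
        n_healthy = fp + tn
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)
    total_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total_w
    numerator = sum(w * (x - x_bar) * (y - y_bar)
                    for w, x, y in zip(ws, xs, ys))
    denominator = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    return numerator / denominator


# Hypothetical studies in which the smallest cohort reports the largest DOR.
studies = [(30, 5, 10, 45), (75, 40, 25, 160), (150, 80, 50, 320)]
slope = deeks_slope(studies)
```

A slope far from zero indicates funnel plot asymmetry, that is, a relationship between study size and reported diagnostic odds ratio.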

Sensitivity analysis
We eliminated the cohorts included in the combined model analysis one by one and observed the changes. Eliminating the study by Z. Sun et al. (25) increased the p-value substantially, thereby reducing publication bias (Table 6). This can be explained by the large number of participants in that study. The use of the top left method to calculate sensitivity and specificity may be another reason.

Discussion
This meta-analysis investigated the utility of radiomics models based on CT scan images for the preoperative prediction of LNM in GC patients. Our analysis showed that radiomics-based models have promising potential for the prediction of positive LNM in GC. However, the quality of conduct and reporting of radiomics studies in GC is currently too low to allow radiomics to be widely adopted in clinical applications. Nevertheless, it has become evident that radiomics approaches have a promising role in the discrimination of target lesion classes in GC patients at high risk for LNM. Thus, if studies follow the same methodological guidelines more strictly and use large, comprehensive datasets from several centers, radiomics may offer an excellent opportunity for more tailored therapies and, in turn, better clinical outcomes.
Recently, the radiomics approach as a non-invasive diagnostic tool has offered clinicians a new perspective on disease management, especially in the field of oncology. Accordingly, a growing number of papers have investigated the applicability of radiomics in cancers of different organs and systems, such as gastrointestinal, respiratory, neurological, and breast cancers (30). Focusing on the prediction of LNM in cancers, a previous meta-analysis of 12 studies (793 patients) by Longchao Li et al. (31) concluded that MRI-based radiomics models have promising diagnostic accuracy in cervical cancer, with a pooled sensitivity, specificity, and AUC of 80%, 76%, and 0.83, respectively.

Forest plot for combined models.

Their analysis showed that the pooled sensitivity, specificity, and AUC were 82%, 83%, and 0.89, which indicates a good discrimination ability of radiomics models. The authors reported that the small number of patients, significant heterogeneity, and low quality assessment scores were the major limitations of the studies.
Generally, radiomics studies select patients and consider them as a main dataset for model construction. The main dataset is randomly divided into training and internal testing sets. First, the radiomics model learns the underlying mathematical pattern and structure of the dataset from the training set. The developed model then needs to be evaluated and tested for its performance and generalizability. There are two types of testing datasets: internal and external testing sets. The internal testing set is derived from the same dataset from which the training set was taken. The external testing set is selected from a different institution and region. The developed model is evaluated on these testing sets. Using external testing helps the radiomics approach to be more generalizable and comprehensive so that it can have a role in clinical practice. Three studies (three of 15) used an external testing set. Furthermore, radiomics models established from imaging features can be integrated with other clinical data to develop a new model called the "combined model" (30).
In the current study, we analyzed radiomics models and combined models separately. Some studies used both a radiomics model and a combined model; others used only one of them. The sensitivity, specificity, and AUC of the radiomics model were approximately 75%, 80%, and 0.85, indicating good performance. Despite this, apparent heterogeneity was found among the studies. Thus, we explored possible sources of heterogeneity using subgroup analysis to pave the way for upcoming studies. Spearman's correlation coefficients indicated that a threshold effect was not a source of heterogeneity. We were concerned about the difference between studies in which sensitivity and specificity were calculated by the top left method and studies in which they were not. Results showed that studies using the top left point had slightly better performance. CT scan phase differences were also explored, and results showed that the arterial phase has a better outcome than the venous phase in both radiomics and combined models. Image segmentation is a crucial process in the radiomics approach, since radiomics features are extracted from the delineated areas (33).
3D segmentation yielded better sensitivity only in radiomics models; otherwise, 2D segmentation performed better overall. Surprisingly, 2D segmentation on the largest imaging plane not only gave better results but is also simpler and less time consuming. Segmentation of both tumoral and nodal areas showed better predictive performance than segmentation of the tumoral area alone in both radiomics and combined models. Although manual segmentation of the ROI was preferred in the majority of studies, automatic and semi-automatic segmentations discriminated better than manual segmentation in both radiomics and combined models.
Despite the promising results in this study, the RQS scores of studies were low to moderate, ranging from 11 to 24 of 36 possible points. Only three studies tested the model's performance externally. Of note, only one study used a prospective dataset (25). QUADAS-2 quality assessment revealed some issues to be addressed in upcoming papers, for example, stating whether patients were sampled consecutively or randomly, reporting whether readers were blinded to the pathological status of samples, and reporting the interval between the index test and the reference test.

Limitation
This review highlights some limitations in studies as reflected by methodological assessment. We had to exclude a number of studies that met the inclusion criteria but did not have enough data to analyze, which indicates a pitfall in reporting results. Studies showed significant heterogeneity, which was similar to previous diagnostic radiomics meta-analyses (31,32).
Also, sample sizes in the included studies were relatively small and varied widely. The majority of datasets were selected retrospectively, which can contribute to selection bias. In addition, recruiting patients from a single center limits the generalizability and reproducibility of results. Four studies (four of 15) used more than one center for patient selection. Additionally, studies used different CT scanning protocols. We could address only the arterial and venous phase differences by subgroup analysis; the high heterogeneity of CT scanning protocols and techniques between studies could not be overcome by subgrouping. Moreover, in most studies, the GC stage and LN station were not considered in image analysis and modeling. Therefore, the extracted and selected features differ, which affects the performance of models and also leads to interstudy heterogeneity. In addition, the segmentation methods and software used in studies can affect models. Taken together, the main obstacle was the heterogeneity in study methodologies. This shows the necessity of establishing a unified standard and guideline for conducting radiomics studies, and, more importantly, future studies should adhere to these standards.

Conclusion
Our analysis demonstrated that the CT scan-based radiomics approach seems promising for predicting LNM in GC patients before surgery and offers excellent diagnostic accuracy for surgical planning and personalized therapy. Nevertheless, the high heterogeneity of studies indicates the necessity of a unified guideline for conducting radiomics studies in upcoming research. For now, it remains crucial to consider the limitations of radiomics in clinical application.

FIGURE 3 Forest plot of radiomics models.

FIGURE 5 Funnel plot of publication bias based on Deeks' asymmetry test in radiomics model studies.

FIGURE 7 SROC of combined model studies.

TABLE 1
General characteristics of the included studies.

TABLE 2
General characteristics of predictive models in the included studies.

TABLE 3
Radiomics quality score and average scores of studies.

TABLE 4
Subgroup analysis in radiomics model studies.
p, p-value; ROI, region of interest; VOI, volume of interest.

TABLE 5
Subgroup analysis in combined model studies.
p, p-value; ROI, region of interest; VOI, volume of interest.

FIGURE 8 Funnel plot of publication bias based on Deeks' asymmetry test in combined models.

TABLE 6
Results of sensitivity analysis.
AUC, area under the ROC curve; DOR, diagnostic odds ratio; NLR, negative likelihood ratio; PLR, positive likelihood ratio.