Artificial Intelligence in the Imaging of Gastric Cancer: Current Applications and Future Direction

Gastric cancer (GC) is one of the most common cancers and one of the leading causes of cancer-related death worldwide. Precise diagnosis and evaluation of GC, especially using noninvasive methods, are fundamental to optimal therapeutic decision-making. Despite the recent rapid advancements in technology, pretreatment diagnostic accuracy varies between modalities, and correlations between imaging and histological features are far from perfect. Artificial intelligence (AI) techniques, particularly hand-crafted radiomics and deep learning, have offered hope in addressing these issues. AI has been used widely in GC research, because of its ability to convert medical images into minable data and to detect invisible textures. In this article, we systematically reviewed the methodological processes (data acquisition, lesion segmentation, feature extraction, feature selection, and model construction) involved in AI. We also summarized the current clinical applications of AI in GC research, which include characterization, differential diagnosis, treatment response monitoring, and prognosis prediction. Challenges and opportunities in AI-based GC research are highlighted for consideration in future studies.


INTRODUCTION
Gastric cancer (GC) is one of the most common cancers and ranks among the top three in terms of mortality (1). The American Joint Committee on Cancer (AJCC) staging manual (8th edition) recommends computed tomography (CT) and endoscopic ultrasound for pretreatment TNM classification of GC, whereas magnetic resonance imaging (MRI) and positron emission tomography-computed tomography (PET-CT) are effective alternatives for metastasis evaluation. Despite the introduction of new techniques, the reported pretreatment diagnostic accuracy for GC varies from 40.8% to 98.1% (2-4). Efforts have also been made to predict histological features, such as tumor differentiation grade and Lauren classification, from enhancement pattern analysis, perfusion analysis, and spectral analysis; these approaches show moderate discriminatory performance, with areas under the curve (AUCs) ranging from 0.697 to 0.891 (5-7). Given the importance of accurate pretreatment imaging evaluation and the prognostic value of histopathological features, there is an urgent need for better diagnostic methods for treatment planning.
Fortunately, there has been considerable progress in artificial intelligence (AI) during the past decade, which offers promise for meeting these needs. Of all the AI techniques, hand-crafted radiomics and deep learning (DL) are the two most frequently applied in medical imaging, and both have shown a powerful capacity for converting large volumes of medical images into minable data. With the ability to detect features that are invisible to human readers, hand-crafted radiomics and DL have demonstrated promising performance in tumor detection, characterization, and monitoring (8).
Therefore, we reviewed the published AI methodologies used in studies of GC imaging to provide an overview of the latest developments, covering data acquisition, lesion segmentation, feature design, and model construction. We also summarized the representative clinical applications, knowledge gaps, and future directions. A total of 47 published AI studies on GC imaging were identified through MEDLINE (June 2021), of which 45 were retrospective in design (36 single-center and 9 multicenter studies) and the remaining two were single-center prospective studies (Table 1). Imaging modalities varied across the studies: 39 studies used CT (only one of which was based on dual-energy CT), six used MRI, and two used PET-CT (Table 1).

Data Acquisition
Image preprocessing compensates for the substantial heterogeneity introduced by different imaging modalities, scanning protocols, machine types, and manufacturers. Image intensity normalization and resampling are two mathematical techniques that are widely used for this purpose. Specifically, image intensity normalization transforms the original image into a standardized form to reduce data variability between cohorts and to generate appropriate inputs for quantitative radiomic feature calculation (20,27). Resampling transforms the original image to a target size by upsampling or downsampling so that it matches the input shape required by the model (32,33,36,44).
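As a minimal illustration of these two preprocessing steps, the sketch below applies z-score intensity normalization and nearest-neighbor resampling to a toy 2D patch. The function names, the choice of z-score scaling, and the nearest-neighbor interpolator are assumptions made for this example; the reviewed studies may use other normalization schemes and interpolation methods.

```python
from statistics import mean, pstdev


def zscore_normalize(image):
    """Rescale a 2D image (list of rows) to zero mean and unit variance."""
    pixels = [v for row in image for v in row]
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(v - mu) / sigma for v in row] for row in image]


def resample_nearest(image, out_rows, out_cols):
    """Resize a 2D image to (out_rows, out_cols) by nearest-neighbor lookup."""
    in_rows, in_cols = len(image), len(image[0])
    resized = []
    for r in range(out_rows):
        # Map the centre of each output pixel back to an input row index.
        src_r = min(in_rows - 1, int((r + 0.5) * in_rows / out_rows))
        row = []
        for c in range(out_cols):
            src_c = min(in_cols - 1, int((c + 0.5) * in_cols / out_cols))
            row.append(image[src_r][src_c])
        resized.append(row)
    return resized


# Toy intensity values, not real patient data.
patch = [[-100, 0], [40, 60]]
normalized = zscore_normalize(patch)
upsampled = resample_nearest(patch, 4, 4)
```

Here `upsampled` is a 4 x 4 image in which each original pixel is repeated as a 2 x 2 block. In practice, resampling is usually defined in terms of physical voxel spacing rather than pixel counts, and smoother interpolators are common for intensity images.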

Lesion Segmentation
Manual segmentation, which is usually carried out by radiologists, involves placing rectangular or circular boxes, or drawing contours, that delineate the two- or three-dimensional (2D/3D) boundary of the whole lesion as a region of interest (ROI). In Di Dong et al.'s study, 2D ROIs were placed to cover the largest tumor area for predicting lymph node metastasis in locally advanced GC (32). Yue Wang et al. segmented the entire tumor and built a 3D-based hand-crafted radiomics model to diagnose intestinal-type gastric adenocarcinomas (39). In addition, Wenjuan Zhang et al. constructed a DL model based on an 18-layer residual convolutional neural network (CNN), using square segmentations of CT images, to predict overall survival (OS) in GC patients (44). It is important to note that, because subjective judgments regarding tumor boundaries can vary substantially among radiologists, manual segmentations by multiple radiologists at multiple time points are required to minimize intra- and inter-rater variability. In addition, intra- and interclass correlation coefficients and coefficients of variation are often calculated to evaluate the robustness and reproducibility of the extracted features (12,14,17,22,30,31,34,36,39,41).
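To make the reproducibility check concrete, the following sketch computes an intraclass correlation coefficient for one feature measured on the same lesions by several raters. It assumes the ICC(2,1) form (two-way random effects, absolute agreement, single rater); the reviewed studies do not necessarily use this form, and the function name is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the feature value for lesion i measured by rater j.
    """
    n = len(ratings)       # number of lesions
    k = len(ratings[0])    # number of raters (or repeated sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: lesions, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy example: one feature, three lesions, two raters in perfect agreement.
agreement = icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

A feature whose coefficient falls below a chosen threshold would typically be excluded before feature selection; the threshold itself is a study-level decision and is not fixed by this sketch.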
In contrast to manual segmentation, semiautomatic segmentation usually comprises two steps. First, radiologists mark several labeling points. Thereafter, the entire ROI is generated automatically by software, based on the labeling points. Satisfactory gastric lesion segmentation performance has been achieved using this approach (25,30,38,39). All four studies that used semiautomatic segmentation employed the same software package (Frontier, syngo.via, Siemens Healthcare), which applies a dichotomic classification algorithm to semiautomatically separate lesions from the surrounding normal tissue.
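The two-step idea (operator-placed points, then automatic ROI generation) can be sketched with a simple seeded region-growing routine. This is a stand-in technique chosen for illustration only; it is not the dichotomic classification algorithm of the commercial package named above, and the function name and tolerance parameter are assumptions.

```python
from collections import deque


def region_grow(image, seeds, tolerance):
    """Grow a binary mask from seed pixels by 4-connected flood fill.

    A neighbour joins the region when its intensity lies within `tolerance`
    of the mean intensity of the seed pixels.
    """
    rows, cols = len(image), len(image[0])
    reference = sum(image[r][c] for r, c in seeds) / len(seeds)
    mask = [[0] * cols for _ in range(rows)]
    queue = deque()
    for r, c in seeds:
        mask[r][c] = 1
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc]:
                if abs(image[nr][nc] - reference) <= tolerance:
                    mask[nr][nc] = 1
                    queue.append((nr, nc))
    return mask


# Toy image: a bright "lesion" on the right, darker background on the left.
image = [
    [10, 12, 80, 82],
    [11, 13, 79, 81],
    [10, 12, 11, 80],
]
mask = region_grow(image, seeds=[(0, 2)], tolerance=5)
```

The resulting mask marks only the bright pixels connected to the seed. Real gastric lesions have far less contrast against the stomach wall, which is one reason the generated ROI is normally reviewed and corrected by a radiologist.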

Feature Extraction
After lesion segmentation, quantitative hand-crafted (engineered) features can be calculated to profile the intrinsic characteristics of the ROI. Hand-crafted features can be categorized as first-order statistics, shape-based features, or texture-based features. First-order statistics describe the distribution of pixel/voxel intensities in the ROI, shape-based features describe the geometric properties of the ROI, and texture-based features are derived from gray-level matrices that capture textural patterns in an image region. Commonly used hand-crafted features are presented in Table 2.
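The sketch below illustrates two of these feature families on a toy ROI: a handful of first-order statistics, and the contrast of a gray-level co-occurrence matrix (GLCM) as one example of a texture feature. The feature subset, the single horizontal offset, and the function names are assumptions for illustration; they are not the feature definitions listed in Table 2.

```python
import math
from collections import Counter


def first_order_features(values):
    """Mean, variance, skewness and Shannon entropy of ROI intensities."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    skew = 0.0
    if var > 0:
        skew = sum((v - mu) ** 3 for v in values) / n / var ** 1.5
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mu, "variance": var, "skewness": skew, "entropy": entropy}


def glcm_contrast(image, levels):
    """Contrast of the gray-level co-occurrence matrix for offset (0, 1).

    `image` holds integer gray levels in range(levels) and must be at least
    two pixels wide; pairs are counted symmetrically between each pixel and
    its right-hand neighbour.
    """
    glcm = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            glcm[a][b] += 1
            glcm[b][a] += 1
    total = sum(sum(r) for r in glcm)
    return sum(
        (i - j) ** 2 * glcm[i][j] / total
        for i in range(levels)
        for j in range(levels)
    )


# Toy ROI already discretized to two gray levels.
roi = [[0, 0, 1, 1], [0, 0, 1, 1]]
stats = first_order_features([v for row in roi for v in row])
contrast = glcm_contrast(roi, levels=2)
```

First-order statistics ignore where the intensities sit in the ROI, whereas the GLCM contrast changes if the same pixels are rearranged, which is what makes it a texture descriptor.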
As opposed to hand-crafted features, DL features are derived directly from artificial neural networks, which encode medical images into a series of feature maps and thereby extract high-dimensional information that cannot be detected by human readers. Using this method, based on a faster region-based CNN (Faster R-CNN), Yuan Gao et al. achieved a mean average precision of 0.7801 and an AUC of 0.9541 in predicting perigastric lymph node metastasis (24).
In short, hand-crafted features describe the morphology, intensity, and textural patterns of ROIs using predefined formulas, whereas deep learning networks automatically learn feature representations from sample images.
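A minimal sketch of how a convolutional layer turns an image into feature maps, and how those maps can be reduced to a feature vector, is shown below. The kernels here are fixed by hand purely for illustration; in a trained CNN they are learned from data, and real networks stack many such layers. All function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation (the 'convolution' of DL frameworks), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive activations and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_average_pool(feature_map):
    """Collapse a feature map into a single number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def extract_deep_features(image, kernels):
    """One feature per kernel: convolution, then ReLU, then average pooling."""
    return [global_average_pool(relu(conv2d_valid(image, k))) for k in kernels]


# Toy image with a dark-to-bright vertical edge, and two opposite edge kernels.
image = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
kernels = [[[-1, 1]], [[1, -1]]]
features = extract_deep_features(image, kernels)
```

Only the first kernel responds to this edge, so the two features differ even though both were computed from the same pixels. Learned kernels play the same role but are optimized for the prediction task instead of being chosen in advance.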

Feature Selection
The most commonly used feature selection methods fall into three categories: filter, wrapper, and embedded methods. Filter-based methods (e.g., correlation analysis, analysis of variance) are the simplest; they screen features according to a statistical criterion, such as correlation or mutual information, independently of any classifier (12,14,42,55). Wrappers (e.g., recursive feature elimination, sequential feature selection algorithms, and genetic algorithms) select feature subsets based on classifier performance. Filters and wrappers are frequently combined to improve feature selection. Using Pearson correlation analysis and the sequential forward floating selection algorithm, Jing Yang et al. obtained optimal tumor and nodal hand-crafted radiomics features to construct a model, which demonstrated good predictive performance for GC metastasis (42). Embedded methods perform variable selection during model training. The least absolute shrinkage and selection operator (LASSO) is a classical and widely applied embedded method (11,19,25,27,31,33,34,36,45). Unlike filters and wrappers, LASSO regression adds an L1 penalty on the model coefficients, which shrinks the coefficients of uninformative features to exactly zero and thus yields a simple yet effective model with a small number of features.
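The way the LASSO penalty removes features can be seen in a small coordinate-descent sketch. It assumes a linear (not logistic) model, standardized features, a centred outcome, and no intercept; these simplifications, the penalty value, and the function names are choices made for this example and are not taken from the reviewed studies.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlate feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][m] * w[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Toy data: the outcome depends only on the first feature (y = 2 * x1).
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
weights = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

The soft-thresholding step is what sets the coefficient of the uninformative second feature to exactly zero, so feature selection happens as a by-product of fitting. The penalty strength `lam` is normally chosen by cross-validation.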

Model Construction
Regarding modeling strategy, logistic regression models (e.g., multivariate logistic regression, LASSO regression) have been widely used in AI-based GC studies. Random forests and support vector machines (SVMs) are also effective alternatives for model construction (19,28,32,36,43). In a multicenter study, Di Dong et al. proposed an AI model that integrated DL, hand-crafted radiomics, and clinical factors. Their workflow drew on a range of modeling and statistical methods, including SVM, artificial neural networks, random forest, Spearman's correlation analysis, logistic regression analysis, and linear regression analysis, and the resulting model demonstrated good predictive performance for lymph node metastasis in locally advanced GC (32).
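As a minimal sketch of the most common modeling choice, the code below fits a logistic regression classifier by batch gradient descent and returns a predicted probability for a new case. The learning rate, iteration count, toy data, and function names are assumptions; published models are typically fitted with established statistical software and validated on separate cohorts.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, n_iter=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_probability(w, b, x):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: one standardized feature; label 1 could stand for metastasis present.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic_regression(X, y)
risk = predict_probability(w, b, [1.5])
```

The fitted coefficients are directly interpretable as log-odds per unit of each feature, which is one reason logistic regression remains popular for combining radiomics signatures with clinical factors.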
The above workflow and key methodologies of AI techniques in GC imaging are summarized in Figure 1.

CLINICAL APPLICATIONS OF HAND-CRAFTED RADIOMICS AND DEEP LEARNING IN GASTRIC CANCER
Major clinical applications of AI in GC research are shown in Figure 2.

Characterization
The TNM classification is the most widely used staging system in GC, and pretreatment CT/MRI is vital for making optimal treatment decisions (56,57). Given the widespread use of CT, most hand-crafted radiomics and DL studies have used CT images for preoperative prediction of TNM stage (24, 27, 28, 31, 32, 36-38, 40-42, 51, 52). Precise pretreatment N staging is hampered by inconsistencies in traditional imaging features of lymph nodes, such as shape, size, and enhancement pattern. Therefore, many researchers have developed AI-based models to predict lymph node status in GC patients (24, 27, 28, 31, 32, 36-38, 41, 42). While Yang et al. combined tumor and nodal hand-crafted radiomics features (42), the other studies used only the tumor as the ROI (24, 27, 28, 31, 32, 36-38, 41). Of the 10 studies focusing on lymph node status, seven were designed to discriminate between N+ and N- (28, 31, 36-38, 41, 42), two discriminated specific N stages (N0-3) (27,32), and one did not clearly specify the lymph node status being predicted (24). Models based on hand-crafted radiomics, DL, or a combination of the two have shown AUCs or C-indices of 0.79-0.95 in the training cohorts and 0.76-0.89 in the validation cohorts (24, 27, 28, 31, 32, 36-38, 41, 42). Three studies tested model efficacy for T stage prediction, where two aimed to discriminate T1/2 from T3/4 (29,30,41) and one to classify all T1-4 stages (25); all yielded good discriminatory performance, with AUCs ranging from 0.82 to 0.90. Liu et al. investigated venous-phase CT images of primary tumors in advanced GC and built a hand-crafted radiomics model to predict occult peritoneal metastasis (40). Because CT is so widely used, MRI has been applied less frequently in GC patients, and only four studies have focused on MRI-based prediction of TNM staging (12,16,17,26). Using hand-crafted radiomics analysis, these studies found that diffusion-weighted imaging and apparent diffusion coefficient maps show potential for preoperative T and N staging of GC.
Using histopathological results as a reference, six studies explored the correlation between AI-based models and prognosis-related factors, namely tumor differentiation grade (9,14,15,25), Lauren classification (14,39,47), and lymphovascular and neural invasion (14,17,25,34). Two studies were based on MRI images (15,17) and four on CT images (14,25,34,39), and all models exhibited good preoperative predictive ability for GC. In addition, using immunohistochemistry results as a reference, researchers have developed hand-crafted radiomics models to predict human epidermal growth factor receptor 2 (HER2) status, which could serve as a noninvasive tool for selecting GC patients suitable for trastuzumab (Herceptin) (10,35). Furthermore, Gao's hand-crafted radiomics model showed good performance in estimating tumor-infiltrating regulatory T (TITreg) cells, with AUCs of 0.85-0.88 in various cohorts (33).
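Because most of the results above are reported as AUCs, it may help to recall how this metric is obtained for a binary task such as N+ versus N-. The sketch below uses the pairwise (Mann-Whitney) definition; the function name and the toy labels and scores are illustrative and do not come from any reviewed study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = node-positive, 0 = node-negative; scores are model outputs.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a validation AUC that is clearly lower than the training AUC is a common sign of overfitting.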

Differential Diagnosis
Five studies were conducted to differentiate between different gastric tumors (9, 11, 48-50). By applying texture analysis to arterial- and portal venous-phase CT images, Ba-Ssalamah et al. classified adenocarcinomas, lymphomas, and gastrointestinal stromal tumors, with misclassification rates ranging from 0% to 10% (9). Ma, Feng, and Sun et al. focused specifically on differentiating Borrmann type IV GC from primary gastric lymphoma. By combining hand-crafted radiomics signatures, subjective CT findings, age, and gender, Ma's model achieved a diagnostic accuracy of 87.1% (11). All these models demonstrated potential for accurate gastric tumor discrimination.

Treatment Response and Prognosis
Neoadjuvant chemotherapy (NAC) can decrease tumor size and reduce mortality (58) and is recommended for potentially resectable advanced GC. However, response rates to NAC vary among studies (59). In patients who do not benefit from NAC, the delay in surgery can lead to tumor progression and poor prognosis. Therefore, noninvasive identification of NAC responders before treatment is crucial in managing patients with advanced GC. Three studies used CT-based hand-crafted radiomics analysis to build models for predicting non-responders, yielding AUCs of 0.65-0.82 (13,19,43). Notably, Sun et al. demonstrated that their hand-crafted radiomics model outperformed a clinical model in predicting NAC response (43). Chemotherapy and radiation therapy are two mainstays of treatment for advanced GC. Three studies have been carried out to predict chemotherapy response (20,21,23). Jiang et al. showed that higher scores on their CT-based hand-crafted radiomics signature indicated a favorable response to chemotherapy in stage II-III patients (20). Similarly, Jiang et al. built a Rad-score system based on hand-crafted radiomics features from PET images, in which higher scores indicated chemotherapy responders (23). Klaassen et al. focused on individual liver metastases in esophagogastric cancers and developed a CT-based hand-crafted radiomics model to predict responsive lesions; the resulting AUCs ranged from 0.65 to 0.87 across cohorts (21). Only one study tested model efficacy for predicting radiotherapy response in GC patients with abdominal cavity metastasis. Based on pretreatment CT images, Hou et al. constructed two prediction models with accuracies ranging from 0.71 to 0.82 (22).

FUTURE CHALLENGES AND OPPORTUNITIES
To date, numerous studies have demonstrated the predictive potential of hand-crafted radiomics and DL in GC characterization, differential diagnosis, treatment response, and prognosis. Although MRI is frequently used in clinical practice, it is not routinely recommended for GC evaluation. Most studies have focused on CT images, and few have used MRI. Given the excellent soft-tissue resolution of MRI, MR images may reveal more intrinsic tumor features and improve prediction. Therefore, future investigations should aim to include more patients undergoing MRI examinations for GC evaluation. Lymph node metastasis status is a key component of pretreatment and postoperative evaluation, and many studies have developed methods for pretreatment AI-based prediction, covering both the presence of lymph node metastasis and the N stage. However, no studies have focused on individual lymph nodes, which is fundamental for precise pretreatment N stage evaluation and for modifying treatment plans during follow-up. We encourage future studies to predict the metastasis status of individual lymph nodes based on rigorous pathological correlation. Moreover, few studies have analyzed the relationship between imaging features and treatment response. There is still a considerable knowledge gap in this field; further research is needed to improve patient selection and develop better treatment plans.
In addition, the methodologies of AI themselves require continued attention. More intensive and standardized quality control throughout the entire AI workflow is warranted. By analyzing a total of 77 hand-crafted radiomics-based oncology studies, Park et al. reported that the overall scientific quality of current hand-crafted radiomics studies was insufficient (61). Similar problems arise at every stage of GC imaging research, from data acquisition, segmentation, feature extraction, feature selection, and model construction to model performance reporting. In this context, compliance with widely accepted quality systems [e.g., Radiomics Quality Score (RQS) (62), Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) (63)] may help. In addition, prospective multi-institutional collaborations to establish well-curated databases and networks are encouraged in future studies. Furthermore, considering the inherent capacity of AI to analyze parallel streams of information, including clinical and genomic characteristics (64-66), multi-omics studies that integrate these data may pave the way for better personalized and precision medicine. Collectively, we hope that these efforts will help to shift AI in GC from exploratory research settings to routine clinical settings.

CONCLUSION
GC has high incidence and mortality rates and has been a focus of clinical research over the past decades. Hand-crafted radiomics and DL are emerging quantitative subsets of AI that have been widely used in medicine. The exploration of GC using hand-crafted radiomics and DL has led to promising results at every step of the clinical pathway. However, most studies have been retrospective, conducted in a single center, and based on a single imaging modality, which has limited the utility of the constructed AI models. Therefore, further prospective and multicenter studies are needed to validate the models. Moreover, other imaging modalities, such as endoscopic ultrasound, may be integrated into the models to further improve model efficacy.

AUTHOR CONTRIBUTIONS
BS and NH designed and supervised this study. YQ, YD, and HJ conducted the literature search, article selection, data extraction, data analyses and data interpretation. YQ and YD contributed to the conception of the study and drafted the manuscript. All authors contributed to writing of the manuscript and approved the final manuscript. YQ and YD contributed equally to this work. All authors contributed to the article and approved the submitted version.