Machine learning models based on quantitative dynamic contrast-enhanced MRI parameters assess the expression levels of CD3+, CD4+, and CD8+ tumor-infiltrating lymphocytes in advanced gastric carcinoma

Objective To explore the effectiveness of machine learning classifiers based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) in predicting the expression levels of CD3+, CD4+, and CD8+ tumor-infiltrating lymphocytes (TILs) in patients with advanced gastric cancer (AGC). Methods This study examined 103 patients with confirmed AGC using DCE-MRI and immunohistochemical staining. Immunohistochemical staining was used to evaluate CD3+, CD4+, and CD8+ T-cell expression. Using Omni Kinetics software, radiomics features were extracted from the Ktrans, Kep, and Ve maps and selected with the variance threshold, SelectKBest, and LASSO methods. Four machine learning (ML) models were built with logistic regression (LR), support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGBoost) classifiers, and their performance was evaluated using 10-fold cross-validation. Model performance was compared using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. Results For CD4+ and CD8+ T-cell prediction, the random forest model outperformed the other classifiers, with AUCs of 0.913 and 0.970 on the training set and 0.904 and 0.908 on the validation set, respectively. For CD3+ T-cell prediction, the logistic regression model performed best, with AUCs of 0.872 and 0.817 on the training and validation sets, respectively. Conclusion Machine learning classifiers based on DCE-MRI have the potential to accurately predict CD3+, CD4+, and CD8+ tumor-infiltrating lymphocyte expression levels in patients with AGC.


Introduction
Although its incidence and mortality have decreased in recent years, gastric cancer remains the fifth most common cancer and the fourth leading cause of cancer death worldwide (1). Surgical resection is still the most common treatment for gastric cancer (2). However, the great majority of patients are diagnosed when the disease has already progressed, and only approximately 30% of gastric cancer patients are thought to be suitable candidates for radical resection (3).
New immunotherapy research has brought renewed hope for the treatment of advanced gastric cancer (AGC) in recent years (4, 5). In particular, immune checkpoint inhibitors targeting programmed cell death 1 (PD-1) and/or programmed cell death ligand 1 (PD-L1) have opened a new era of immunotherapy in cancer treatment (6). Immunotherapy, when paired with other treatments, has significantly improved the survival of patients with gastric cancer (7). The level of T lymphocyte infiltration in the tumor microenvironment is crucial for the success of tumor immunotherapy (8, 9). T lymphocytes are classified into several functional subsets, including helper (CD3+CD4+) T cells and killer (CD3+CD8+) T cells. Most T lymphocytes express CD3, which is regarded as a biomarker of T lymphocytes with antitumor activity and is a significant prognostic indicator for overall survival and recurrence (10). CD8+ T cells are the main antitumor effector cells, and CD8+ tumor-infiltrating lymphocytes (TILs) have been established as crucial in anti-PD-1/PD-L1 therapy. CD8+ TILs are also thought to be a key component and predictor of the prognosis of gastric cancer (11). Most CD4+ T cells are helper T lymphocytes, which are crucial for tumor surveillance because they support CD8+ T-cell activation and proliferation and cooperate in antitumor responses (12). Determining the presence of CD3+, CD4+, and CD8+ T cells in the tumor lesion area makes it possible to predict tumor progression and patient prognosis more accurately (13). At present, assessing CD3+, CD4+, and CD8+ T-cell infiltration in malignant tumors requires tissue samples, and acquiring these samples requires invasive procedures such as surgical or needle biopsies, which limits the capacity to provide a dynamic and comprehensive assessment of infiltration. Additionally, because of tumor heterogeneity, local samples are frequently not representative of the whole tumor. Therefore, a noninvasive, repeatable approach to evaluate the infiltration of CD3+, CD4+, and CD8+ T cells in malignancies is urgently needed in clinical settings.
Radiomics is a rapidly expanding field that has shown significant promise in recent years, with potential applications in disease diagnosis, tumor staging, protein expression detection, and prognosis prediction (14, 15). Radiomics has been shown to have considerable benefits in the treatment of gastric cancer (16). Recent research has shown that combining dynamic contrast-enhanced MRI (DCE-MRI) with radiomics analysis can produce promising results in analyzing protein expression (17). On the one hand, radiomics can rapidly extract quantitative features from medical images, providing useful information for auxiliary diagnosis. On the other hand, DCE-MRI not only provides deeper insight into blood vessel development and perfusion than other imaging techniques but also offers superior spatial resolution and interobserver agreement (18).
The goal of this study was to determine whether a DCE-MRI-based noninvasive prediction model could predict the expression levels of CD3+, CD4+, and CD8+ tumor-infiltrating T cells in advanced gastric cancer. Our findings could help identify patients who are likely to respond well to immunotherapy.

Patients
The ethics review boards of our hospitals approved this retrospective study, and informed consent from patients was not required.

MRI scanning
Before the MRI examination, all patients were prepared as follows:
(1) Patients fasted for 8 hours to allow the gastrointestinal tract to empty.
(2) To suppress gastrointestinal motility, 10 mg anisodamine (Hangzhou Minsheng Pharmaceutical Co., LTD., China) was administered intramuscularly 10 minutes before the examination if there were no contraindications (e.g., glaucoma, asthma, or serious heart disease).
(3) Patients were given 800-1000 mL of warm water orally 5 minutes before the examination to distend the stomach cavity.
For the MRI studies, a standard 12-channel phased-array body coil was used with a 3.0T MRI scanner (Verio, Siemens, Germany). Patients lay supine during the examination, and the scanning field covered the entire stomach. Following a standard plain scan (T1-weighted and T2-weighted images), a DCE-MRI scan was performed in all patients. DCE-MRI was acquired during free breathing using a three-dimensional radial volumetric interpolated breath-hold examination (VIBE) sequence. First, multi-flip-angle axial T1-weighted imaging was performed with the following parameters: repetition time, 3.25 ms; echo time, 1.17 ms; FOV, 350 × 284 mm; matrix, 288 × 164; slice thickness, 5 mm; flip angles of 5°, 10°, and 15°, scanned for 6.5 s each, for a total of 19.5 s. Next, multiphase dynamic enhanced scanning was performed with a flip angle of 10°, 35 phases, and a total scanning time of 227.5 s (6.5 s per phase); all other parameters were left unchanged. At the third phase, a gadolinium contrast agent (Omniscan, GE Healthcare, China) was injected through the median cubital vein using a high-pressure injector at a dose of 0.1 mmol/kg and a rate of 3.5 mL/s, followed by a 20 mL saline flush at the same flow rate.

Immunohistochemical staining and analysis
The expression of CD3+, CD4+, and CD8+ T cells in gastric cancer tissues was examined using immunohistochemistry (IHC). Pathological samples were obtained through gastroscopic biopsy or surgery. All formalin-fixed, paraffin-embedded gastric cancer tissues were cut into 4-μm-thick sections. Immunohistochemical staining was carried out using mouse anti-CD8 monoclonal antibody (1:200, GT211202, Gene Tech, Shanghai, China), rabbit anti-CD4 monoclonal antibody (1:200, GT219102, Gene Tech, Shanghai, China), or rabbit anti-CD3 monoclonal antibody (1:200, GT219001, Gene Tech, Shanghai, China). The sections were kept overnight at 4°C. The samples were then stained with a secondary antibody (K5009, Dako, Beijing, China) and incubated at 37°C for 10 min. Diaminobenzidine (DAB) was used to visualize antibody binding, and hematoxylin was used as a counterstain. Before being examined under a microscope, sections were cleared, dried, and mounted. Two experienced pathologists evaluated the immunohistochemical results in a double-blind manner. The complete tissue field was first examined under a low-power microscope, and five randomly chosen fields were then examined under a high-power (×40) microscope (Figure 2). The counting fields included the tumor tissue, the surrounding stroma, and cancer cell nests. CD3+, CD4+, and CD8+ T-cell expression was evaluated as the average number of positively stained cells, and patients were divided into two groups using the median as the cutoff, according to an earlier study (19).

FIGURE 1 Workflow of this study. Detailed information on inclusion and exclusion of study subjects. Radiomics analysis and histologic assessment were performed separately. Feature screening was performed to construct the radiomics assessment model.
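The median-based grouping used in the staining analysis can be illustrated with a minimal Python sketch. The input layout, the function name `split_by_median`, the example counts, and the rule that values equal to the median fall into the low group are assumptions made for illustration; the study does not state how ties were handled.

```python
from statistics import mean, median

def split_by_median(field_counts):
    """Return (cutoff, labels), where labels[i] is 'high' or 'low'.

    field_counts holds one list per patient with the positive-cell counts
    from that patient's high-power fields (hypothetical input layout).
    """
    per_patient = [mean(counts) for counts in field_counts]
    cutoff = median(per_patient)
    # Assumption: values above the median are 'high'; ties count as 'low'.
    labels = ["high" if value > cutoff else "low" for value in per_patient]
    return cutoff, labels

# Made-up counts for four patients, five fields each (not study data).
counts = [
    [100, 110, 120, 130, 140],  # mean 120
    [60, 70, 80, 90, 100],      # mean 80
    [150, 160, 170, 180, 190],  # mean 170
    [90, 95, 100, 105, 110],    # mean 100
]
cutoff, labels = split_by_median(counts)
```

With these made-up counts the cutoff is the midpoint of the two central patient means, so two patients fall into each group.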

Image data analysis and processing
We used Omni Kinetics (GE Healthcare, China) software to postprocess the DCE-MRI image data of all qualified AGC patients.
Region of interest (ROI) labeling: Images from the T1-mapping multi-flip-angle (5°, 10°, and 15°) sequence and the dynamic enhancement sequence were imported into the Omni Kinetics workstation for post-processing. A variable flip angle method was used to convert signal intensity to contrast agent (Omniscan) concentration, and the axial plane was used as the main measurement plane. The abdominal aorta was manually selected to obtain the arterial input function (AIF) for image post-processing. A nonlinear registration framework (free-form deformation algorithm) was used to correct artifacts due to body motion (e.g., breathing) between consecutive DCE-MRI scans. The Tofts hemodynamic model was selected to calculate the pharmacokinetic perfusion parameters. The lesion was delineated on 3-5 slices, avoiding necrotic and healthy gastric tissue, and the delineated slices were merged into a 3D ROI for quantitative analysis (Figure 3). Two experienced radiologists (radiologist 1 with 5 years of experience and radiologist 2 with 8 years of experience), who were blinded to the clinical and pathological data of the patients, performed the segmentation; measurements were repeated three times and averaged.
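As an illustration of the variable flip angle step, the sketch below estimates T1 from spoiled gradient-echo signals at the three flip angles using the standard linearized fit, then converts a change in T1 into a contrast agent concentration. It is not the Omni Kinetics implementation: the function names, the noise-free simulated signals, the T1 value of 1000 ms, and the relaxivity parameter `r1_per_mM_s` are hypothetical.

```python
import math

def spgr_signal(m0, t1_ms, tr_ms, flip_deg):
    """Spoiled gradient-echo signal for a given T1, TR, and flip angle."""
    e1 = math.exp(-tr_ms / t1_ms)
    a = math.radians(flip_deg)
    return m0 * math.sin(a) * (1 - e1) / (1 - e1 * math.cos(a))

def t1_from_vfa(signals, flips_deg, tr_ms):
    """Estimate T1 (ms) with the linearized variable flip angle fit.

    S/sin(a) = E1 * S/tan(a) + M0 * (1 - E1), so the slope of a
    least-squares line of y = S/sin(a) against x = S/tan(a) equals E1.
    """
    xs = [s / math.tan(math.radians(f)) for s, f in zip(signals, flips_deg)]
    ys = [s / math.sin(math.radians(f)) for s, f in zip(signals, flips_deg)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    e1 = sxy / sxx
    return -tr_ms / math.log(e1)

def concentration(t1_post_ms, t1_pre_ms, r1_per_mM_s):
    """Contrast agent concentration (mM) from the change in 1/T1."""
    return (1000.0 / t1_post_ms - 1000.0 / t1_pre_ms) / r1_per_mM_s

# Simulate noise-free signals at the flip angles and TR given in the text.
flips = [5, 10, 15]
tr = 3.25  # ms
signals = [spgr_signal(1.0, 1000.0, tr, f) for f in flips]
t1_est = t1_from_vfa(signals, flips, tr)
```

Because the simulated signals contain no noise, the fit recovers the simulated T1; real data would need noise handling and flip angle correction that this sketch omits.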
Feature extraction: The pharmacokinetic parameters of the whole tumor were calculated with the Tofts model, including the transfer rate constant from plasma to the extravascular extracellular space (Ktrans), the transfer rate constant from the extravascular extracellular space back to plasma (Kep), and the volume fraction of the extravascular extracellular space (Ve). The software then automatically extracted a total of 201 features of the whole tumor from the three perfusion maps. These features fell into five categories: first-order, histogram, gray-level co-occurrence matrix, Haralick, and run-length matrix. The operation interface of the Omni Kinetics software is shown in the Supplementary Material.
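The relationship between the three parameters can be sketched with the standard Tofts model, in which Kep = Ktrans / Ve and the tissue concentration is the plasma curve convolved with an exponential. The code below is a minimal illustration under that assumption and does not reproduce the software's fitting procedure; the function name, the constant plasma concentration, and the parameter values are hypothetical.

```python
import math

def tofts_tissue_curve(times_min, cp, ktrans, ve):
    """Tissue concentration from the standard Tofts model.

    Ct(t) = Ktrans * integral_0^t Cp(tau) * exp(-Kep * (t - tau)) dtau,
    with Kep = Ktrans / Ve. The integral uses the trapezoidal rule.
    """
    kep = ktrans / ve
    ct = []
    for i, t in enumerate(times_min):
        integrand = [
            cp[j] * math.exp(-kep * (t - times_min[j])) for j in range(i + 1)
        ]
        area = sum(
            0.5 * (integrand[j] + integrand[j + 1])
            * (times_min[j + 1] - times_min[j])
            for j in range(i)
        )
        ct.append(ktrans * area)
    return ct

# Hypothetical inputs: 35 phases spaced 6.5 s apart and a constant plasma
# concentration, for which Ct(t) = Ve * (1 - exp(-Kep * t)) analytically.
times = [i * 6.5 / 60.0 for i in range(35)]  # minutes
cp = [1.0] * 35
ct = tofts_tissue_curve(times, cp, ktrans=0.25, ve=0.5)
```

In practice the arterial input function measured in the aorta replaces the constant `cp`, and Ktrans and Ve are obtained by fitting this curve to the measured tissue concentration in each voxel.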

Interobserver variability evaluation
Thirty patients were randomly selected to assess the consistency of radiomics feature extraction between observers. Intraclass correlation coefficients (ICCs) were calculated for tumor segmentations performed separately by readers 1 and 2, one week apart. Intraobserver consistency was analyzed on the features delineated by reader 1, and interobserver consistency was analyzed on the features delineated by readers 1 and 2 for the same 30 patients. The reproducibility of the radiomics features extracted from DCE-MRI was rated satisfactory, with both intraobserver and interobserver ICC values greater than 0.75. These features, which showed good repeatability, were retained for further radiomics analysis.

FIGURE 2 Representative immunohistochemical staining images of CD3, CD4, and CD8 cells in patients with advanced gastric cancer.
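The agreement check can be illustrated with a small ICC calculation. The text does not state which ICC form was used, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function name and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings has one row per subject and one column per reader.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Made-up feature values from two readers for three patients.
identical = icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
shifted = icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
```

Identical readings give an ICC of 1, while a constant offset between readers lowers the absolute-agreement ICC even though the ranking of patients is unchanged. A feature would pass the screening rule in the text when its value exceeds 0.75.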

Feature selection
Each extracted radiomics feature was standardized by subtracting its mean and dividing by its standard deviation (Z-score normalization), so that all original feature values were transformed to a distribution with a mean of 0 and a variance of 1. Not all extracted features are relevant to a particular task. Therefore, a critical step for achieving the most effective result is to select the features that are most pertinent to this study. In this work, three strategies for dimension reduction were used to eliminate redundant features: the variance threshold, the univariate selection method (SelectKBest), and the least absolute shrinkage and selection operator (LASSO) method. First, features with a variance of less than 0.8 were eliminated by the variance threshold. Second, the SelectKBest method used a p-value to assess the association between each feature and the classification outcome, and features with a p-value of less than 0.05 were retained. Finally, LASSO regression, which uses L1 regularization in the cost function, was run with a maximum of 1000 iterations to eliminate weakly correlated features and produce the final feature set.
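The normalization step can be written as a short sketch. It uses only the Python standard library, standardizes with the population standard deviation, and assumes that no feature column is constant; the function name and the example values are hypothetical, and the later selection steps are not reproduced here.

```python
from statistics import fmean, pstdev

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance.

    rows has one list per patient and one entry per feature. A constant
    column would have zero standard deviation and is assumed not to occur.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Made-up values for three patients and two features on different scales.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = zscore_columns(features)
```

After scaling, both columns hold the same values despite their different original scales, which is the purpose of normalizing before the selection steps.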

Construction and validation of radiomics models
Because only 103 patients were enrolled, it was impossible to evaluate the robustness of our model using the conventional method of splitting the sample into training and validation groups. We therefore evaluated the robustness of the prediction models using 10-fold internal cross-validation on the training data. The training data were divided into ten subsets; in each iteration, one subset was used for validation while the other nine were used for training, and the procedure was repeated ten times so that each subset served as the validation set once. These data were used to train four classifier models: logistic regression (LR), support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGBoost). The accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and AUC of each classifier model in the training and test populations were calculated to assess prediction performance.
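The splitting scheme and the threshold-based metrics above can be sketched in plain Python. This is an illustration only: the function names, the random seed, and the toy labels are hypothetical, the classifiers themselves are omitted, and the metric function assumes that both classes and both predicted labels occur at least once.

```python
import random

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        ]
        yield train_idx, val_idx

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Ten folds over 103 samples, matching the cohort size in the text.
splits = list(kfold_indices(103))

# Toy labels (not study data): 1 = high infiltration, 0 = low infiltration.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

With 103 samples, three folds hold 11 patients and seven hold 10, and every patient appears in exactly one validation fold.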

Statistical analyses
GraphPad Prism 8.0, SPSS version 24.0, and R software version 4.0.2 (primarily the glmnet, pROC, rms, and rmda packages) were used for statistical analysis and the creation of visualizations. The LASSO approach was implemented with the "glmnet" package. The "calibrate" function from the "rms" package was used for calibration. Count data were compared using the chi-square test or Fisher's exact probability test. Continuous variables were compared between groups using the Mann-Whitney U test. Intraclass correlation coefficients (ICCs) were used to analyze the consistency of texture features extracted from the ROIs between the two observers; ICC > 0.75 indicated satisfactory agreement. All statistical tests were two-sided, and a p-value of 0.05 or lower was deemed statistically significant.
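The AUC used to compare the classifiers is closely related to the Mann-Whitney U test mentioned above: it equals the U statistic divided by the number of positive-negative pairs. The sketch below computes it directly from that definition. It is an illustration in Python rather than the pROC code used in the study, and the function name and example scores are hypothetical.

```python
def auc_from_scores(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example (not study data): two low-infiltration and two
# high-infiltration cases with predicted probabilities.
auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy example three of the four positive-negative pairs are ranked correctly.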

Characteristics of patients
This retrospective analysis included 103 patients with advanced gastric cancer (77 men and 26 women), with an average age of 67.7 years (range, 33-88 years). Based on the levels of CD3, CD4, and CD8 infiltration, patients in the training and test cohorts were divided into a high-infiltration group and a low-infiltration group. Figure 2 shows examples of the IHC analysis of CD3, CD4, and CD8 expression. The median CD3+, CD4+, and CD8+ TIL levels in the training group were 122, 87, and 138, respectively. Tables 1-3 summarize the clinical characteristics of AGC patients with high or low levels of infiltration in the three cohorts (CD3, CD4, and CD8).

Radiomics analysis
From the DCE-MRI data, 201 features in total were extracted (67 features each from the Ktrans, Kep, and Ve maps). Details of all extracted texture parameters are provided in the Supplementary Material. Then, using the variance threshold (threshold = 0.8), SelectKBest, and LASSO regression algorithms, we selected 8, 8, and 7 features to build the predictive models for CD3, CD4, and CD8, respectively. These features were weighted by their corresponding coefficients. In both the training and testing datasets for CD3, CD4, and CD8, the radiomics score (Rad-score) of the high-expression group was greater than that of the low-expression group (P < 0.05) (Figure 4). Rad-scores for each patient in the training and test sets are presented as bars (Figure 5). The Rad-score equation for predicting CD3, CD4, and CD8 was as follows:

Radiomics model development and evaluation
For the prediction of CD3+, CD4+, and CD8+ T lymphocytes, we constructed and evaluated models using LR, RF, SVM, and XGBoost classifiers. The performance of the classifiers is presented in Table 4 and Figure 6. In the training set, the LR model performed best in predicting CD3+ T cells, with high accuracy, sensitivity, specificity, and AUC. In the test set, the LR model for CD3+ T cells showed an accuracy of 0.807, sensitivity of 0.813, specificity of 0.800, and AUC of 0.817 (Figures 6A, D; Table 4). For CD4+ and CD8+ T cells, the XGBoost model performed best in the training set, but the RF model showed superior performance in the test set, with higher accuracy and specificity. Specifically, the RF model for CD4+ T cells achieved an accuracy of 0.903, sensitivity of 0.875, specificity of 0.933, and AUC of 0.904 in the test set (Figures 6B, E; Table 4), while for CD8+ T cells, the RF model achieved an accuracy of 0.903, sensitivity of 0.813, specificity of 1.000, and AUC of 0.908 in the test set (Figures 6C, F; Table 4). Because the XGBoost model may suffer from overfitting, we selected the RF model as the final predictive model for CD4+ and CD8+ T cells, while the LR model performed better in predicting CD3+ T cells. These findings demonstrate the potential of the developed models for the preoperative prediction of CD3, CD4, and CD8 expression levels in AGC patients.

Discussion
In this study, a noninvasive DCE-MRI-based radiomics model was established and validated to predict preoperative CD3+, CD4+, and CD8+ T-cell infiltration status in AGC patients. Our findings underscore the potential of the DCE-MRI radiomics model in assessing CD3+, CD4+, and CD8+ T lymphocyte infiltration levels. This noninvasive assessment method could help clinicians identify AGC patients who may benefit from immunotherapy, thus supporting the development of personalized treatment strategies.
Quantitative analysis of the diverse cellular subpopulations in the tumor microenvironment (TME) has identified specific biomarkers linked to prognosis and to responses to chemotherapy and immunotherapy (11, 20). Previous research in gastric cancer has indicated that larger numbers of CD3+, CD4+, and CD8+ T cells within tumors are related to increased overall survival (21, 22). These markers are normally detected by immunohistochemistry on samples obtained through biopsy or surgical resection. However, such analyses reflect only part of the tumor tissue and cannot account for the tumor's overall heterogeneity (23). Imaging, on the other hand, can offer a comprehensive assessment of the overall anatomical structure and functional properties of tumor tissue (24). Many earlier studies have shown that radiomics may accurately predict the immune microenvironment in a variety of malignancies using various imaging modalities (25, 26). DCE-MRI was used in our study to build a predictive model. Unlike conventional MRI, it offers information about the tumor's structure and function, such as blood volume, vascular permeability, and the vascular network within the tumor (27), which helps us better understand tumor biology. Previous research has demonstrated that DCE-MRI is capable of predicting the presence of tumor-infiltrating lymphocytes in malignant tumors (28, 29). However, no study has focused on determining the extent of CD3+, CD4+, and CD8+ T-cell infiltration in advanced gastric cancer. In this investigation, we built four ML models from DCE-MRI data and compared their efficacy in assessing the levels of tumor-infiltrating T cells, including the CD3, CD4, and CD8 subsets, in patients with advanced gastric cancer. This research therefore addresses a knowledge gap in this field.
This study employed a 10-fold cross-validation approach and trained four machine learning models using pharmacokinetic radiomic features extracted from DCE-MRI data. These models performed well in differentiating between levels of CD3, CD4, and CD8 infiltration. For CD3 prediction, the LR and SVM classifiers demonstrated robust performance in the training cohort, achieving AUC values of 0.872 and 0.870, respectively; in the test cohort, LR performed best, with an AUC of 0.817. For CD4 prediction, the RF classifier was the top performer, with AUC values of 0.913 and 0.904 in the training and test cohorts, respectively. Similarly, for CD8 prediction, the RF classifier achieved AUC values of 0.970 and 0.908 in the training and test cohorts, respectively, and its high accuracy and specificity suggest its suitability for identifying CD8+ T-cell infiltration in AGC patients. The XGBoost classifier, while achieving competitive AUC values in the training cohorts for CD3, CD4, and CD8 prediction, performed worse in the test cohorts, indicating potential overfitting. Overall, our findings underscore the potential of the RF classifier as the preferred model for predicting T-cell infiltration levels in AGC based on DCE-MRI data. RF is a robust ensemble learning algorithm that leverages multiple decision trees to achieve high accuracy and incorporates feature selection during classification (30). The robust performance of RF highlights its clinical relevance and utility in guiding treatment decisions and patient management strategies. Nevertheless, further validation in larger and more diverse patient cohorts is warranted to confirm the generalizability and reliability of the predictive models in real-world clinical settings.
In our study, Kep features played a pivotal role in constructing the radiomic model. Kep reflects the rate at which the contrast agent returns from the extravascular extracellular space (EES) to the vasculature, providing insight into tumor vascular characteristics and the distribution of the contrast agent within tissues (31). Kep values are typically higher in tumor tissues because the vascular network in malignant tumors tends to be more tortuous, irregular, and permeable, resulting in rapid ingress and egress of contrast agents (32). Previous research has underscored the significance of Kep in predicting the biological characteristics of tumors, including the extent of immune cell infiltration (33), because tumor vascular permeability and blood flow are closely associated with immune cell infiltration within tumor tissues. Thus, the prominence of Kep features in our radiomic model is justified, as they characterize the tumor vascular microenvironment, which is critical for understanding the distribution and infiltration of immune cells within tumors.

FIGURE 4 Radiomics scores in different cohorts of patients. In both the training (A-C) and test groups (D-F), patients with high CD3, CD4, and CD8 cell infiltration had significantly higher radiomics scores than patients with low infiltration.

Conclusion
In conclusion, this work demonstrates the utility of DCE-MRI radiomics analysis in distinguishing levels of CD3+, CD4+, and CD8+ T lymphocyte infiltration in pretreatment AGC patients. This finding highlights the potential of magnetic resonance imaging as a noninvasive diagnostic tool for predicting the expression of immunotherapy-related proteins.

FIGURE 3 Histograms of different imaging modalities and quantitative perfusion parameters in patients with advanced gastric cancer. (A) Axial T1-weighted images showed a mass with an irregular and thickened gastric wall. (B) ROIs were placed manually in axial T1-weighted images. (C) Outlining the target area for eventual fusion into a three-dimensional structure. (D) Volume transfer constant (Ktrans) plot of the ROI. (E) Plot of the reverse reflux rate constant (Kep) for the ROI. (F) Plot of the extravascular extracellular volume fraction (Ve) of the ROI. (G) Histogram of Ktrans values. (H) Histogram of Kep values. (I) Histogram of Ve values. ROI, region of interest.

FIGURE 5 Radiomics score (Rad-score) waterfall plots for the CD3 (A, D), CD4 (B, E), and CD8 (C, F) cohorts. The Y-axis displays Rad-score values. Positive values represent high-expression predictions, whereas negative values represent low-expression predictions. Correct predictions have red bars with negative values and blue bars with positive values, whereas incorrect predictions have blue bars with negative values and red bars with positive values.

FIGURE 6 Performance of the LR, RF, XGBoost, and SVM models in predicting T-cell expression levels. Receiver operating characteristic curves for classifying CD3 (A, D), CD4 (B, E), and CD8 (C, F) expression levels in the training and testing cohorts. LR, Logistic Regression; SVM, Support Vector Machine; RF, Random Forest; XGBoost, eXtreme Gradient Boosting.

TABLE 1
Relationship between CD3 and clinicopathologic features in patients with advanced gastric cancer.

TABLE 2
Relationship between CD4 and clinicopathologic features in patients with advanced gastric cancer.

TABLE 3
Relationship between CD8 and clinicopathologic features in patients with advanced gastric cancer.

TABLE 4
The performance of the radiomics model using LR, RF, XGBoost, and SVM classifiers for predicting the extent of CD3, CD4, and CD8 infiltration in each cohort.