Predicting Microsatellite Instability Status in Colorectal Cancer Based on Triphasic Enhanced Computed Tomography Radiomics Signatures: A Multicenter Study

Background: This study aimed to develop and validate a computed tomography (CT)-based radiomics model to predict microsatellite instability (MSI) status in colorectal cancer patients and to identify which of the three phases of triphasic enhanced CT yields the most robust and best-performing radiomics signature.
Methods: In total, 502 colorectal cancer patients with preoperative contrast-enhanced CT images and available MSI status (441 in the training cohort and 61 in the external validation cohort) were enrolled from two centers in our retrospective study. Radiomics features of the entire primary tumor were extracted from arterial-, delayed-, and venous-phase CT images. The least absolute shrinkage and selection operator method was used to retain the features closely associated with MSI status. Radiomics, clinical, and combined Clinical Radiomics models were built to predict MSI status. Model performance was evaluated by receiver operating characteristic curve analysis.
Results: Thirty-two radiomics features showed significant correlation with MSI status. Delayed-phase models showed superior predictive performance compared to arterial- or venous-phase models. Additionally, age, location, and carcinoembryonic antigen were considered useful predictors of MSI status. The Clinical Radiomics nomogram that incorporated both clinical risk factors and radiomics parameters showed excellent performance, with an AUC, accuracy, and sensitivity of 0.898, 0.837, and 0.821 in the training cohort and 0.964, 0.918, and 1.000 in the validation cohort, respectively.
Conclusions: The proposed CT-based radiomics signature has excellent performance in predicting MSI status and could potentially guide individualized therapy.

INTRODUCTION
Colorectal cancer (CRC) is characterized by complex biological features and shows distinct heterogeneity. Even though the clinicopathological characteristics of CRC are similar, there is still significant variability in treatment response and prognosis (1). Two major molecular events are involved in the occurrence and development of CRC (2,3). The vast majority of CRCs are caused by chromosomal instability events (approximately 85%), including mutations in APC, KRAS, and TP53 genes, etc. However, a small percentage of CRCs are caused by microsatellite instability (MSI) (approximately 15%).
Mismatch repair (MMR) genes are highly conserved and are involved in repairing DNA base mismatches. They help maintain genome stability and reduce spontaneous mutations (4). MMR proteins include MLH1, MSH2, MSH6, and PMS2. During DNA replication, minor DNA mismatches occasionally occur; these proteins recognize the mismatches, which are then excised and repaired by synthesis of a new strand (5,6). When any one of these four proteins is non-functional, DNA base mismatches accumulate in proliferating cells, a phenomenon known as MSI (6).
MSI status is currently a key predictor for evaluating the treatment strategies and prognosis of CRC patients (7,8). Compared with microsatellite-stable (MSS) CRC patients, CRC patients with MSI status are more likely to benefit from immunotherapy, but they do not benefit from pyrimidine analogs or fluorouracil-based adjuvant chemotherapy (9-11). In addition, CRC patients with MSI status may have a favorable prognosis (12-14). The National Comprehensive Cancer Network (NCCN), European Society for Medical Oncology (ESMO), and Japanese Society for Cancer of the Colon and Rectum (JSCCR) guidelines recommend testing the MSI status of CRC patients (4,15,16).
At present, MSI status detection is mainly done through immunohistochemistry (IHC) and polymerase chain reaction (PCR) methods on biopsy or surgical tissue, both of which are invasive and costly (8,17). Furthermore, the small part of the tissue captured by biopsy may not be sufficient to accurately reflect the MSI status of tumors (18,19). In addition, these advanced biological tests can only be performed in qualified tertiary medical centers, as local medical institutions have not widely adopted these methods because of the lack of suitable medical equipment (20). Therefore, developing a non-invasive, cost-effective, and easily repeatable method to identify MSI status could help clinicians to develop more accurate treatment strategies for CRC patients.
Radiomics is a burgeoning field in the era of precision medicine, involving screening, diagnosis, treatment, and prognostic assessment of multiple systemic diseases (21-24). By extracting high-dimensional, mineable data from medical imaging and evaluating its association with clinicopathologic factors or gene expression, radiomics facilitates the formulation of individualized treatment strategies. Radiomics has been widely used in CRC stage assessment (21), tumor differentiation identification (25), post-neoadjuvant chemotherapy efficacy evaluation (26), and KRAS mutation status identification (27). Previous studies demonstrated a significant correlation between a CT-based radiomics signature and MSI status in CRC patients (28,29). These results indicate that pretreatment CT may be associated with MSI status and that radiomics analysis may greatly contribute to MSI status identification. However, previous studies included only a single cohort and lacked external validation. Moreover, the superiority of the venous phase (VP) compared to arterial and delayed phases (AP and DP, respectively) in the prediction of MSI status in CRC patients remains to be confirmed. Therefore, the aim of this study was to investigate whether a CT-based radiomics signature could identify MSI status in CRC patients and to identify which phase of triphasic enhanced CT yields the most robust and best-performing radiomics signature.

MATERIALS AND METHODS

Patients
Ethical approval was obtained from the medical ethics committees of both participating centers (center I: Lanzhou University Second Hospital; center II: The Second People's Hospital of Lanzhou city), and the requirement for informed consent was waived due to the retrospective nature of the study. Patient inclusion and exclusion details and the patient recruitment pathway are shown in Figure 1. The institutional database in center I was searched for eligible patients who underwent curative resection between March 2014 and August 2020, and a total of 441 consecutive patients were enrolled. This cohort included 255 males (57.8%) and 186 females (42.2%), with an average age of 58.64 ± 12.92 years (range, 20-89 years). Furthermore, 61 patients from center II were enrolled between July 2018 and August 2020, including 38 males (62.3%) and 23 females (37.7%), with an average age of 56.93 ± 11.94 years (range, 27-84 years). The model for MSI prediction was established in the training cohort (center I) and evaluated in the external validation cohort (center II). The baseline clinical data of all CRC patients, including age, sex, tumor location, carcinoembryonic antigen (CEA) level, CA125 level, and CA199 level, were collected. Two radiologists (radiologist A, Y.T.C.; radiologist B, J.Z.) with more than 10 years of experience in abdominal imaging collected radiological features on preoperative CT images and recorded the results, including clinical tumor/lymph node (cT/N) stage and tumor maximum diameter (maximum diameter perpendicular to the long axis of the cross-sectional image). To minimize measurement deviation, quantitative data were recorded as the average of the two radiologists' measurements, whereas qualitative data were assessed independently by the two radiologists, with disagreements resolved by consensus.

Identification of MSI Status
MSI status was evaluated by immunohistochemical staining of MMR proteins (MLH1, MSH2, MSH6, PMS2). The standard streptavidin biotin-peroxidase procedure was performed on postoperative tissues to identify the MSI status. Patients were classified into the MSI or MSS group according to the staining results of MMR proteins. Among the four MMR proteins, negative staining for one or more proteins was defined as MSI. MSS was defined as positive staining for all four MMR proteins (6).
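The classification rule above can be written as a short function. This is a minimal sketch: the function name `classify_msi` and the dictionary input format are illustrative assumptions, not part of the study's software.

```python
# The four MMR proteins named in the text.
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def classify_msi(staining):
    """Classify a tumor from IHC staining results.

    staining: dict mapping each MMR protein name to True (positive staining)
    or False (negative staining). Hypothetical input format for illustration.
    """
    missing = [p for p in MMR_PROTEINS if p not in staining]
    if missing:
        raise ValueError(f"missing staining result for: {missing}")
    # MSS requires positive staining for all four proteins;
    # negative staining for one or more proteins is defined as MSI.
    return "MSS" if all(staining[p] for p in MMR_PROTEINS) else "MSI"


example_mss = classify_msi({"MLH1": True, "MSH2": True, "MSH6": True, "PMS2": True})
example_msi = classify_msi({"MLH1": True, "MSH2": True, "MSH6": False, "PMS2": True})
```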

CT Image Acquisition and Segmentation
All patients underwent contrast-enhanced abdominal and/or pelvic CT scans covering the whole tumor at one of the two institutions. Triphasic enhanced CT images were retrieved from the picture archiving and communication system (PACS, Carestream; Rochester, NY) and stored in corresponding folders in DICOM format for further analysis. The CT scanners and acquisition parameters of both institutions are listed in Supplementary Tables S1, S2.
Two gastrointestinal radiologists (radiologist A and radiologist B) performed three-dimensional (3D) radiomics segmentation on AP, VP, and DP using ITK-SNAP software (version 3.6.0; www.itksnap.org). Radiologist A segmented 300 cases and radiologist B segmented the other 202 cases.
For radiomics segmentation, an ROI was manually delineated on each slice of the tumor. Air and feces in the intestinal tract and pericolonic fat were carefully excluded from the contours. Finally, three ROIs (AP, DP, and VP) were generated for each patient. To evaluate the inter-observer reproducibility and robustness of feature extraction, 30 patients were randomly selected and segmented again by radiologist A and radiologist B. We estimated the reproducibility of feature extraction using inter-class correlation coefficients (ICCs), where ICCs greater than 0.80 indicated good reproducibility (30). Additionally, 30 patients were randomly selected from each CT scanner to build the CT scanner set for calculating intra- and inter-class correlation coefficients.
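The text does not state which ICC form was used. As an illustration only, the sketch below computes one widely used variant, the two-way random-effects, absolute-agreement, single-measure ICC (Shrout and Fleiss ICC(2,1)), for one feature measured by several raters. The function name and toy values are assumptions for this example.

```python
def icc2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)        # number of subjects (e.g. re-segmented patients)
    k = len(ratings[0])     # number of raters (e.g. two radiologists)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-subject mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Toy feature values: identical ratings, then a constant offset between raters.
icc_identical = icc2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
icc_offset = icc2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
```

With this absolute-agreement form, a systematic offset between raters lowers the coefficient even when the rank order of subjects is identical, so a feature like the second toy example would fall below the 0.80 threshold.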

Feature Extraction
Before feature extraction, we adopted three steps to preprocess the CT images. First, we resampled images to 1 mm × 1 mm × 1 mm using linear interpolation to reduce the influence of different slice thicknesses. Second, we transformed the continuous images into discrete integer values using gray-level discretization (bin width = 25). Finally, Laplacian of Gaussian (LoG) and wavelet image filters were used to eliminate mixed noise introduced during image digitization and to obtain low- or high-frequency features. Radiomics features were extracted using the PyRadiomics package (31). Seven classes of radiomics features were extracted from the original and filtered images (wavelet and LoG). Finally, 1037 3D radiomics features were extracted for each patient. The feature types and their numbers are as follows: (1) first-order (histogram) features (n = 198); (2) shape features (n = 14); (3) gray-level co-occurrence matrix (GLCM) features (n = 264); (4) gray-level run-length matrix (GLRLM) features (n = 176); (5) gray-level size zone matrix (GLSZM) features (n = 176); (6) neighboring gray-tone difference matrix (NGTDM) features (n = 55); (7) gray-level dependence matrix (GLDM) features (n = 154). The specific definitions and descriptions of the features are provided in the Supplementary Materials.
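The reported class counts can be checked arithmetically. The sketch below sums them and looks for a common factor among the non-shape classes. The interpretation of that factor as the number of image types (for example, the original image plus eight 3D wavelet sub-bands plus two LoG scales) is an inference from the numbers, not something the text states.

```python
from functools import reduce
from math import gcd

# Per-class feature counts exactly as reported in the text.
reported = {
    "firstorder": 198,
    "shape": 14,
    "glcm": 264,
    "glrlm": 176,
    "glszm": 176,
    "ngtdm": 55,
    "gldm": 154,
}

total = sum(reported.values())

# Assumption: shape features are computed once from the mask, whereas every
# other class is computed once per image type (original and filtered images).
non_shape = {name: n for name, n in reported.items() if name != "shape"}

# Greatest common divisor of the non-shape counts: a candidate for the
# number of image types from which each class was extracted.
n_image_types = reduce(gcd, non_shape.values())

# Features per class per image type under that assumption.
per_image = {name: n // n_image_types for name, n in non_shape.items()}
```

Under this reading, the 1037 features decompose as 14 shape features plus 93 intensity and texture features computed on each of 11 image types.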

Features Selection and Prediction Model Building
After radiomics feature extraction, all missing data in the training cohort were replaced by the median value, z-score normalization was performed on each feature, and the same preprocessing procedure was applied to the validation cohort. We performed a binary classification task for MSI status prediction: MSS vs. MSI. It is worth noting that the sample numbers of the two groups were unbalanced in both the training and validation cohorts. An initial bias adjustment, with bias term b0, was used to correct for the influence of the unbalanced sample size.
The selection of the radiomics features most related to MSI status consisted of three steps. First, univariate analysis with the Mann-Whitney U test was performed to retain features with p < 0.05 for the subsequent process. Second, the least absolute shrinkage and selection operator (LASSO) method was used to retain features closely associated with MSI status. Finally, multivariable stepwise logistic regression further eliminated irrelevant features and retained the most informative features. Five-fold cross-validation repeated ten times was applied to avoid overfitting and to identify the model with the best performance.
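The exact expression for the bias term b0 is not reproduced in the text. A common choice for an initial bias in an imbalanced binary classifier, assumed here purely for illustration and not confirmed as the authors' equation, is the log-odds of the minority class, so that the untrained model outputs the observed prevalence. The sketch uses the training-cohort counts reported in this study (67 MSI of 441).

```python
import math


def initial_bias(n_pos, n_neg):
    """Log-odds of the positive class (assumed form, not taken from the source)."""
    return math.log(n_pos / n_neg)


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


n_msi, n_total = 67, 441          # training cohort counts from the text
n_mss = n_total - n_msi           # 374 MSS patients

b0 = initial_bias(n_msi, n_mss)   # approximately -1.72
baseline_probability = sigmoid(b0)
```

With this choice, the baseline probability equals 67/441, the MSI prevalence in the training cohort, before any feature contributes to the prediction.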
Three radiomics models were established based on the above radiomics signatures from triphasic enhanced CT images (APR, VPR, and DPR models). To verify whether combining the three enhanced phases could improve the prediction of MSI status, a fusion radiomics (FR) model was built based on fused AP, VP, and DP features from the 3D segmentations. The final regularization parameter was the one that maximized the area under the curve (AUC) in the training cohort. Furthermore, each radiomics model output a radiomics signature reflecting the likelihood of MSI for each patient.

Clinical, Combined Model, and Nomogram Construction
For clinical and radiological features, the chi-squared test or Fisher's exact test was used to compare differences in sex, CEA, CA125, CA199, cT stage, and cN stage, while the Student's t-test or Mann-Whitney U test was used to compare differences in age and maximum diameter between the MSS and MSI groups in the training and external validation cohorts. P-values < 0.05 (two-sided) were considered statistically significant. We performed multivariable analyses to identify the most important features. A clinical model was established based on the selected features.
A combined model (Clinical Radiomics) was developed based on correlated clinical-radiological features and radiomics features to verify whether the combination of radiomics signatures and clinical-radiological features could improve the prediction of MSI status, and it was presented as an individualized nomogram.
Following the selection of clinical characteristics and model comparison, a nomogram incorporating clinical-radiological characteristics and radiomics signatures was created from the multivariate logistic regression coefficients and applied to the training and external validation cohorts. This nomogram provides a more convenient and reliable tool for patients and clinicians. A flowchart of the study is shown in Figure 2.
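A nomogram of this kind is a graphical rescaling of a logistic regression model. The sketch below shows the idea under stated assumptions: every coefficient, the intercept, the binary coding of location and CEA, and the radiomics-score range are hypothetical values chosen for illustration, since the text does not report them. Only the age range (20-89 years) is taken from the training cohort description.

```python
import math

# Hypothetical values for illustration only; not reported in the source.
INTERCEPT = -2.0
COEFS = {"age_per_year": -0.03, "right_sided": 1.2, "cea_elevated": -0.8, "rad_score": 1.5}
RANGES = {"age_per_year": (20, 89), "right_sided": (0, 1), "cea_elevated": (0, 1), "rad_score": (-3.0, 3.0)}


def predict_probability(features):
    """Logistic regression prediction of MSI probability."""
    z = INTERCEPT + sum(COEFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def axis_start(name):
    """End of the predictor range with the lowest risk, so points are never negative."""
    lo, hi = RANGES[name]
    return lo if COEFS[name] >= 0 else hi


# The predictor with the widest effect on the linear predictor spans 0-100 points.
MAX_SPAN = max(abs(COEFS[n]) * (RANGES[n][1] - RANGES[n][0]) for n in COEFS)
Z_MIN = INTERCEPT + sum(COEFS[n] * axis_start(n) for n in COEFS)


def nomogram_points(name, value):
    """Points contributed by one predictor on the 0-100 scale."""
    return 100.0 * COEFS[name] * (value - axis_start(name)) / MAX_SPAN


def probability_from_points(total_points):
    """Map the total-points axis back to a predicted probability."""
    z = Z_MIN + total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-z))


patient = {"age_per_year": 45, "right_sided": 1, "cea_elevated": 0, "rad_score": 0.5}
total = sum(nomogram_points(name, value) for name, value in patient.items())
```

Summing the points and reading the probability axis gives the same result as evaluating the logistic model directly, which is why the nomogram can be used without software.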

Statistical Analyses
All statistical analyses were conducted using the R statistical software package (version 3.6.3; http://www.R-project.org). Student's t-test, the Mann-Whitney U test, and the chi-squared test or Fisher's exact test were used to compare continuous and categorical variables, as appropriate. A two-sided P-value < 0.05 was considered statistically significant. ICCs were used to assess the consistency of measurements between the two radiologists and between different CT scanners. Receiver operating characteristic (ROC) analysis was used to evaluate the predictive accuracy of the different models. The AUC, 95% confidence interval (CI), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for each model. Precision-recall (PR) curves were used to compare discrimination performance, and the DeLong test was used to compare AUC estimates, between the predictive models. A calibration curve was constructed to assess the goodness-of-fit of the models. The Hosmer-Lemeshow (HL) test was performed to assess the agreement between the predicted MSI status and the observed outcomes. To verify the clinical usefulness of the models, we quantified the net benefit at different threshold probabilities in the dataset using decision curve analysis (DCA).
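The AUC from ROC analysis has a simple rank interpretation: it is the probability that a randomly chosen MSI patient receives a higher model score than a randomly chosen MSS patient. The study computed AUCs in R; the sketch below is only an illustration of that definition with made-up scores, and the function name is an assumption.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up predicted probabilities for three MSI and four MSS patients.
msi_scores = [0.9, 0.8, 0.4]
mss_scores = [0.7, 0.3, 0.2, 0.1]
example_auc = auc_mann_whitney(msi_scores, mss_scores)
```

In this toy example 11 of the 12 possible MSI/MSS pairs are ordered correctly, so the AUC is 11/12.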

RESULTS

Clinical Characteristics
Patient and tumor characteristics in the training cohort are listed in Table 1. This study included 502 CRC patients (441 patients in center I, 61 patients in center II) in the final analysis. The prevalence of MSI was 15.19% (67/441) in center I and 14.75% (9/61) in center II.

Predictive Performance of the Clinical Model
Age, tumor location, CEA level, CA125 level, and maximum diameter were significantly different (P < 0.05) between the MSI and MSS groups in the training cohort, whereas the other characteristics were not (P > 0.05). After multivariate analyses, age, tumor location, and CEA were selected as independent predictors of MSI and were entered into the clinical model (Supplementary Table S3). The clinical model showed moderate performance in predicting MSI in both cohorts, with an AUC of 0.781 (95% CI, 0.722-0.840) in the training cohort and 0.919 (95% CI, 0.833-1.000) in the validation cohort (Table 3). The accuracy, sensitivity, and specificity were 0.721, 0.716, and 0.722 in the training cohort and 0.869, 0.889, and 0.865 in the validation cohort, respectively.
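The reported accuracy, sensitivity, and specificity of the clinical model can be checked against the cohort sizes. The confusion-matrix counts in the sketch below are back-calculated from the reported rates and the MSI/MSS counts (67/374 in training, 9/52 in validation); they are not reported in the text and are shown only as a consistency check.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts (MSI is the positive class)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Back-calculated counts (assumption): the only integer counts consistent with
# the three-decimal sensitivity and specificity reported for each cohort.
training = classification_metrics(tp=48, fn=19, tn=270, fp=104)   # 67 MSI, 374 MSS
validation = classification_metrics(tp=8, fn=1, tn=45, fp=7)      # 9 MSI, 52 MSS

training_rounded = {k: round(v, 3) for k, v in training.items()}
validation_rounded = {k: round(v, 3) for k, v in validation.items()}
```

Because the validation cohort contains only nine MSI cases, each additional missed case changes the sensitivity by about 0.11, which is worth keeping in mind when comparing sensitivities between cohorts.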

Radiomics Signature Building and Discrimination Performance Assessment
ICCs were calculated to evaluate the agreement of features extracted by the two radiologists and from different CT scanners, and ICC values > 0.80 indicated good agreement. A total of 1037 3D radiomics features from AP, VP, and DP images were extracted for each patient, and irrelevant features were removed as described earlier. Finally, 6 AP, 10 VP, and 16 DP 3D radiomics features were retained as the final signatures. The feature names and distributions are listed in Table 2. The values of these features were significantly different between the MSI and MSS groups. Following stepwise regression analysis, 16 features were selected after fusion of the radiomics features from AP, VP, and DP (FR model). Significant differences were found in these features between the MSI and MSS groups (Supplementary Materials). As shown in Supplementary Figure 2, the feature heatmaps show that the correlation between most of the features is below 0.9, indicating limited collinearity among the final features. Four models were built based on the above radiomics signatures for preoperatively predicting MSI (APR, DPR, VPR, and FR models). The AUC, accuracy, sensitivity, specificity, PPV, and NPV for each model are listed in Table 3 and Figure 3. The DPR model had better predictive performance than the APR or VPR model in the training and validation cohorts (Figures 3A, B). In addition, the FR model had a higher predictive AUC than APR, DPR, or VPR in the training cohort. In the validation cohort, the FR model had a higher predictive AUC than APR or VPR but a slightly lower AUC than the DPR model.

Predictive Performance of the Combined Model
As shown in Figure 4A, a Clinical Radiomics combined model was developed that incorporates clinical risk factors and DP radiomics signatures, which was presented as a quantitative nomogram. The Clinical Radiomics model showed excellent predictive ability for MSI status, with an AUC, accuracy, and sensitivity of 0.898, 0.837, and 0.821 in the training cohort and 0.964, 0.918, and 1.000 in the validation cohort, respectively. As shown in Table 3 and Figure 3, the Clinical Radiomics model had a better predictive AUC value than either the clinical model or the radiomics models in the training cohort and validation cohort.
The PR curves show that the Clinical Radiomics model had better MSI prediction performance than the clinical or radiomics models (Figure 5). DeLong test results showed a significant difference between the AUC of the Clinical Radiomics model and those of the APR, VPR, and clinical models in the training cohort (Figure 6). The calibration curve of the nomogram showed favorable agreement between prediction and observation in predicting the risk of MSI (Figures 4B, C). The HL test yielded non-significant statistics in the training and validation cohorts, indicating good fit of the models.
The DCA results for the clinical model, radiomics models, and combined nomogram are presented in Figures 4D, E. The nomogram achieved more clinical utility in predicting MSI than the clinical model or radiomics model alone. The DCA curve of the nomogram demonstrated that when the threshold probability of a patient or doctor ranged between 5% and 80%, the use of the nomogram added greater benefit for MSI prediction than the treat-all-patients scheme or the treat-none scheme in the training cohort.
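The net benefit plotted in a decision curve follows a standard formula: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below applies it to the training-cohort prevalence from this study (67 MSI of 441); the model counts at the 20% threshold are hypothetical and serve only to illustrate the comparison with the treat-all and treat-none schemes.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a decision rule at a given threshold probability."""
    weight = threshold / (1.0 - threshold)   # odds of the threshold
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(n_pos, n, threshold):
    """Treat-all scheme: every patient is classified as positive."""
    return net_benefit(tp=n_pos, fp=n - n_pos, n=n, threshold=threshold)


NET_BENEFIT_TREAT_NONE = 0.0   # no true or false positives by definition

n_msi, n_total = 67, 441       # training cohort counts from the text

# Hypothetical model result at a 20% threshold (illustration only).
model_nb = net_benefit(tp=50, fp=60, n=n_total, threshold=0.20)
treat_all_nb = net_benefit_treat_all(n_msi, n_total, threshold=0.20)

# The treat-all curve crosses zero where the threshold equals the prevalence.
treat_all_at_prevalence = net_benefit_treat_all(n_msi, n_total, n_msi / n_total)
```

A model adds clinical value at a given threshold only when its net benefit exceeds both reference schemes, which is the comparison the decision curves in Figures 4D, E display across thresholds.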

DISCUSSION
In this study, we investigated the association between triphasic enhanced CT radiomics features and MSI status. Six, ten, and sixteen radiomics features showed significant correlation with MSI status in AP, VP, and DP, respectively. Four radiomics models (APR, DPR, VPR, and FR) were built from these features in the training cohort to predict MSI status in patients with colorectal cancer, and their performance was validated in an external cohort from another center. The DPR model outperformed the APR and VPR models in both the training and external validation cohorts. Meanwhile, the nomogram, based on DP radiomics features and clinical risk factors, showed excellent ability to identify MSI status in both the training (AUC: 0.898, 95% CI 0.860-0.937) and external validation (AUC: 0.964, 95% CI 0.919-1.000) cohorts. Our nomogram may be useful for predicting the MSI status of CRC patients and, thus, has the potential to aid in the determination of therapeutic strategies. Typically, performance in an external validation cohort is lower than in the training cohort because of overfitting. In our study, performance in the external validation cohort was slightly higher than in the training cohort, and the 95% CIs of the accuracies in the two cohorts did not overlap. Because our validation cohort is an external data set, its distribution may differ from that of the training cohort owing to geographical location and other factors, which may explain the higher performance in the validation cohort. Validation in additional cohorts is required to confirm that the model is reproducible and generalizable.
In the present study, the incidence of MSI was 15.19% (67/441) in the training cohort and 14.75% (9/61) in the external validation cohort, which is consistent with previous literature (32,33).
CRC patients with MSI have distinct prognoses and treatment strategies compared to patients with MSS tumors, including a better prognosis, a lack of benefit from fluorouracil-based chemotherapy, and a greater likelihood of benefiting from immunotherapy. Previous studies (28,29) have investigated the association between MSI and radiomics features. Fan et al. (28) used CT-based radiomics to predict the MSI status in 119 stage II CRC patients with a model combining clinical factors and radiomics features. Our proposed Clinical Radiomics combined model performed better than previous models in both training and external validation cohorts. Therefore, it may be a potential quantitative tool for individualized MSI prediction.
Due to the low incidence of MSI, the data distribution in this study was markedly unbalanced. Unbalanced data are a common problem in classification. Therefore, the bias adjustment method was used in our study to correct the resulting training fit error. A previous study used the synthetic minority oversampling technique (SMOTE) (28) to resolve data imbalance. SMOTE resolves the imbalance of the data set by adding "artificial" samples. However, this strategy is prone to model overfitting, and its validity is difficult to demonstrate.
In our study, 1037 quantitative features were extracted from CT images to build radiomics signatures. During the image preprocessing stage, LoG and wavelet filters (27) were applied to the original image. Of the 1037 radiomics features, 6, 10, and 16 features were retained in AP, VP, and DP images, respectively, all of which demonstrated high correlations with MSI and were stable across multiple centers. To our surprise, the majority of the retained radiomics features were LoG and wavelet filter features (26 of 32), which suggests that LoG and wavelet filters can improve the capture of phenotypic features related to MSI in CRC.
In the present study, texture features were the most frequent class of radiomics features in the triphasic enhanced CT signatures (4/6 in AP, 9/10 in VP, 13/16 in DP). Texture features are microscopic features in an image that have been shown to be highly correlated with tumor heterogeneity (34,35). However, these features are not easily identified by the human eye and cannot be interpreted as having a clear meaning (36). Our results showed that most texture features were associated with MSI status. Compared with the MSS group, the values of these features were significantly higher in the MSI group, which indicated more homogeneity in the ROI. Our finding is in line with those of previous reports (28,29), in which texture features were also the most frequent radiomics features for MSI prediction. We observed that first-order statistic features including A_wavelet.LHL_firstorder_90Percentile, A_wavelet.LHL_firstorder_Skewness, V_wavelet.HLH_firstorder_Mean, D_original_firstorder_Range, and D_wavelet.LLL_firstorder_Skewness were significantly associated with MSI status, which was consistent with the results of the studies by Fan et al. and Pernicka et al. (28,29). The results of their studies show that the MSI status is associated with kurtosis and intensity histograms.
Among the triphasic enhanced CT models for the prediction of the MSI status in the training cohort, the DPR model showed the highest performance, with an AUC value of 0.887, followed by 0.827 in the VPR model and 0.775 in the APR model. In the validation cohort, the DPR model again performed best; the predictive AUCs of the DPR, VPR, and APR models were 0.953, 0.810, and 0.876, respectively. The VP is the most commonly used phase in gastrointestinal radiomics research, and previous radiomics features for MSI prediction were extracted from portal VP CT images. However, to date, this is the first study to develop a radiomics-based model to predict the risk of MSI status in CRC patients based on triphasic enhanced CT in a large cohort. To our surprise, the DPR model showed the best predictive performance in the training and validation cohorts. The triphasic enhanced phase images reflect the uptake and clearance of iodine over time in AP, VP, and DP (37). In AP, the contrast agent is mainly in the intravascular space, which results in focal mucosal enhancement. During VP and DP, the contrast agent is evenly distributed between the intravascular space and the extravascular space, leading to more uniform enhancement (38). The degree of tumor enhancement in AP is positively correlated with the density of microvessels in the tumor, while in VP and DP, the degree of tumor enhancement is related to the content of contrast agent in the tumor interstitial space and vascular space. In addition, CRCs lack normal lymphatic drainage, and the contrast agent tends to remain in the tumor interstitial space for a longer time (38,39). Therefore, CRCs are significantly enhanced in AP, while VP and DP show continuous enhancement. Previous literature shows that the enhancement of a structure in the contrast-enhanced image is proportional to its iodine concentration (38).
The high content and uniform distribution of contrast agents in tumors may be one of the reasons for the high diagnostic efficiency of the DPR model. This is consistent with the number of key features retained in our study: the numbers of radiomics features in the DP, VP, and AP signatures were 16, 10, and 6, respectively.
The dynamic changes of CRC from AP through VP to DP showed obvious transmural enhancement from the inside of the tumor to the outside. For triphasic enhanced CT, AP is mainly used for tumor detection and assessment of the tumor extent along the colorectal wall, VP is used for differentiating CRC from adjacent organs and evaluating lymph nodes, and DP is used to determine the depth of tumor invasion (40). Therefore, the tumor extent detected in DP is larger than that in VP or AP. This means that the ROI delineated in DP is the largest among the triphasic enhanced CT images, which is consistent with our observations during tumor ROI delineation. A positive correlation between increased tumor extent and increased diagnostic efficiency has been confirmed by previous studies (41). Thus, another reason for the high predictive performance of the DPR model could be the larger tumor ROI in DP images.
Age, location, and CEA were independent predictors of MSI status in the multivariate analysis. CRC patients with MSI have distinct clinical characteristics compared to those with MSS tumors, such as a predominance of right-sided colonic tumors and a younger age. Our finding is consistent with the results of previous studies (28,29,42). CEA levels were significantly lower in the MSI group than in the MSS group, while CA125 was significantly higher in the MSI group than in the MSS group in the present study. A significant correlation between MSI status and the above clinical predictors suggests that genetic alterations may have independent influences on CRC development, thus resulting in distinct tumor biological behavior compared with that of MSS tumors. These parameters can be easily obtained and could thus be considered as novel approaches for predicting MSI status. Further studies are essential to validate our findings.
As for radiation dose, the average dose length product of triphasic enhanced scans was 1934.76 ± 147.18 mGy·cm, which is approximately 30% higher than the diagnostic reference level for adults (1490 mGy·cm) published in China's National Health Industry standard (WS/T 637-2018) (43). Application of new techniques such as multi-model iterative reconstruction technology could effectively reduce the radiation dose in clinical practice (44).
Several limitations of our study should be noted. First, 501 patients were excluded because they did not meet the inclusion criteria or met the exclusion criteria, which inevitably produced selection bias. Second, due to the irregular shape of some tumors, manual segmentation is time-consuming and subject to observer variability. In future studies, automated segmentation may be a potential tool to resolve this problem. Third, in this study, CT scans were acquired with different scanners and acquisition parameters, which can noticeably influence radiomics features. Therefore, it is important to standardize scanning protocols across scanners and institutions.

CONCLUSION
In conclusion, we proposed and validated a CT-based radiomics model, incorporating clinical risk factors and radiomics parameters, which showed a relatively high diagnostic performance for the risk prediction of MSI in patients with CRC. This model may be a potential tool for preoperatively identifying the MSI status and can be used in individualized therapeutic strategy planning and prognostic prediction.

DATA AVAILABILITY STATEMENT
Data are available from the corresponding author upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Lanzhou University Second Hospital medical ethics committee. The ethics committee waived the requirement of written informed consent for participation.

AUTHOR CONTRIBUTIONS
Conception and design: JZho, HB, and YC. Collection and assembly of the data: YC and YY. Development of the methodology: JR. Data analysis and interpretation: All authors. Manuscript writing: All authors. All authors contributed to the article and approved the submitted version.