A Pipeline for Predicting the Treatment Response of Neoadjuvant Chemoradiotherapy for Locally Advanced Rectal Cancer Using Single MRI Modality: Combining Deep Segmentation Network and Radiomics Analysis Based on “Suspicious Region”

Patients with locally advanced rectal cancer (LARC) who achieve a pathologic complete response (pCR) after neoadjuvant chemoradiotherapy (nCRT) typically have a good prognosis. An early and accurate prediction of the treatment response, i.e., whether a patient achieves pCR, could significantly help doctors make tailored plans for LARC patients. This study proposes a pipeline of pCR prediction using a combination of deep learning and radiomics analysis. Taking into consideration missing pre-nCRT magnetic resonance imaging (MRI), as well as aiming to improve the efficiency for clinical application, the pipeline only included a post-nCRT T2-weighted (T2-w) MRI. Unlike other studies that attempted to carefully find the region of interest (ROI) using a pre-nCRT MRI as a reference, we placed the ROI on a “suspicious region”, which is a continuous area that has a high possibility to contain a tumor or fibrosis as assessed by radiologists. A deep segmentation network, termed the two-stage rectum-aware U-Net (tsraU-Net), is designed to segment the ROI to substitute for a time-consuming manual delineation. This is followed by a radiomics analysis model based on the ROI to extract the hidden information and predict the pCR status. The data from a total of 275 patients were collected from two hospitals and partitioned into four datasets: Seg-T (N = 88) for training the tsraUNet, Rad-T (N = 107) for building the radiomics model, In-V (N = 46) for internal validation, and Ex-V (N = 34) for external validation. The proposed method achieved an area under the curve (AUC) of 0.829 (95% confidence interval [CI]: 0.821, 0.837) on In-V and 0.815 (95% CI, 0.801, 0.830) on Ex-V. The performance of the method was considerable and stable in two validation sets, indicating that the well-designed pipeline has the potential to be used in real clinical procedures.


INTRODUCTION
Colorectal cancer is currently still the third most common cancer and the second most fatal cancer in the world (1). Nearly 30% sufferers are rectal cancer patients (2), great numbers of which are in the locally advanced stage at initial diagnosis (3).
To date, for patients with locally advanced rectal cancer (LARC), neoadjuvant chemoradiotherapy (nCRT) followed by total mesorectal excision (TME) has been the standard clinical treatment (4)(5)(6). The purpose of nCRT is to improve the feasibility of surgical procedures for LARC and reduce the incidence of complications, as it not only improves the local tumor control rate but also exhibits less toxicity to the human body (7). Clinically, the pathological response of LARC patients after nCRT treatment has demonstrated obvious heterogeneity (8). For a large percentage of patients (approximately 70-80%), the tumor will have been found to be shrunken or down-staged, and some patients may even have complete regression. It has been reported that approximately 20% of patients, defined as pathologic complete response (pCR) patients, contain no residual surviving tumor cells after nCRT and surgery (9,10). These patients have a favorable long-term prognosis with superb local control and disease-free survival (11). Therefore, for pCR patients, the option of organ-saving treatment could be developed to replace surgery. However, currently, the only way to accurately diagnose pCR is to utilize a pathological diagnosis after TME surgery, which presents an insoluble dilemma (12)(13)(14). As a result, a prediction method before surgery would greatly assist doctors in evaluating the treatment effects of nCRT and construct a tailored plan for each patient.
In recent years, magnetic resonance imaging (MRI) has been widely used as a non-genetic and non-invasive diagnostic method to assess the tumor condition due to its superior soft-tissue contrast and high spatial resolution. The T2-weighted (T2-w) MRI is recognized as the most important modality for rectal cancer assessments (15). Some previous works have attempted to evaluate the response of nCRT on T2-w MRI using assessments of post-treatment T staging (ymrT), tumor regression grading (mrTRG), volume reduction post-treatment, and other characteristics (16). However, the results were dependent on the experience of doctors, thus introducing subjectivity. The development of more objective ways to extract information from the MRI and guide clinical diagnosis is required.
Radiomics is a mathematical technique that utilizes highthroughput extraction of shape, intensity, and texture features from images, and transforms this visual information into high dimensional features for quantitative analysis (17). Radiomics analysis can help obtain additional image information with reliability and objectivity that may be invisible in human assessments (18). Applying a radiomics analysis to predict the pCR on an MRI has drawn increasing attention. In many studies (19)(20)(21), researchers have used a pre-nCRT MRI to analyze the relationship of the radiomics features and the pCR status. The use of a pre-nCRT MRI could provide a clear tumor region for analysis. However, considering that the nCRT could affect the tumor and change its properties, the use of a pre-nCRT MRI is indirect and may not reflect the true condition of the patient after nCRT treatment.
Analysis of a post-nCRT MRI might be a more direct method. However, the problem still remains that the region of interest (ROI) delineation on a post-nCRT is much more difficult due to tumor recession and the appearance of the fibrosis region. In previous studies (22,23), researchers applied a pre-nCRT MRI to provide a reference of the primary tumor region or the treated region for a post-nCRT MRI. However, a pre-nCRT MRI is not always available in real clinical practice, as some patients may be diagnosed with LARC using proctoscopy, and some patients may be transferred from other hospitals without access to the two previously scanned MRIs. Currently, only a few of studies have utilized a post-nCRT MRI to predict the pCR status. The work of Horvat et al. (15) obtained considerable results by applying a radiomics analysis on a post-nCRT T2 and diffusion-weighted imaging (DWI) MRI; however, the ROI was still obtained due to a careful discussion by at least two experienced physicians. An accurate delineation might be difficult to obtain for less experienced physicians without the reference of a pre-nCRT MRI. Additionally, it is time-consuming and resource-wasting if each delineation requires at least two physicians in clinical practice. Inspired by this, the aim of this study is to explore a pipeline that only uses the information from a single post-nCRT T2 MRI combined with a new method to provide a fast and reliable ROI. Deep learning uses multiple layers as a portion of a broader family of machine learning methods and has been successfully applied to various medical tasks (24)(25)(26)(27)(28).
In this study, we introduce a deep learning model for ROI delineation. A novel two stage model, termed the two-stage rectum-aware U-Net (tsraU-Net), is proposed to replace human evaluation. The ROI should be feasible for a deep learning model to find and contain sufficient information relating to the pCR status; hence, it is defined on a continuous region having abnormal intensity signals on a T2-w MRI assessed by radiologists that has a high possibility to contain a tumor or fibrosis. It is considered a rougher region, as the further identification of a tumor, fibrosis, or other tissues like edema is not defined. The following analysis extracts a great number of radiomics features, including texture, first-order statistics, and shape, on the ROI and its wavelet decompositions to represent certain properties. Machine learning and statistical techniques are later applied to select the most representative features and construct a final model to predict the pCR status.

Patients
We included a total of 496 patients from multiple institutions who received nCRT treatment diagnosed with LARC. Patients were retrospectively enrolled from July 2011 to December 2018 from two hospitals (Guangdong Institute of Gastroenterology, Sixth Affiliated Hospital of Sun Yat-sen University and Yunnan Cancer Center). The inclusion criteria were as follows: (1) adenocarcinoma confirmed by pathologists (excluding mucinous adenocarcinoma); (2) tumor located within 15 cm from the edge of the anus; (3) received nCRT treatment and TME; (4) in the clinical stage of T3-4 or Npositive; (5) completed the restaging MRI; and (6) the restaging MRI was performed no more than 1 week before the TME. This clinical trial was approved by the Clinical Ethics Review Committee of Sixth Affiliated Hospital of Sun Yat-sen University (2020ZSLYEC-010).
Patients from the Sixth Affiliated Hospital of Sun Yat-sen University were separated into three groups. Dataset Seg-T consisted of patients from July 2011 to June 2013 to train deep networks for ROI segmentation; dataset Rad-T consisted of patients from June 2013 to May 2017 to build the radiomics model for predicting the pCR status; and dataset In-V consisted of patients from June 2017 to December 2018 for internal validation. In addition, patients from Yunnan Cancer Center (dataset Ex-V) were used as an external validation set. Patients were further selected according to the following exclusion criteria: (1) poor MRI quality caused by severe inflammatory effusion, intestinal adhesions, or bowel movements, and (2) (for dataset Rad-T, Val, E-Val) the absence of a postoperative pathological diagnosis. Figure 1 shows the flowchart of the patient selection process.
All of the patients' treatments were discussed by the multidisciplinary team (MDT). Patients were delivered the intensity-modulated radiotherapy treatment (IMRT), and a dose of 45 Gy for 25 fractions was delivered to the clinical target volume. Then, a boost dose of 5.4 Gy was delivered to the

Pathology Assessment of Response
In this study, the pCR diagnosis was confirmed by two pathologists with more than 12 years of experience. Following the recommendation of the NCCN Guidelines for rectal cancer (29), patients with no surviving tumor cells in the surgical pathological specimens were judged as pCR; otherwise, they were judged as non-pCR.

MRI Data Acquisition
All of the MRI images were scanned under a 1.5-Tesla MRI unit (30). Bowel preparation was not routinely used for most cases prior to the examination. However, some specific patients with relatively small tumors in the sagittal view were filled with some rectal gel, making it easier to identify tumors on the oblique axis. The MRIs in the datasets Seg-T, Rad-T, and In-V (Sixth Affiliated Hospital of Sun Yat-sen University) were acquired using GE OPTIMA MR360 with a 100 ms echo time, a 4000 ms repetition time, a 100 field of view, a 512 × 512 matrix, 0.4-0.5 pixel spacing, and 5 mm slice thickness. The MRIs in the dataset Ex-V (Yunnan Cancer Hospital) were acquired using Philips Ingenia with a 100 ms echo time, a 4000 ms repetition time, a 100 field of view, a 432 × 432 matrix, 0.4-0.5 pixel spacing, and 5 mm slice thickness.

Data Pre-Processing
As suggested by some researchers (31), we applied complex methods to the pre-process MRI to both improve the image quality and unify the geometric and intensity patterns with the aim to assure the success of our analysis. The steps included (1) all of the MRIs were resampled into 0.4 mm × 0.4 mm pixel spacing using bilinear interpolation; (2) the size of each image matrix was unified into 544 × 544 by cutting or padding the background; (3) the intensity of each patient was adjusted using BiasCorrection to remove any inhomogeneity; and (4) the intensity histogram of each patient was matched to one selected patient (as template) who was from Seg-T. All of the procedures were implemented using the open-source python package "SimpleITK" (32).

Suspicious Region Definition
The "suspicious region" in our study was defined as a continuous region containing 129 abnormal intensity signals compared to a normal rectal wall, which are highly suspected to be cancer or fibrosis according to clinical experience. Following the guidance (33), the abnormal signals may have presented as slightly high, low, or mixed intensities. By such definition, the exact cancer and fibrosis region was not further distinguished. Instead, we relied on the radiomics analysis to elicit the hidden properties of the cancer or fibrosis and predict the pCR. In particular, the "suspicious region" was delineated by radiologists on only a post-nCRT T2-w MRI. As the region was visible to human vision, we assumed it could be captured by deep learning as well. Therefore, the use of "suspicious region" was both sufficient and proper in the pipeline that combined a radiomics analysis and deep learning. Figure 2 provides some examples of the suspicious region.

Deep Learning-Based Segmentation
The tsraU-Net Model To provide a reliable ROI using deep learning for the radiomics analysis, the most important consideration was rectum localization. If the deep segmentation network misrecognized the rectum with other organs such as the colon, uterus, bladder, or prostate due to a morphology change or location shift of the rectum, it could still find a "suspicious region," but not related to the pCR at all, thereby making the radiomics analysis totally meaningless. To address such a problem, we proposed a twostage model, named the two-stage rectum-aware U-Net (tsraU-Net), which would first find the rectum region and then segment the ROI using the awareness of the rectum location. The overall framework of the tsraU-Net model is shown in Figure 3. U-Net, a deep segmentation network expressly designed for biomedical segmentation tasks (24), was applied as a base model in both of the two stages of the tsraU-Net. Further improvements were made in each stage according to the task. The detailed descriptions were organized as follows. First, we briefly introduced the base model, U-Net. Next, we provided comprehensive explanations of the improvements in the two stages of tsraU-Net. Finally, we described other adjustments of the base model.

The Original U-Net Model
The original U-Net is a fully convolution network (FCN) containing an encoder to extract features and a decoder to reassemble features. Typically, both of them have five convolution blocks, s.t. each block consists of two 3 × 3 convolutions and a rectified linear unit (ReLU) for activation. After going through the convolution block, the number of features (more specifically, channels) would double in the encoder part and halve in the decoder part symmetrically. Between each convolution block in the encoder, the max pooling operation is applied to reduce the image resolution. Oppositely, an up-sample operation is inserted into the neighboring convolution blocks in the decoder to increase the image resolution. To fully utilize high resolution information, high resolution features in the encoder are concatenated to the corresponding convolution blocks in the decoder. This is named a "skip connection." In addition, a 3 × 3 convolution is applied after the last convolution block in the decoder to combine the rest of the features and obtain the final segmentation result.

The First Stage
In the first stage, we designed a four-channel 2D U-Net aiming to guide the segmentation with a plentiful amount of information.
Knowing that rectal regions would maintain certain continuity between neighboring MRI slices, the input of our model contains not only the currently input MRI slice but also its previous slice and the next slice to help detect the contours of the rectum. If the rectal wall is unclear compared with the neighboring region on the current slice, the other two slices may provide extra information. Furthermore, for each patient, we roughly marked two to four points inside the rectum for localization. Initially, the first and last MRI slices were given localization points, with bilinear interpolation applied between them to give localization information to the middle layers. If the shape of the rectum was not regular, another one to two points will be given in the middle layers. The previous slice, the current slice, the next slice, and the position information were combined into four channels as the input of our first stage model. The output of the first stage was the region of the rectum.

The Second Stage
The second stage would use the currently inputted slice and the predicted rectal region in the first stage to find the "suspicious region." In this stage, we focused on strengthening the model awareness of the abnormal intensity signals and, hence, applied an "attention" mechanism. "Attention" was first introduced in the natural language process (NLP) tasks to encourage models to pay more attention to efficacious information and suppress irrelevant information. There are two types of attention, i.e., soft attention and hard attention. In this task, we used the soft attention mechanism for the model (34). This method would   update the propagated features from "skip connection" by pointwise multiplication with a weight matrix given by an attention gate. An attention gate is a block containing several 1 × 1 convolutions and activation functions. It uses both the propagated features and the features from the corresponding former decoder block as input. Figure 4 provides a detailed explanation of an attention gate. Specifically, we inserted four attention gates into the original U-Net model s.t. every "skip connection" was followed by an attention gate.

Adjustments on Base Model
In addition to the above methods, we also made some small adjustments to the original U-Net model: (1) the addition of image padding during convolution so that the image size would not change; (2) the addition of instance normalization after each convolution block to accelerate the convergence; and (3) the replacement of the ReLU activation function with the Leaky Rectified Linear Unit (Leaky ReLU) to prevent the vanishing gradient problem.

Loss Function
The loss function in both two stages is the Dice Loss, which is widely used in medical image segmentation. It is defined from the dice coefficient, which essentially measures the overlap of two sets. The dice coefficient has a range of 0-1, where 1 means complete overlap. It is defined as Equation 1: where A ∩ B is the intersection of sets A and B, | | represents the number of elements in the set. As for Dice Loss, it is simply defined by the following Equation 2.

Experiment Setup
To provide the gold standard of segmentation, two radiologists (one had 6 years of experience and one had 9 years of experience) reviewed the post-nCRT T2-w MRI of the Dataset Seg-T and InV (for validating the performance), and they jointly provide delineations of the rectum and the "suspicious region." When delineating the "suspicious region," the normal rectal wall should be avoided, and for some patients with rectal gel filling, the gel should also be avoided. Both radiologists were completely blinded to the histopathology information, as well as the pre-nCRT MRI of patients. Following the instructions in the guidance (33), they used the rectal wall as a reference to find abnormal signals. This work was performed via ITK-snap version 3.4.0 software (http://itk-snap.org). While training, the initial hyper-parameters were established identically in two stages. The optimizer we used was the Adaptive Moment Estimation (Adam) (35) with an initial learning rate a = 2 × 10 -4 , b = (0.9,0.999) and would decay 30% every 20 epochs. The maximum training epoch was set to 120, the early stopping method was applied to prevent overfitting, and the training would stop if performance on the minority set did not improve over 30 epochs. The model was implemented with PyTorch 1.8.1 (36) on a Nvidia Titan X GPU with 12 G of memory.

Radiomics Analysis
After the segmentation model was well-trained, it was directly applied to the Rad-T, In-V, and Ex-V to obtain the ROI. Then, the following procedures of the radiomics analysis were applied to build the pCR prediction model.

Feature Extraction
In order to extract useful information related to the pCR status, a large feature space was generated, which included features not only from the original image but also from its wavelet decomposition images. A total of 93 types of features were calculated on each original image and its four Harr wavelet subbands, i.e., HH, HL, LH, and LL. The 93 features include 18 intensity features and 75 texture features. In addition, nine shape features were extracted on the original image. Together, 474 features were generated from each MRI slice.
Concerning that each patient had a different number of MRI slices, we used the arithmetic mean, three quartiles (Q1, Q2, and Q3 points), and the standard deviation of the features extracted from all of the slices as representations. Therefore, each patient had 2370 features in total.
All of the features were extracted using the Python package "PyRadiomics" (37). As announced in its document, most FIGURE 4 | The detailed explanation of attention gates. Let x represents the "skip connect" features from encoder, g represents the features from corresponding former decoder layer, thenx is the updated of x. "Conv2D 1×1" represents a 2D convention with kernel size of 1×1. "ReLU" and "Sigmoid" are two different activation functions.

Feature Selection
Two feature selection steps were applied in our study to increase the robustness and avoid overfitting. First, we evaluated the discriminative power of each single feature by calculating the Harrell's concordance index (C-index) (38) between the features and the pCR status. The univariate Cox analysis is a commonly used procedure in survival analysis, and we adopted this method because it does not require the features to follow a normal distribution compared with a t-test. Let A = {a 1 , a 2 , …, a n } denote the features of patients, b present the pCR status of patients, for each feature a f ∈ A, the C-index can be computed with the Equation 3.
where a fi is the i-th value of a f , b i is the i-th value of b, and U (a < b) = 1 if a < b else 0.
By definition, C-index equals to 1 means the best discriminative power and C-index equals to 0.5 represents a theoretically result of random prediction. We calculate the maximum of C-index(a f , b) and C-index(-a f , b) as the predictive score of feature a f . After calculation, scores are sorted and features with lower score are excluded.
In the second step, the remaining features were put into the least absolute shrinkage and selection operator (LASSO) (39) for further selection. LASSO is a logistic regression model with L1 regularization as a penalty of the coefficients. It will encourage the regression use of sparse features. The objective function of LASSO is: where A is the matrix of radiomics features, x is the coefficient of each feature, b is the pCR status, and l is the regularization penalty coefficient. Due to the L1 regularization, the LASSO forces the sum of the absolute value of the regression coefficients to be less than a fixed value, minimizing the residual sum of the squares. Such an operation forces the certain coefficient to zero. After the LASSO regression, features with a coefficient of non-zero are retained. Here, l was determined using a grid search and 5-fold cross validation on 100 iterations between 0.01 and 0.2.

The pCR Status Prediction
In our study, the support vector machine (SVM) (40) was applied to predict the pCR status. As suggested in study (41), the radial basis function (RBF) kernel was used. The RBF kernel is defined as Equation 5: where a, b are two samples from dataset and g is a hyperparameter. g and the regularization coefficient C of SVM were determined also by grid search on 5-fold cross validation within set {1/16, 1/8, 1/4, 1/2, 1}. After choosing the best g and C, the SVM model was trained for the pCR prediction.

Performance Evaluation
Various evaluation metrics of suspicious region segmentation and pCR prediction are listed below. Three metrics-dice coefficient, sensitivity, and specificitywere applied to evaluate the performance of segmentation. The dice coefficient's definition has been given in As for the pCR status prediction, five metrics-the area under receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, and the F-score-were applied for the evaluation. The F-score is a weighted harmonic mean that comprehensively considers sensitivity and specificity, which can be calculated as Equation 8.
Here, we included F 0.5 , F 1 , and F 1.5 in order to provide a multiple trade-off between specificity and sensitivity under different situations.

Clinical Characteristics
In our study, 241 of 424 patients from Guangdong Institute of Gastroenterology, Sixth Affiliated Hospital of Sun Yat-sen University and 34 of 72 patients from Yunnan Cancer Center met the inclusion criteria and did not meet the exclusion criteria. After selection, the number of patients in SegT, Rad-T, In-V, and Ex-V were 88, 107, 46, and 34, respectively. More clinical information of the Rad-T, In-V, and Ex-V groups is provided in Table 1. Statistical comparisons were performed for each clinical characteristic between the two response groups (pCR vs. non-pCR). There were no statistical differences between the pCR and non-pCR in sex, age, and the pre-CRT N stage in all three sets.

Segmentation Performance
When training the segmentation model, Seg-T was randomly separated into two sets with percentages of 70% and 30%. The s.t. of the majority set was used for updating the model, and the minority set was used for selecting the best network parameters.
After training, the model was validated on In-V, and the results are listed below.

The First Stage
The numerical results of stage one, segmentation of the rectum, are presented in Table 2. The results of the original U-Net are also provided for comparison. It can be seen that the 4channel U-Net had a significant improvement compared to the original U-Net on the rectum segmentation. Moreover, as the 4channel U-Net achieved (0.942, 0.965) 95% CI of the dice coefficient, it could be inferred that this network could provide a stable 300 and accurate rectum segmentation.
The visual results of the 4-channel U-Net and the original U-Net are also presented. Figure 5 shows eight typical cases of rectum segmentation. The 4-channel U-Net had successfully segmented the rectum regions in all of the cases, while the U-Net showed different defects. In case (A), the U-Net was able to find the rectum, but the morphology lacked accuracy. In case (B), because the U-Net had no position information, the prostate was mistakenly judged as the rectum. In case (C), the U-Net outputted a continuous region containing both the rectum and the uterus. In case (D), due to the unclear rectal wall, the U-Net produced a bad result with an undesirable shape. In case (E), the prediction of the U-Net was less regular compared with the 4channel U-Net. In case (G), the U-Net seriously undersegmented the rectum region. In case (H), the U-Net found two separated regions with similar sizes. Among those defects, (A) and (E) might be improved by post-processing methods, but for the rest, even postprocessing methods such as image dilation or the removal of the smaller region seems useless to obtain the correct rectum region. Thus, our design in the first stage has great importance for guaranteeing the success of the following analysis.

The Second Stage
Numerical results of stage two, segmentation of the "suspicious region," are presented in Table 3. For a fair comparison, the U-Net model in this stage was also given the rectum localization information from the first stage, and we intended to evaluate the use of the attention mechanism. From the results, the attention slightly improved the result. The dice coefficient and specificity may not be considerably high, but the sensitivity achieved nearly 0.8, indicating that the network could find the major portion of the "suspicious region." Figure 6 shows a visual display of the tsraU-Net, including the last attention maps, the final segmentation results, and the overlapping region compared with the gold standard. In some cases, such as (B), (D), and (F), the segmentation results are oversized. However, the morphology between them is still similar, and the segmentation results do not neglect most of the gold standard. Consequently, we believe the model is capable of providing the ROI with enough information for radiomics analysis. 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) T1 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) T2 1 (2.8%) 3 (4.2%) 0 (0%) 0 (0%)

Treatment Response Prediction
A total of 2370 features representing certain properties of the "suspicious region" predicted by tsraUNet were extracted from each patient. These features were progressively selected in the initial univariate analysis, and only approximately the top 2.5% features, which was 63 features, remained according to their predictive scores. The histogram of the predicted scores with the number of features is illustrated in Figure 7A. A distinct gap was found between the remaining features and the excluded features with a corresponding threshold of 0.622. The remaining features were then put into LASSO. After a grid search and cross-validation, the best l was 0.293. The grid search of l in the LASSO regression to minimize the residual mean square error (MSE) is visually provided in Figures 7B, C, which provides the coefficients of the features during the grid search. Ten features were finally chosen, and they are presented in Table 4. The detailed descriptions of these features can be found in the "PyRadiomics documentation." The remaining 10 features was used to build a SVM classifier. The hyperparameters were decided after grid search and crossvalidation: C = 1 and g = 0.125. After training, the SVM achieved 0.924 of the AUC on Rad-T, 0.829 on In-V, and 0.815 on Ex-V. More numerical results are displayed in Table 5. In addition, Figure 8 gives a visual display of the AUC and SVM scores [provided by the Python package "scikit-learn" (42)].
We further applied t-test on the three datasets to test the distribution of SVM score between real pCR and non-pCR patients. The p-values on Rad-T, In-V, and Ex-V are 1.26×10 -10 , 1.41×10 -3 , and 3.26×10 -3 , respectively, indicating that the SVM score between real pCR and non-pCR is from different distribution under significance level a = 0.05.
From the above results, we conclude that the radiomics model could extract information related to the pCR and predict its status with certain reliability. Furthermore, the results indicated that the "suspicious region" is capable to be the ROI in this research, which means a single post-nCRT T2-w MRI has the ability to predict the pCR status without the help of a pre-nCRT MRI or other post-nCRT modalities. The overall pipeline is provided in Figure 9.

DISCUSSION
In this study, we proposed a method for predicting the pCR status of LARC patients after nCRT that only requires the post-nCRT T2-w MRI and a few manual operations. We provided a  novel definition of the suspicious region and used it as the ROI to build the radiomics analysis. Furthermore, we designed a deep learning model for suspicious region segmentation that could greatly reduce the workload of radiologists. Our experimental results, 0.829/0.815 AUC on the internal/external validation set, prove the feasibility and stability of our method in pCR prediction, indicating that our method has great potential for providing assistance to doctors in clinical diagnosis. Some information was obtained from the 10 selected features. First of all, no shape feature remained after selection. The reason might come from the fact that all of the ROIs were provided by the deep networks, which had a homogeneous shape regardless of the pCR status. Furthermore, half of the selected features were wavelet features, which might imply that some valuable information was hidden in the frequency domain. Many recent studies have also highlighted the importance of wavelet features in radiomics analysis (43)(44)(45)(46). Finally, 8 of the 10 features were quantiles, and 1 was the standard deviation. This suggested that simply averaging the features from all of the slices per patient was insufficient compared with using multiple statistics. FIGURE 6 | Attention map and comparison of suspicious region segmentation between gold standard and tsraU-Net. The first row is the input of the network, the second row is the attention map provided by attention U-Net, the third row is the prediction of tsraU-Net, the fourth row is the gold standard, and the last row is the overlapped region between prediction and gold standard. To further address the thoughtful design of our study, we wanted to highlight the importance of independent training of the segmentation and radiomics model. As a model always tends to overfit more or less while training, if we use the same set to train the segmentation as well as the radiomics, the predicted ROI on the training set and validation set might have different distributions, and consequently affect the stability of the radiomics analysis. Due to the above consideration, we used a particular dataset, Seg-T, to train the segmentation model and choose the network parameters. We then applied the best model on Rad-T, In-V, and Ex-V to assure the independence between segmentation and radiomics.
Unlike other studies (47,48) that have utilized a two-sample t-test as the first step to select the features, we used univariate Cox analysis. In fact, prior to the feature selection, we examined whether each feature was normally distributed by calculating the skewness and kurtosis (49). A total of 34 features did not pass this simple normality test under a significance level a = 0.05. Therefore, t-test could not be applied to all features in our study. Consequently, we utilized the concordance index as a replacement because it did not limit the distribution of data.   There were some limitations in our study. However, this was only a preliminary exploration. In the future, we will attempt further improvements. For instance, we could collect more unified standard data for analysis, and at the same time carefully choose the year of patient recruitment to avoid data mixing. In addition, we could encourage radiologists to delineate more precisely so that our segmentation network could better learn the characteristics of the suspicious regions. Additionally,  our segmentation network could be adjusted and improved using techniques, such as combining different loss functions following previous works (50)(51)(52)(53) or adding clinical characteristics for a joint analysis. In addition, if we could obtain MRI with smaller slice thickness (≤1mm), we could consider building a 3D model and studying the 3D radiomics features that may contain richer information of the suspicious region. Finally, the motivation of this study was different than other related works. We wanted to explore the possibility of using a single post-nCRT T2 MRI for patients missing a pre-nCRT MRI or other modalities. In addition, we intended to improve the efficiency of the model and reduce the workload of doctors for clinical use. Our method has great potential to guide lessexperienced doctors, as it does not require manual delineation of the ROI. Moreover, as our model is less restricted regarding data requirements and the prediction of the pCR was easily obtained, it can be combined with other studies for joint decision-making.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by This clinical trial was approved by the Clinical Ethics Review Committee of the Sixth Affiliated Hospital of Sun Yat-sen University (2020ZSLYEC-010).

AUTHOR CONTRIBUTIONS
XP, FW, QZ, and YL jointly wrote the manuscript. XP, RH, and XY collected all the input sources data. XP and FW annotated the images data. FW, QZ, and YL designed the model and implemented the main algorithm and other computational analysis. XF performed a review of the manuscript on the clinical aspects. All authors contributed to the article and approved the submitted version.