Deep Convolutional Neural Network-Based Lymph Node Metastasis Prediction for Colon Cancer Using Histopathological Images

Background: Human evaluation of pathological slides cannot accurately predict lymph node metastasis (LNM), although accurate prediction is essential for determining treatment and follow-up strategies for colon cancer. We aimed to identify accurate histopathological features for predicting LNM in colon cancer. Methods: We developed a deep convolutional neural network model to distinguish tissue components in colon cancer using data from the tissue bank of the National Center for Tumor Diseases and the pathology archive at the University Medical Center Mannheim, Germany. This model was applied to whole-slide pathological images of colon cancer patients from The Cancer Genome Atlas (TCGA), and the predictive value of the peri-tumoral stroma (PTS) score for LNM was assessed. Results: A total of 164 patients with stage I, II, or III colon cancer from TCGA were analyzed. The mean PTS score was 0.380 (SD 0.285), and PTS scores were significantly higher in the LNM-positive group than in the LNM-negative group (P < 0.001). In univariate logistic regression, the PTS score was also significantly associated with LNM (P < 0.001). Further, PTS scores were significantly increased with lymphatic invasion and with any one of perineural, lymphatic, or venous invasion in the LNM-positive group (P < 0.001 for both). Conclusion: Using computer-based analysis, we established the PTS score, a simple and reproducible parameter for predicting LNM in colon cancer that could be used to guide treatment decisions. These findings warrant further confirmation through large-scale prospective clinical trials.


INTRODUCTION
Colon cancer is a major cause of morbidity and mortality worldwide, and its incidence is expected to increase significantly over the next few years (1,2). In recent years, the number of cases of dysplasia and colon cancer has increased, leading to greater awareness and the introduction of screening and surveillance programs for colon cancer (3,4). The presence of lymph node metastasis (LNM) is a crucial prognostic factor for determining whether patients with early-stage colon cancer should undergo additional surgery after local endoscopic treatment and whether adjuvant chemotherapy is necessary after surgical resection in patients with advanced-stage disease (5)(6)(7).
Currently, clinicians make important treatment decisions by evaluating nodal status based only on limited radiological examinations, such as ultrasound and computed tomography, and on manual evaluation of a few histological features by light microscopy. However, qualitative evaluation of pathological features alone (such as histologic type, depth of tumor invasion, and tumor grade) is insufficient for predicting the presence of LNM in patients with colon cancer, and inconsistent assessments among experienced pathologists have been noted even for the best-characterized histopathological features. Furthermore, local endoscopic treatment cannot provide an accurate status of the regional lymph nodes, and even after radical surgery, micro-metastasis (8,9), that is, small numbers of cancer cells in regional lymph nodes that conventional pathological examination cannot detect, is found by immunohistochemistry and molecular genetic evaluation in up to 50% of patients with node-negative colon cancer (10)(11)(12).
Recently, computer-aided image analysis in histopathology evaluations has been shown to offer efficient, accurate, and consistent quantitative feature extraction and also provides decision-making support to ensure diagnostic consistency (13,14).
Therefore, the purpose of this study was to identify a pathological parameter for reliable and accurate assessment of LNM using a deep convolutional neural network (CNN) model that can better stratify patients with colon cancer.

METHODS
Histopathology Image Resource for Predictive Parameter
All available high-resolution digital whole-slide images of stained colorectal cancer (CRC) tissue were obtained from the Genomic Data Commons Data Portal of the National Cancer Institute (https://portal.gdc.cancer.gov/). The pathology slides were manually reviewed by a board-certified pathologist (K.Y.W.) for the presence and extent of tumors and various tissue components. The number of publicly available pathology slides per patient in The Cancer Genome Atlas (TCGA) cohort ranged from one to eight (most patients had only one representative slide of the tumor specimen). For model inference and evaluation, only one representative slide per patient was used.
Patients with rectal cancer were excluded because rectal cancer differs from colon cancer in its outcome patterns, such as local relapse or metastasis during the disease course after curative surgery (15). Slides with tissue folds, torn tissue, inadequate staining, or other artifacts, as well as slides without any tumor tissue, were excluded. Based on the American Joint Committee on Cancer (AJCC) staging system, patients with colon cancer were divided into LNM-positive (stage III) and LNM-negative (stage I and stage II) groups according to the pathological presence or absence of LNM, respectively (16). Extramural tumor deposits (EMTDs), including lymphatic invasion (LI), venous invasion (VI), perineural invasion (PI), and any of these features (AnyI), were also evaluated (17). This study was reviewed and approved by the Institutional Review Board of the Kyung Hee University Hospital at Gangdong (KHNMC IRB 2020-09-025). The need for informed consent was waived because all data used in this study were de-identified.

Training and Testing of Neural Networks
For training our CNN model, we used a training set of 100,000 image patches (224 × 224 pixels at 0.5 µm/pixel) with approximately equal numbers of images in the following seven tissue classes: normal colon mucosa, stroma, lymphocytes, mucus, adipose tissue, smooth muscle, and colon cancer epithelium; this image set is publicly available at http://dx.doi.org/10.5281/zenodo.1214456. We performed image segmentation using the multi-threshold technique (18). Our network architecture for auto-segmentation of tumor-microenvironment-related features in colon cancer histology is based on the U-Net architecture (19), which was originally proposed to improve fine segmentation and localization, particularly in biomedical images. For images whose height and width were smaller than the target size, we padded the image patches to 512 × 512 using reflect padding, in which the padded pixels are obtained by mirroring the existing image. All images were color-normalized using the Macenko method (20) and were preprocessed before thresholding via histogram normalization to standardize the intensities of each RGB channel to the range 0 to 255 (21). The threshold values were selected empirically and visually validated by an experienced pathologist. Of the generated image patches, 80% were used to train the model with forward and backward propagation; to build a high-performance network, the remaining data were divided into a 10% validation set and a 10% test set. We used the Adam (adaptive moment estimation) optimizer to minimize the cross-entropy loss during stochastic optimization, which provides smooth convergence (22). Training was terminated when the mean Dice similarity coefficient (DSC) on the validation set had not increased by at least 0.1% within 10 epochs after the best-performing epoch, and the model from the epoch with the highest mean DSC was selected as the final model.
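The padding and intensity-standardization steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it pads symmetrically (the text does not give the padding offsets), uses numpy-style "reflect" mirroring that does not repeat the edge pixel, and substitutes a simple linear min-max stretch for the histogram normalization step. The function names are hypothetical.

```python
from typing import List

Channel = List[List[float]]  # one color channel as rows of pixel intensities


def reflect_index(i: int, n: int) -> int:
    """Mirror an out-of-range index back into [0, n) about the edge pixel,
    without repeating the edge (the convention of numpy's 'reflect' mode)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def reflect_pad(channel: Channel, target: int) -> Channel:
    """Symmetrically pad a 2-D channel to target x target by mirroring.

    Assumption: the pad is split evenly between opposite sides, and each side
    needs at most n - 1 pixels, so a single mirror step suffices.
    """
    h, w = len(channel), len(channel[0])
    top, left = (target - h) // 2, (target - w) // 2
    for n, before in ((h, top), (w, left)):
        after = target - n - before
        if not (0 <= before <= n - 1 and 0 <= after <= n - 1):
            raise ValueError("padding too large for a single reflection")
    return [
        [channel[reflect_index(r - top, h)][reflect_index(c - left, w)]
         for c in range(target)]
        for r in range(target)
    ]


def stretch_to_0_255(channel: Channel) -> List[List[int]]:
    """Linearly rescale one channel so its minimum maps to 0 and its maximum to
    255 (a simple stand-in for the histogram normalization in the text)."""
    lo = min(min(row) for row in channel)
    hi = max(max(row) for row in channel)
    if hi == lo:  # flat channel: nothing to stretch
        return [[0 for _ in row] for row in channel]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in channel]
```

For example, `reflect_pad([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 5)[0]` is `[5, 4, 5, 6, 5]`, the same result as `numpy.pad(..., mode="reflect")`. With 224 × 224 patches and a 512 × 512 target, each side receives 144 mirrored pixels, within the single-reflection limit of 223; on real data, `numpy.pad` would do this in vectorized form.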
Training was implemented with the Keras library on a parallel computing architecture using an Intel Core i9-7960 CPU (2.8 GHz) and two Nvidia GeForce RTX Titan GPUs (24 GB of memory).
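The DSC and the training-termination rule described in this subsection can be sketched in plain Python. This is an illustrative sketch only: the text does not say whether the 0.1% threshold is an absolute or a relative gain, so treating it as an absolute DSC gain of `min_delta=0.001` is an assumption, as are all function names.

```python
from typing import List, Sequence, Tuple


def dice(pred: Sequence[int], truth: Sequence[int], cls: int) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for one class label,
    where A and B are the pixels predicted / annotated as that class."""
    a = [p == cls for p in pred]
    b = [t == cls for t in truth]
    overlap = sum(x and y for x, y in zip(a, b))
    size = sum(a) + sum(b)
    return 1.0 if size == 0 else 2.0 * overlap / size  # absent in both: perfect


def mean_dice(pred: Sequence[int], truth: Sequence[int], classes: Sequence[int]) -> float:
    """Unweighted mean of the per-class DSC values."""
    return sum(dice(pred, truth, c) for c in classes) / len(classes)


def early_stopping(val_dsc: List[float], patience: int = 10,
                   min_delta: float = 0.001) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a per-epoch validation mean-DSC curve.

    Training stops once `patience` epochs pass without a gain of at least
    `min_delta` over the last epoch that counted as an improvement; the model
    kept is the one from the epoch with the highest DSC seen up to that point.
    """
    ref_epoch, ref = 0, val_dsc[0]
    stop = len(val_dsc) - 1
    for epoch in range(1, len(val_dsc)):
        if val_dsc[epoch] >= ref + min_delta:
            ref_epoch, ref = epoch, val_dsc[epoch]
        elif epoch - ref_epoch >= patience:
            stop = epoch
            break
    seen = val_dsc[: stop + 1]
    best = max(range(len(seen)), key=seen.__getitem__)
    return best, stop
```

For a hypothetical curve `[0.50, 0.60, 0.70, 0.7005] + [0.69] * 12`, `early_stopping` returns `(3, 12)`: the gain at epoch 3 is below `min_delta`, so the patience counter keeps running from epoch 2, yet epoch 3 still holds the highest DSC and is kept. Keras provides a comparable built-in callback, `keras.callbacks.EarlyStopping`, with `min_delta`, `patience`, `mode`, and `restore_best_weights` arguments; the text does not state whether it was used.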

Histological Microenvironmental Feature Extraction
Morphological image processing (MIP) comprises a collection of non-linear operations related to the shape or morphology of features in an image (23). Histopathological images may contain numerous imperfections; in particular, the binary regions produced by simple thresholding are distorted by noise and texture. To identify the features of interest in whole-slide images after removing these imperfections, we extracted histological microenvironmental features using MIP, in which a structuring element (SE) is slid over the image in an operation similar to convolution (23). At each pixel position, an operation was applied between the SE and the corresponding neighborhood of the pathology image matrix. Successive morphological erosion and dilation operations were then performed according to the shape of the SE (24). The PTS area was calculated as the number of stroma pixels within the tumor region boundaries derived from MIP and was then normalized by the tumor area, computed as the total number of pixels segmented as cancer. The PTS score is thus

$$\text{PTS score} = \frac{\text{PTS area}}{\text{tumor area}}$$
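A minimal sketch of the morphological cleanup and the PTS ratio follows, assuming binary tumor and stroma masks from the segmentation step. The 3 × 3 square structuring element, the treatment of out-of-image pixels as background, and the definition of the tumor region as a band of `band_radius` pixels around the cleaned tumor mask are illustrative assumptions; the text specifies neither the SE nor how the region boundaries were drawn. All names are hypothetical.

```python
from typing import Callable, Iterable, List

Mask = List[List[int]]  # binary mask, 1 = pixel belongs to the class


def _morph(mask: Mask, radius: int, rule: Callable[[Iterable[int]], bool]) -> Mask:
    """Slide a (2*radius+1) x (2*radius+1) square SE over the mask; pixels outside
    the image count as background. `rule` is all() for erosion, any() for dilation."""
    h, w = len(mask), len(mask[0])

    def window(r: int, c: int) -> Iterable[int]:
        for rr in range(r - radius, r + radius + 1):
            for cc in range(c - radius, c + radius + 1):
                yield mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0

    return [[1 if rule(window(r, c)) else 0 for c in range(w)] for r in range(h)]


def erode(mask: Mask, radius: int = 1) -> Mask:
    return _morph(mask, radius, all)


def dilate(mask: Mask, radius: int = 1) -> Mask:
    return _morph(mask, radius, any)


def opening(mask: Mask, radius: int = 1) -> Mask:
    """Erosion followed by dilation: removes specks smaller than the SE."""
    return dilate(erode(mask, radius), radius)


def pts_score(tumor: Mask, stroma: Mask, se_radius: int = 1, band_radius: int = 2) -> float:
    """PTS area / tumor area, with the tumor region taken (hypothetically) as the
    cleaned tumor mask dilated by `band_radius` pixels."""
    clean = opening(tumor, se_radius)
    region = dilate(clean, band_radius)
    pts_area = sum(reg and st
                   for reg_row, st_row in zip(region, stroma)
                   for reg, st in zip(reg_row, st_row))
    tumor_area = sum(map(sum, clean))
    if tumor_area == 0:
        raise ValueError("no tumor pixels after cleanup")
    return pts_area / tumor_area
```

As a toy check, on a 9 × 9 grid with a 3 × 3 tumor nest in rows and columns 2 to 4, one isolated false-positive tumor pixel in a corner, and stroma filling columns 6 to 8, the opening removes the speck and the score is 7/9 (seven stroma pixels fall inside the band, over nine tumor pixels). On real whole-slide masks, a vectorized routine such as `scipy.ndimage.binary_opening` would replace these pure-Python loops.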

Statistical Analysis
All analyses were performed with R statistical software (version 4.0.0) and Python (version 3.6.9). Demographic differences between the two groups were tested using Student's t-test and Pearson's chi-square test. To assess the performance of the proposed parameter, we calculated the area under the receiver operating characteristic (ROC) curve (AUC). We used unadjusted (univariate) logistic regression to calculate odds ratios (ORs) and their 95% confidence intervals (CIs) to assess associations between the independent variables and outcomes. A two-sided P ≤ 0.05 was considered statistically significant.
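The two performance measures named above can be sketched in plain Python: the AUC via its Mann-Whitney interpretation, and a univariate logistic regression fitted by Newton-Raphson with a Wald 95% CI for the OR. This is an illustrative sketch, not the authors' R code. In particular, R's `confint()` on a `glm` fit typically reports profile-likelihood intervals, which can be asymmetric on the log scale and will generally differ from the Wald intervals computed here. Function names and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC = P(score of a random positive > score of a random negative),
    with ties counted as 1/2 (the Mann-Whitney U formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _score_and_information(x, y, b0, b1):
    """Gradient of the log-likelihood and entries of the 2x2 Fisher information."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def univariate_or(x: Sequence[float], y: Sequence[int],
                  iters: int = 25) -> Tuple[float, float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x and return (OR, CI low, CI high), where
    OR = exp(b1) per one-unit increase in x and the CI is a Wald 95% interval."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (i00, i01, i11) = _score_and_information(x, y, b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: beta <- beta + I^{-1} g
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    _, (i00, i01, i11) = _score_and_information(x, y, b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # sqrt of [I^{-1}] for b1
    z = 1.96
    return math.exp(b1), math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1)
```

With toy scores `[0.1, 0.2, ..., 0.8]` and labels `[0, 0, 1, 0, 1, 0, 1, 1]`, 13 of the 16 positive-negative pairs are ordered correctly, so `auc` returns 0.8125. The Newton-Raphson loop assumes the classes are not perfectly separated by the score; otherwise the maximum-likelihood OR does not exist.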

RESULTS
Image Processing
A total of 591 patients (600 slides) with CRC were collected from the TCGA data, of which 153 patients (154 slides) were excluded because they had been diagnosed with rectal cancer. After a further 210 patients (217 slides) with inadequate image data (poor image quality, H&E staining that was too weak or too strong, duplicated images, or artifacts) and 64 patients (65 slides) with distant metastases were excluded, 164 patients (164 slides) with stage I, II, or III colon cancer according to the seventh edition of the AJCC staging manual (25) were analyzed. The image data processing workflow is shown in Figure 1.
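As a consistency check, the exclusion steps reported above can be replayed in a few lines of Python; the counts come from this paragraph, while the `remaining` helper is hypothetical.

```python
# (patients, slides) as reported in the text
COLLECTED = (591, 600)
EXCLUSIONS = [
    ("rectal cancer", (153, 154)),
    ("inadequate image data", (210, 217)),
    ("distant metastases", (64, 65)),
]


def remaining(start, exclusions):
    """Subtract each exclusion step from the starting (patients, slides) counts."""
    patients, slides = start
    for _reason, (p, s) in exclusions:
        patients -= p
        slides -= s
    return patients, slides


print(remaining(COLLECTED, EXCLUSIONS))  # (164, 164)
```

The result matches the 164 patients (164 slides) analyzed, confirming that the exclusion categories do not overlap in the reported counts.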

Patient Characteristics
A total of 98 patients (59.8%) were in the LNM-negative group and 66 (40.2%) were in the LNM-positive group (Table 1). The patients without LNM (mean age 66.8 years, standard deviation [SD] 13.1) were older than those with LNM (Table 1). Among the patients with LNM, 54.5% were female and 45.5% were male (Table 1). Of the 66 patients with LNM, 80.3% had T3 and 15.2% had T4 primary tumors, whereas T1 and T2 tumors were less frequent (4.5%) (P < 0.001). LI and VI were more likely to occur in patients with LNM, with 42 (63.6%) and 25 (37.9%) cases, respectively (all P < 0.001, Table 1). The patients' clinicopathological characteristics are detailed in Table 1.
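The percentages above can be cross-checked against their denominators (164 patients overall, 66 with LNM). In this sketch, the group sizes and the LI/VI counts are taken from the text, whereas the T-category and sex counts are back-calculated from the reported percentages and should be treated as assumptions rather than reported data. The `pct` helper is hypothetical.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


TOTAL, LNM_NEG = 164, 98
LNM_POS = TOTAL - LNM_NEG  # 66 LNM-positive patients

# Group sizes and invasion counts (reported directly in the text).
print(pct(LNM_NEG, TOTAL), pct(LNM_POS, TOTAL))  # 59.8 40.2
print(pct(42, LNM_POS), pct(25, LNM_POS))        # 63.6 37.9 (LI, VI)

# Back-calculated, NOT reported: T3 = 53, T4 = 10, T1-T2 = 3; female = 36, male = 30.
print(pct(53, LNM_POS), pct(10, LNM_POS), pct(3, LNM_POS))  # 80.3 15.2 4.5
print(pct(36, LNM_POS), pct(30, LNM_POS))                   # 54.5 45.5
```

Each back-calculated set sums to the 66 LNM-positive patients, which is consistent with the reported percentages sharing that denominator.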

Evaluation of Histological Image Segmentation Using CNN
Supplementary Figure 2 shows the training accuracy, loss, and test accuracy curves over epochs. The training and test accuracy curves converged at approximately 56 epochs, when training met our termination criterion. The proposed model achieved high segmentation performance, with a mean test DSC of 0.892. We observed balanced class performance for all the neural networks we tested (including other architectures), with DSC values of 0.938, 0.968, 0.841, 0.732, 0.928, 0.815, and 0.930 for adipose tissue, lymphocytes, mucus, smooth muscle, normal colon mucosa, stroma, and colon cancer epithelium, respectively (Table 2). Representative segmentation results from the model are shown in Figure 2, and Figure 3 shows the PTS and tumor regions segmented through MIP. The mean PTS score was 0.380 (SD 0.285), and PTS scores were significantly higher in the LNM-positive group than in the LNM-negative group (P < 0.001) (Table 3). In univariate logistic regression, the PTS score was significantly associated with LNM (OR 26.654; 95% CI 5.677–196.987; P < 0.001) (Table 3). The PTS score had a moderate ability to identify the presence of LNM in colon cancer (AUC 0.677; 95% CI 0.593–0.760) (Supplementary Figure 3).

PTS Score as a Prognostic Factor
Compared with patients without EMTDs, patients with LI and those with AnyI had significantly higher PTS scores (P < 0.001 for both), whereas no significant association was identified between the PTS score and VI or PI (Table 3 and Supplementary Figure 4). Additional results are summarized in Table 4.

DISCUSSION
In this study, we established a novel stromal microenvironment parameter, the PTS score, to predict LNM in patients with colon cancer. To the best of our knowledge, this is the first study to develop a predictive histopathological parameter for LNM in patients with colon cancer using artificial intelligence. Considering the high interobserver variability in traditional pathological assessment (26)(27)(28)(29), more accurate and reproducible histopathological assessments can reduce the inaccuracies associated with relying on subjective individual markers and better define the optimal treatment strategy for colon cancer. Whole-slide images (WSIs) contain a vast amount of information about cancer patients; however, it can be difficult to assess all features by manual evaluation of histological tissue, because this is very time-consuming and can lead to substantial intra- and inter-observer variation among pathologists (30)(31)(32).
To date, a few parameters for assessing the prognosis of patients with colon cancer have been developed from pathological images using CNN methods (18)(33)(34)(35). In a recent German study, Kather et al. developed a parameter, the so-called "deep stroma score," to predict overall survival directly from histopathological images of CRC patients (18). However, this score incorporated information not only from stroma (whether cancer-related or not) but also from various other components, such as debris and adipose tissue, regardless of their distance from the tumor (18). The tumor microenvironment is a heterogeneous population of cells composed of tumor cells and the surrounding non-cancerous stromal cells recruited by the tumor, which together promote tumor growth, development, and propagation (36)(37)(38)(39). Therefore, an analysis focused on the stromal components near a tumor could more appropriately assess the contribution of cancer-related stroma in cancer tissue.
Another study, by Takamatsu et al., presented a deep-learning model for predicting LNM from pathology images with cytokeratin immunohistochemistry in early CRC (33). However, because the rate of metastasis in early CRC is low, only a limited number of LNM events were available for that study. Furthermore, stepwise procedures, such as cytokeratin immunohistochemical staining and calculation of several parameters followed by model selection, were necessary for the prediction. Thus, the approach is cumbersome and may not be applicable to unusual cases that lack typical histological features in a slide image. A recent Chinese study also presented several morphological parameters derived from pathology data to predict the risk of recurrence in stage III CRC (34). The authors generated new parameters by combining different histological components from whole tissue slides. Although this is an interesting approach, the validity of these parameters is not guaranteed, given the variability of tissue components captured in histological images during pathology preparation. Lastly, Bychkov et al. stratified CRC patients into low- and high-risk groups for disease-specific survival using a CNN method on pathology images (35). However, they did not present any specific histological prognostic parameter that could be used in clinical decision-making.
Several previous studies have reported the prognostic value of the tumor-stroma ratio in CRC (40)(41)(42)(43)(44)(45)(46). Despite this evidence, the tumor-stroma ratio has not been implemented in routine pathology reporting because of significant variation in methodology and the lack of a standardized procedure for its assessment. Published studies propose manual assessment at the deepest point of tumor invasion (40-42, 45, 46), systematic random point assessment (43), and a semiautomatic method combining human input and a deep-learning algorithm on WSIs (44). However, all of these methods still depend on time- and labor-intensive manual evaluation by expert pathologists.
To overcome these issues, we applied a scoring procedure in which a CNN segments the tumor and stroma in a WSI and the PTS area is normalized to the entire tumor area, yielding a straightforward measure. This allows easy and reproducible quantification of PTS even without deep domain knowledge or experience in assessing pathological images, and it has the potential to pave the way for implementing the PTS score in clinical practice.
Our findings also indicate that the PTS score may be a useful parameter for predicting the presence of EMTDs in colon cancer. Histopathological identification of LI, VI, and PI in cancer tissue has long been recognized as a potential prognostic indicator of patient outcomes because of the likely association with progression to lymphatic metastasis (29,47,48). Parameters that reduce interobserver variability are needed because the detection rate of LI, VI, and PI depends directly on technical aspects of tissue preparation, such as staining technique and the number of blocks examined, as well as on the pathologist's experience and specialization (49,50).
The present study demonstrated a significant association between the PTS score and LI and AnyI, whereas no significant association was found for VI or PI. Taken together, the PTS score could be a useful tool for identifying patients at risk of LNM and EMTDs in colon cancer.
Despite these advantages, the current study has a few limitations. First, cases submitted to the TCGA database might be biased toward images in which the morphological patterns of disease are definitive, which could differ from what pathologists encounter in daily practice. In addition, we could not fully eliminate heterogeneity in stain color, despite applying a well-established stain normalization method. Standardizing pathology staining methods will therefore be necessary to make diagnostic results more consistent and to help build systems that generalize well. Another limitation is that, despite the potential prognostic value of the PTS score for LNM and EMTDs overall and in patients with T3-4 colon cancer, no prognostic value was observed for cases with early T-stages. This may result from class imbalance due to the limited sample size and the low event rate of EMTDs. Therefore, further studies with larger samples should be performed to obtain more accurate results for early T-stage colon cancer.
In conclusion, we established that the PTS score is a potentially promising and easy-to-apply prognostic parameter for LNM in colon cancer. However, because of the limitations inherent in studies based on observational data, these findings should be confirmed in prospective clinical trials.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repositories and the accession numbers can be found in the article and Supplementary Material.

ETHICS STATEMENT
This study was reviewed and approved by the Institutional Review Board of the Kyung Hee University Hospital at Gangdong (KHNMC IRB 2020-09-025). The need for informed consent was waived because all data used in this study were de-identified.

AUTHOR CONTRIBUTIONS
MK designed the study. JY and MK analyzed and interpreted the data and wrote the manuscript. JC, HK, JJ, and JY supervised the project and revised the paper. All authors contributed to the article and approved the submitted version.