<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Oncol.</journal-id>
<journal-title>Frontiers in Oncology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Oncol.</abbrev-journal-title>
<issn pub-type="epub">2234-943X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fonc.2021.614398</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Oncology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Development of a Novel Prognostic Model for Predicting Lymph Node Metastasis in Early Colorectal Cancer: Analysis Based on the Surveillance, Epidemiology, and End Results Database</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ahn</surname>
<given-names>Ji Hyun</given-names>
</name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Kwak</surname>
<given-names>Min Seob</given-names>
</name>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/933624"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lee</surname>
<given-names>Hun Hee</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1185429"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cha</surname>
<given-names>Jae Myung</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Shin</surname>
<given-names>Hyun Phil</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jeon</surname>
<given-names>Jung Won</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yoon</surname>
<given-names>Jin Young</given-names>
</name>
</contrib>
</contrib-group>
<aff id="aff1">
<institution>Department of Internal Medicine, Kyung Hee University Hospital at Gangdong, Kyung Hee University College of Medicine</institution>, <addr-line>Seoul</addr-line>, <country>South Korea</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Jaw-Yuan Wang, Kaohsiung Medical University Hospital, Taiwan</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Zhangya Pu, Central South University, China; Katsuro Ichimasa, Showa University Yokohama Northern Hospital, Japan</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Min Seob Kwak, <email xlink:href="mailto:kwac63@khu.ac.kr">kwac63@khu.ac.kr</email></p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Gastrointestinal Cancers, a section of the journal Frontiers in Oncology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>03</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>11</volume>
<elocation-id>614398</elocation-id>
<history>
<date date-type="received">
<day>06</day>
<month>10</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>03</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Ahn, Kwak, Lee, Cha, Shin, Jeon and Yoon</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Ahn, Kwak, Lee, Cha, Shin, Jeon and Yoon</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>A simplified model for predicting lymph node metastasis (LNM) in patients with early colorectal cancer (CRC) is urgently needed to guide treatment and follow-up strategies. Therefore, in this study, we aimed to develop an accurate predictive model for LNM in early CRC.</p>
</sec>
<sec>
<title>Methods</title>
<p>We analyzed data from the 2004&#x2013;2016 Surveillance, Epidemiology, and End Results (SEER) database to develop and validate prediction models for LNM. Seven models were used: logistic regression, XGBoost, k-nearest neighbors, classification and regression trees, support vector machines, neural network, and random forest (RF).</p>
</sec>
<sec>
<title>Results</title>
<p>A total of 26,733 patients with a diagnosis of early CRC (T1) were analyzed. The models included eight independent prognostic variables: age at diagnosis, sex, race, primary site, histologic type, tumor grade, and tumor size. LNM was significantly more frequent in patients with larger tumors, women, younger patients, and patients with more poorly differentiated tumors. The RF model showed the best predictive performance of all the methods, achieving an accuracy of 96.0%, a sensitivity of 99.7%, a specificity of 92.9%, and an area under the curve of 0.991. Tumor size was the most important feature for predicting LNM in early CRC.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We established a simplified, reproducible predictive model for LNM in early CRC that could be used to guide treatment decisions. These findings warrant further confirmation in large prospective clinical trials.</p>
</sec>
</abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>colorectal cancer</kwd>
<kwd>prediction</kwd>
<kwd>metastasis</kwd>
<kwd>model</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Research Foundation of Korea<named-content content-type="fundref-id">10.13039/501100003725</named-content>
</contract-sponsor>
<counts>
<fig-count count="3"/>
<table-count count="3"/>
<equation-count count="0"/>
<ref-count count="39"/>
<page-count count="9"/>
<word-count count="3880"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<title>Introduction</title>
<p>Colorectal cancer (CRC) is a major cause of morbidity and mortality worldwide, and its burden is expected to continue increasing over time (<xref ref-type="bibr" rid="B1">1</xref>, <xref ref-type="bibr" rid="B2">2</xref>). In recent years, increased awareness and the introduction of population-based surveillance and screening programs have increased the detection of precancerous dysplastic lesions and early CRC (<xref ref-type="bibr" rid="B3">3</xref>, <xref ref-type="bibr" rid="B4">4</xref>).</p>
<p>Early CRC is defined as a tumor confined to the mucosa and/or submucosa, regardless of the presence of regional lymph node metastasis (LNM). In selected cases of early CRC, endoscopic resection is a less invasive and more cost-effective treatment than surgery (<xref ref-type="bibr" rid="B5">5</xref>&#x2013;<xref ref-type="bibr" rid="B7">7</xref>). However, patients with CRC who have LNM or distant metastasis cannot be adequately cured by local endoscopic treatment alone and therefore require subsequent surgical resection to achieve cure.</p>
<p>LNM is found in approximately 6&#x2013;16% of patients with submucosally invasive CRC (<xref ref-type="bibr" rid="B8">8</xref>&#x2013;<xref ref-type="bibr" rid="B10">10</xref>); however, this figure may be underestimated, because clinicians make important treatment decisions based on limited examinations such as computed tomography (CT) and ultrasonography.</p>
<p>Thus, accurate and rapid assessment of locoregional and/or distant metastases in patients with early CRC is essential to determine whether they should undergo additional surgical resection or can be managed with regular surveillance. However, there are currently no universally accepted indications or criteria for additional surgery after endoscopic resection, even though the risk of locoregional LNM after local endoscopic treatment must be assessed quickly and accurately.</p>
<p>Therefore, the aim of the present study was to develop a novel, highly reliable prediction model for LNM based on simple histopathological and clinical parameters that can be used to improve risk stratification of patients with early CRC.</p>
</sec>
<sec id="s2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="s2_1">
<title>Data Source</title>
<p>This study used the Surveillance, Epidemiology, and End Results (SEER) Program database from the National Cancer Institute, a publicly available collection of U.S. cancer registries. The program collects and publishes cancer incidence, mortality, and survival data from 17 population-based cancer registries covering approximately 34.6% of the U.S. population (Iowa, Los Angeles, Connecticut, Utah, Greater California, Idaho, Georgia Center for Cancer Statistics, San Francisco-Oakland, San Jose-Monterey, Louisiana, Hawaii, Massachusetts, Alaska Native tumor registry, Kentucky, New Mexico, New York, Seattle-Puget Sound) (<xref ref-type="bibr" rid="B11">11</xref>). The database is roughly representative of the U.S. population and includes information on more than 9 million cancer cases, with more than 550,000 new cases added annually. It is a powerful resource for researchers seeking to understand the natural history of CRC and to improve the quality of healthcare for patients (<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B12">12</xref>). This retrospective cohort study was reviewed and approved by the Institutional Review Board of Kyung Hee University Hospital at Gangdong (KHNMC IRB 2020-01-015).</p>
</sec>
<sec id="s2_2">
<title>Study Population</title>
<p>The SEER registry collects data including age at diagnosis, sex, race, primary site, histologic type, tumor grade, tumor size, and tumor depth. Using the SEER 1975&#x2013;2016 database (released 4/15/2019), we analyzed data from all patients diagnosed with T1 CRC between 2004 and 2016. T1 CRC was defined as tumor invasion into the submucosa. We extracted clinical and demographic data (age at diagnosis, sex, and race) and tumor information (location, size, grade, histologic type, and American Joint Committee on Cancer 7th edition TNM stage) by using SEER disease codes. Tumor location was determined using the following codes: C18.0 (cecum); C18.1 (appendix); C18.2 (ascending colon); C18.3 (hepatic flexure); C18.4 (transverse colon); C18.5 (splenic flexure); C18.6 (descending colon); C18.7 (sigmoid colon); C18.8 (overlapping lesion of colon); C18.9 (colon, NOS); C19.9 (rectosigmoid junction); and C20.9 (rectum, NOS). Tumor morphology was categorized according to the ICD-O-3 histology and behavior codes: 8010/3 (carcinoma, NOS); 8020/3 (carcinoma, undifferentiated, NOS); 8140/3 (adenocarcinoma, NOS); 8144/3 (adenocarcinoma, intestinal type); 8210/3 (adenocarcinoma in adenomatous polyp); 8211/3 (tubular adenocarcinoma); 8255/3 (adenocarcinoma with mixed subtypes); 8261/3 (adenocarcinoma in villous adenoma); 8262/3 (villous adenocarcinoma); 8263/3 (adenocarcinoma in tubulovillous adenoma); 8440/3 (cystadenocarcinoma, NOS); 8470/3 (mucinous cystadenocarcinoma, NOS); 8480/3 (mucinous adenocarcinoma); 8481/3 (mucin-producing adenocarcinoma); 8490/3 (signet ring cell carcinoma); and 8221/3 (adenocarcinoma in multiple adenomatous polyps). For tumor differentiation, we used the four-tier classification proposed by the WHO grading system: well differentiated, moderately differentiated, poorly differentiated, and undifferentiated (<xref ref-type="bibr" rid="B13">13</xref>). To exclude a potential confounding factor, patients who had received preoperative radiation therapy were excluded. The overall workflow is illustrated in <xref ref-type="fig" rid="f1">
<bold>Figure 1</bold>
</xref>.</p>
<fig id="f1" position="float">
<label>Figure 1</label>
<caption>
<p>Workflow of the model development process.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-11-614398-g001.tif"/>
</fig>
</sec>
<sec id="s2_3">
<title>Establishment of the Predictive Model</title>
<p>In this study, we used seven commonly used machine-learning (ML) models to predict LNM in patients with early CRC. As the linear model, logistic regression (LR) was selected (<xref ref-type="bibr" rid="B14">14</xref>). We also used a neural network (NN), an important class of nonlinear prediction models that has been applied in a previous study (<xref ref-type="bibr" rid="B15">15</xref>). As the kernel-based model, we applied the support vector machine (SVM), which has been adopted in many clinical applications (<xref ref-type="bibr" rid="B16">16</xref>). As decision tree-based approaches, we included the classification and regression tree (CART), XGBoost (XGB), and random forest (RF) models, which have also been used in clinical research (<xref ref-type="bibr" rid="B17">17</xref>&#x2013;<xref ref-type="bibr" rid="B19">19</xref>). Finally, as a basic prediction technique, the k-nearest neighbor (kNN) algorithm was selected (<xref ref-type="bibr" rid="B20">20</xref>).</p>
<p>Because the outcome classes were imbalanced, we used random oversampling to improve classifier performance for the minority class (<xref ref-type="bibr" rid="B21">21</xref>). First, patients were randomly assigned to a training set (90%) and a test set (10%), with the same proportions of the two classes (LNM group vs. non-LNM group) in each set. In the training set, we performed k-fold cross-validation (k&#x2009;=&#x2009;10) and used grid search to find the best combination of parameters: for each candidate parameter set, the model was fitted on 9/10 of the data and validated on the remaining 1/10, rotating through all ten folds in turn.</p>
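<p>The sampling scheme described above can be illustrated with the minimal, standard-library Python sketch below. It is not the authors' code: the function names and the toy cohort are hypothetical, and because the text does not state whether oversampling preceded the train/test split, the sketch makes the conservative assumption of oversampling only the fitting folds inside cross-validation, so that duplicated records never appear in the validation or test data.</p>
<preformat preformat-type="computer-code">
```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Split record indices so each class keeps the same share in both sets."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def random_oversample(indices, labels, seed=0):
    """Re-draw minority-class indices with replacement until all classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in indices)
    target = max(counts.values())
    result = list(indices)
    for cls, n in counts.items():
        pool = [i for i in indices if labels[i] == cls]
        # Existing records are duplicated; no synthetic records are created.
        result.extend(rng.choice(pool) for _ in range(target - n))
    return result


def k_fold_indices(indices, k=10):
    """Yield (fit, validate) index lists; every fold is held out exactly once."""
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        fit = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield fit, folds[f]


# Hypothetical cohort: 1,000 patients, 10% LNM (label 1), 90% non-LNM (label 0).
labels = [1] * 100 + [0] * 900
train, test = stratified_split(labels)

fold_sizes = []
for fit, val in k_fold_indices(train):
    # Oversample only the fitting folds so duplicates never reach validation.
    fit_balanced = random_oversample(fit, labels)
    fold_sizes.append((Counter(labels[i] for i in fit_balanced), len(val)))
```
</preformat>
<p>In a full pipeline, each balanced fitting fold would train one model per candidate parameter set and the matching validation fold would score it; the grid search then keeps the parameter set with the best mean score across the ten folds.</p>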
</sec>
<sec id="s2_4">
<title>Assessment of Prediction Models</title>
<p>To ensure a fair comparison of the models, we used the confusion matrix, area under the receiver operating characteristic curve (AU-ROC, or AUC), sensitivity (recall), specificity, accuracy, average precision (AP), false positive rate, and precision as performance indicators. The AU-ROC served as the summary index of the ROC curve and the AP as the summary criterion of the precision-recall (PR) curve (<xref ref-type="bibr" rid="B22">22</xref>). Finally, the models with the averaged parameter values selected during cross-validation were evaluated on the test set.</p>
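<p>As a concrete illustration of the two summary indices, the sketch below computes the AU-ROC through its rank (Mann&#x2013;Whitney) interpretation, that is, the probability that a randomly chosen LNM case receives a higher score than a randomly chosen non-LNM case, and the AP as the mean precision at the rank of each positive case. The scores are hypothetical, the function names are illustrative, and the AP function assumes no tied scores; library implementations handle ties and thresholds in their own ways.</p>
<preformat preformat-type="computer-code">
```python
def roc_auc(pos_scores, neg_scores):
    """AU-ROC as P(random positive scores above random negative); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def average_precision(pos_scores, neg_scores):
    """AP: mean precision at the rank of each positive case (assumes no ties)."""
    ranked = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores], reverse=True
    )
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(pos_scores)


# Hypothetical predicted LNM probabilities for three LNM and three non-LNM patients.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(roc_auc(pos, neg), average_precision(pos, neg))
```
</preformat>
<p>For these toy scores, the functions return an AU-ROC of 8/9 (about 0.89) and an AP of 11/12 (about 0.92).</p>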
</sec>
<sec id="s2_5">
<title>Statistical Analysis</title>
<p>All data were obtained using the SEER*Stat software (version 8.3.6; Surveillance Research Program, National Cancer Institute). All analyses were performed with Python (version 3.6.9) and R statistical software (version 3.6.0). Differences in demographic characteristics between the two groups were tested using Student&#x2019;s t-test and Pearson&#x2019;s chi&#x2010;square test. To further compare model performance, we used a paired t-test to compare the AU-ROC values obtained in each resampling iteration. A two&#x2010;sided <italic>P</italic> &#x2264; 0.05 was considered statistically significant.</p>
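<p>For a 2 &#xd7; 2 comparison such as sex by LNM status, the Pearson chi-square test can be written in a few lines of standard-library Python. This sketch uses the counts from Table 1 and computes the uncorrected statistic with 1 degree of freedom, for which the upper-tail probability equals erfc(&#x221a;(&#x3c7;&#xb2;/2)). It is an illustration with a hypothetical function name, not the authors' R or Python code.</p>
<preformat preformat-type="computer-code">
```python
import math


def pearson_chi2_2x2(table):
    """Uncorrected Pearson chi-square statistic and P-value for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Sex by LNM status from Table 1: rows = men, women; columns = LNM (-), LNM (+).
sex_by_lnm = [[12864, 1254], [11326, 1289]]
chi2, p = pearson_chi2_2x2(sex_by_lnm)
print(round(chi2, 2), p)
```
</preformat>
<p>With these counts, the statistic is approximately 13.8 (<italic>P</italic> of roughly 0.0002), consistent with the <italic>P</italic> &lt; 0.001 reported for sex in Table 1. Library routines may give slightly different values; for example, SciPy&#x2019;s <monospace>chi2_contingency</monospace> applies Yates&#x2019; continuity correction to 2 &#xd7; 2 tables unless <monospace>correction=False</monospace> is passed.</p>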
</sec>
</sec>
<sec id="s3" sec-type="results">
<title>Results</title>
<sec id="s3_1">
<title>Baseline Characteristics</title>
<p>A total of 347,956 patients with CRC diagnosed between 2004 and 2016 were identified, of whom 292,201 were excluded because they had T0 or advanced CRC with or without distant metastasis. After a further 28,197 patients with insufficient data and 825 patients treated with preoperative radiation therapy were excluded, 26,733 patients with early CRC (T1) were analyzed. The model included eight independent prognostic variables: age at diagnosis, sex, race, primary site, histologic type, tumor grade, and tumor size. The analyzed patients were divided into the LNM group (2,543 patients, 9.5%) and the non-LNM group (24,190 patients, 90.5%). Younger patients (&lt; 60 years) were more likely than older patients to have LNM at diagnosis (<italic>P</italic> &lt; 0.001). LNM was significantly more frequent in women than in men (<italic>P</italic> &lt; 0.001). The proportion of LNM in the distal colon (descending colon, sigmoid colon, and rectosigmoid junction) was significantly higher than that in the colon proximal to the splenic flexure (<italic>P</italic> &lt; 0.001). The overall racial and ethnic distribution was 69.7% non-Hispanic White, 11.9% non-Hispanic Black, 9.0% Hispanic, 8.9% non-Hispanic Asian or Pacific Islander, and 0.5% non-Hispanic American Indian/Alaska Native. Among all patients, 20.8% (n&#x2009;=&#x2009;5,572) had well-differentiated tumors, 71.2% (n&#x2009;=&#x2009;19,026) moderately differentiated tumors, 7.1% (n&#x2009;=&#x2009;1,902) poorly differentiated tumors, and 0.9% (n&#x2009;=&#x2009;233) undifferentiated tumors. The mean tumor size was significantly larger in patients with LNM than in those without LNM (22.8&#xa0;mm vs. 20.6&#xa0;mm; <italic>P</italic> &lt; 0.001). <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref> shows the overall distribution of baseline characteristics of the study population.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Baseline characteristics.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" rowspan="2" align="left">Variables</th>
<th valign="top" align="center">LNM (-)</th>
<th valign="top" align="center">LNM (+)</th>
<th valign="top" rowspan="2" align="center">
<italic>P</italic>-value</th>
</tr>
<tr>
<th valign="top" align="center">N = 24,190</th>
<th valign="top" align="center">N = 2,543</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Age at diagnosis, n (%)</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">&lt;0.001</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;0-9</td>
<td valign="top" align="center">0 (0.0)</td>
<td valign="top" align="center">1 (0.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;10-19</td>
<td valign="top" align="center">5 (0.0)</td>
<td valign="top" align="center">1 (0.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;20-29</td>
<td valign="top" align="center">73 (0.3)</td>
<td valign="top" align="center">10 (0.4)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;30-39</td>
<td valign="top" align="center">339 (1.4)</td>
<td valign="top" align="center">61 (2.4)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;40-49</td>
<td valign="top" align="center">1511 (6.3)</td>
<td valign="top" align="center">241 (9.5)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;50-59</td>
<td valign="top" align="center">5684 (23.5)</td>
<td valign="top" align="center">730 (28.7)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;60-69</td>
<td valign="top" align="center">6775 (28.0)</td>
<td valign="top" align="center">683 (26.9)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;70-79</td>
<td valign="top" align="center">5952 (24.6)</td>
<td valign="top" align="center">544 (21.4)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;80-89</td>
<td valign="top" align="center">3410 (14.1)</td>
<td valign="top" align="center">245 (9.6)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;90-99</td>
<td valign="top" align="center">441 (1.8)</td>
<td valign="top" align="center">27 (1.1)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">Sex, n (%)</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">&lt;0.001</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;M</td>
<td valign="top" align="center">12864 (53.2)</td>
<td valign="top" align="center">1254 (49.3)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;F</td>
<td valign="top" align="center">11326 (46.8)</td>
<td valign="top" align="center">1289 (50.7)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">Primary site, n (%)</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">&lt;0.001</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Cecum</td>
<td valign="top" align="center">3355 (13.9)</td>
<td valign="top" align="center">381 (15.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Appendix</td>
<td valign="top" align="center">119 (0.5)</td>
<td valign="top" align="center">4 (0.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Ascending colon</td>
<td valign="top" align="center">3493 (14.4)</td>
<td valign="top" align="center">300 (11.8)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Hepatic flexure of colon</td>
<td valign="top" align="center">665 (2.7)</td>
<td valign="top" align="center">61 (2.4)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Transverse colon</td>
<td valign="top" align="center">1545 (6.4)</td>
<td valign="top" align="center">119 (4.7)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Splenic flexure of colon</td>
<td valign="top" align="center">381 (1.6)</td>
<td valign="top" align="center">39 (1.5)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Descending colon</td>
<td valign="top" align="center">1009 (4.2)</td>
<td valign="top" align="center">97 (3.8)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Sigmoid colon</td>
<td valign="top" align="center">6193 (25.6)</td>
<td valign="top" align="center">773 (30.4)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Overlapping lesion of colon</td>
<td valign="top" align="center">78 (0.3)</td>
<td valign="top" align="center">5 (0.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Colon, NOS</td>
<td valign="top" align="center">111 (0.5)</td>
<td valign="top" align="center">7 (0.3)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Rectosigmoid junction</td>
<td valign="top" align="center">1737 (7.2)</td>
<td valign="top" align="center">268 (10.5)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Rectum, NOS</td>
<td valign="top" align="center">5504 (22.7)</td>
<td valign="top" align="center">489 (19.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">Tumor grade, n (%)</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">&lt;0.001</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Well differentiated</td>
<td valign="top" align="center">5284 (21.8)</td>
<td valign="top" align="center">288 (11.3)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Moderately differentiated</td>
<td valign="top" align="center">17173 (71.0)</td>
<td valign="top" align="center">1853 (72.9)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Poorly differentiated</td>
<td valign="top" align="center">1538 (6.4)</td>
<td valign="top" align="center">364 (14.3)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Undifferentiated</td>
<td valign="top" align="center">195 (0.8)</td>
<td valign="top" align="center">38 (1.5)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">Race, n (%)</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">&lt;0.001</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Hispanic</td>
<td valign="top" align="center">2186 (9.1)</td>
<td valign="top" align="center">228 (9.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Non-Hispanic American Indian/Alaska Native</td>
<td valign="top" align="center">129 (0.5)</td>
<td valign="top" align="center">10 (0.4)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Non-Hispanic Asian or Pacific Islander</td>
<td valign="top" align="center">2099 (8.7)</td>
<td valign="top" align="center">270 (10.6)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Non-Hispanic Black</td>
<td valign="top" align="center">2837 (11.7)</td>
<td valign="top" align="center">354 (13.9)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Non-Hispanic White</td>
<td valign="top" align="center">16939 (70.0)</td>
<td valign="top" align="center">1681 (66.1)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">Tumor type, n (%)</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">
</td>
<td valign="top" align="center">&lt;0.001</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Carcinoma, NOS</td>
<td valign="top" align="center">40 (0.2)</td>
<td valign="top" align="center">6 (0.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Carcinoma, undifferentiated, NOS</td>
<td valign="top" align="center">1 (0.0)</td>
<td valign="top" align="center">1 (0.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Adenocarcinoma, NOS</td>
<td valign="top" align="center">9657 (39.9)</td>
<td valign="top" align="center">1148 (45.1)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Adenocarcinoma, intestinal type</td>
<td valign="top" align="center">2 (0.0)</td>
<td valign="top" align="center">2 (0.1)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Adenocarcinoma in adenomatous polyp</td>
<td valign="top" align="center">5943 (24.6)</td>
<td valign="top" align="center">513 (20.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Tubular adenocarcinoma</td>
<td valign="top" align="center">47 (0.2)</td>
<td valign="top" align="center">2 (0.1)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Adenocarcinoma with mixed subtypes</td>
<td valign="top" align="center">20 (0.1)</td>
<td valign="top" align="center">4 (0.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Adenocarcinoma in villous adenoma</td>
<td valign="top" align="center">1378 (5.7)</td>
<td valign="top" align="center">130 (5.1)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Villous adenocarcinoma</td>
<td valign="top" align="center">27 (0.1)</td>
<td valign="top" align="center">1 (0.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Adenocarcinoma in tubulovillous adenoma</td>
<td valign="top" align="center">6420 (26.5)</td>
<td valign="top" align="center">614 (24.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Cystadenocarcinoma, NOS</td>
<td valign="top" align="center">1 (0.0)</td>
<td valign="top" align="center">0 (0.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Mucinous cystadenocarcinoma, NOS</td>
<td valign="top" align="center">11 (0.1)</td>
<td valign="top" align="center">0 (0.0)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Mucinous adenocarcinoma</td>
<td valign="top" align="center">487 (2.0)</td>
<td valign="top" align="center">82 (3.2)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Mucin-producing adenocarcinoma</td>
<td valign="top" align="center">107 (0.4)</td>
<td valign="top" align="center">20 (0.8)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">&#x2003;Signet ring cell carcinoma</td>
<td valign="top" align="center">49 (0.2)</td>
<td valign="top" align="center">20 (0.8)</td>
<td valign="top" align="center">
</td>
</tr>
<tr>
<td valign="top" align="left">Tumor size, mm, mean (SD)</td>
<td valign="top" align="center">20.6 (25.4)</td>
<td valign="top" align="center">22.8 (20.9)</td>
<td valign="top" align="center">&lt;0.001</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>SD, standard deviation.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3_2">
<title>Tuning of Parameters</title>
<p>The SVM was trained with a cost parameter C of 1.0 and a kernel smoothing parameter &#x3c3; of 0.001. For kNN, a relatively large value of k (k&#x2009;=&#x2009;14) was optimal. XGB was trained with a maximum tree depth of 6 and a minimum child weight of 1. For NN, the hyperparameters were adjusted during training to obtain the optimal model on the validation set; the final selected hyperparameters were a learning rate of 0.001, 300 epochs, three hidden layers, a dropout rate of 0.3, and a batch size of 128. For RF, a relatively large number of randomly selected subtrees (61) provided the best performance.</p>
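<p>The grid search behind these choices can be illustrated with kNN, the simplest of the seven models. The standard-library sketch below scores each candidate k by 10-fold cross-validated accuracy on hypothetical two-feature data and keeps the best value. The function names, candidate grid, accuracy criterion, and toy data are assumptions made for illustration only, so the selected k need not match the k&#x2009;=&#x2009;14 reported here.</p>
<preformat preformat-type="computer-code">
```python
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest to query (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), y)
        for p, y in zip(train_points, train_labels)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(points, labels, k, folds=10):
    """Mean validation accuracy of kNN over interleaved cross-validation folds."""
    scores = []
    for f in range(folds):
        val = range(f, len(points), folds)
        val_set = set(val)
        fit_x = [x for i, x in enumerate(points) if i not in val_set]
        fit_y = [y for i, y in enumerate(labels) if i not in val_set]
        correct = sum(knn_predict(fit_x, fit_y, points[i], k) == labels[i] for i in val)
        scores.append(correct / len(val))
    return sum(scores) / folds


# Hypothetical two-feature data: two overlapping Gaussian clouds.
rng = random.Random(1)
points, labels = [], []
for _ in range(200):
    y = rng.randint(0, 1)
    points.append((rng.gauss(1.5 * y, 1.0), rng.gauss(1.5 * y, 1.0)))
    labels.append(y)

# Grid search: evaluate every candidate k and keep the best mean CV accuracy.
grid = [1, 5, 9, 14, 21]
results = {k: cv_accuracy(points, labels, k) for k in grid}
best_k = max(results, key=results.get)
print(best_k, round(results[best_k], 3))
```
</preformat>
<p>An even k such as 14 can produce tied votes. In this sketch, a tie between the two classes is resolved in favor of the label of the nearest neighbor, because Counter.most_common orders equal counts by first occurrence and the neighbors are visited in order of distance.</p>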
</sec>
<sec id="s3_3">
<title>Performance of Developed Models</title>
<p>The average ROC and PR curves obtained during training are shown in <xref ref-type="fig" rid="f2">
<bold>Figure 2</bold>
</xref>. Most models had AUC values above 0.81; the exceptions were LR, XGB, and SVM, whose values were lower. Confusion matrices were also calculated for the seven models (<xref ref-type="table" rid="T2">
<bold>Table 2</bold>
</xref>). As shown in <xref ref-type="table" rid="T2">
<bold>Table 2</bold>
</xref>, LR, XGB, and SVM generated many false negatives (FNs), whereas the kNN and CART models generated many false positives (FPs). The RF model produced the fewest FNs (n&#x2009;=&#x2009;5) and FPs (n&#x2009;=&#x2009;171). <xref ref-type="table" rid="T3">
<bold>Table 3</bold>
</xref> shows the AUC, sensitivity, specificity, precision, negative predictive value (NPV), false discovery rate (FDR), accuracy, AP, F1 score, and Matthews correlation coefficient of each model. The linear LR model performed worst, with an accuracy of only 0.60, whereas RF achieved an accuracy of 0.96.</p>
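<p>The threshold-based indices listed above follow directly from the confusion matrix. The sketch below derives them for RF, assuming that the RF test set has the same actual-class totals as the LR, XGB, and kNN rows of Table 2 (1,936 LNM and 2,419 non-LNM cases), so that the 5 FNs and 171 FPs reported for RF imply 1,931 true positives and 2,248 true negatives. The function name is hypothetical.</p>
<preformat preformat-type="computer-code">
```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Threshold-based performance indices derived from a 2 x 2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    fdr = fp / (fp + tp)  # false discovery rate, equal to 1 - precision
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "precision": precision, "NPV": npv, "FDR": fdr,
        "accuracy": accuracy, "F1": f1, "MCC": mcc,
    }


# RF on the test set: FN = 5 and FP = 171 (from the text); TP and TN are obtained by
# subtracting these from the assumed actual-class totals (1,936 LNM, 2,419 non-LNM).
rf = classification_metrics(tp=1936 - 5, fn=5, fp=171, tn=2419 - 171)
for name, value in rf.items():
    print(f"{name}: {value:.3f}")
```
</preformat>
<p>Under this assumption, the sketch gives a sensitivity of 0.997, a specificity of 0.929, and an accuracy of 0.960, matching the values reported for RF in the Abstract.</p>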
<fig id="f2" position="float">
<label>Figure 2</label>
<caption>
<p>Evaluation of the predictive models. <bold>(A)</bold> Average ROC curves of the seven models. <bold>(B)</bold> Average PR curves, indicating the tradeoff between precision and recall.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-11-614398-g002.tif"/>
</fig>
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>Confusion matrices of developed models.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" rowspan="2" align="left">Model</th>
<th valign="top" rowspan="2" align="left">Actual</th>
<th valign="top" colspan="2" align="center">Prediction</th>
</tr>
<tr>
<th valign="top" align="center">LNM (-)</th>
<th valign="top" align="center">LNM (+)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">LR</td>
<td valign="top" align="left">LNM (-)</td>
<td valign="top" align="center">1903</td>
<td valign="top" align="center">516</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">LNM (+)</td>
<td valign="top" align="center">1240</td>
<td valign="top" align="center">696</td>
</tr>
<tr>
<td valign="top" align="left">XGB</td>
<td valign="top" align="left">LNM (-)</td>
<td valign="top" align="center">2163</td>
<td valign="top" align="center">256</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">LNM (+)</td>
<td valign="top" align="center">1468</td>
<td valign="top" align="center">468</td>
</tr>
<tr>
<td valign="top" align="left">kNN</td>
<td valign="top" align="left">LNM (-)</td>
<td valign="top" align="center">1907</td>
<td valign="top" align="center">512</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">LNM (+)</td>
<td valign="top" align="center">18</td>
<td valign="top" align="center">1918</td>
</tr>
<tr>
<td valign="top" align="left">CART</td>
<td valign="top" align="left">LNM (-)</td>
<td valign="top" align="center">1907</td>
<td valign="top" align="center">512</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">LNM (+)</td>
<td valign="top" align="center">18</td>
<td valign="top" align="center">1918</td>
</tr>
<tr>
<td valign="top" align="left">SVM</td>
<td valign="top" align="left">LNM (-)</td>
<td valign="top" align="center">1898</td>
<td valign="top" align="center">521</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">LNM (+)</td>
<td valign="top" align="center">1053</td>
<td valign="top" align="center">883</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="left">LNM (-)</td>
<td valign="top" align="center">1995</td>
<td valign="top" align="center">424</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">LNM (+)</td>
<td valign="top" align="center">304</td>
<td valign="top" align="center">1632</td>
</tr>
<tr>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">LNM (-)</td>
<td valign="top" align="center">2248</td>
<td valign="top" align="center">171</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">LNM (+)</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">1931</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>LR, logistic regression; XGB, XGBoost; kNN, k-nearest neighbor; CART, classification and regression trees; SVM, support vector machine; NN, neural network; RF, random forest; LNM, lymph node metastasis.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="T3" position="float">
<label>Table 3</label>
<caption>
<p>Performance of developed models.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Model</th>
<th valign="top" align="center">AUC</th>
<th valign="top" align="center">Sensitivity</th>
<th valign="top" align="center">Specificity</th>
<th valign="top" align="center">Precision</th>
<th valign="top" align="center">NPV</th>
<th valign="top" align="center">FDR</th>
<th valign="top" align="center">Accuracy</th>
<th valign="top" align="center">AP</th>
<th valign="top" align="center">F1 Score</th>
<th valign="top" align="center">Matthews correlation coefficient</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">LR</td>
<td valign="top" align="center">0.623</td>
<td valign="top" align="center">0.360</td>
<td valign="top" align="center">0.787</td>
<td valign="top" align="center">0.574</td>
<td valign="top" align="center">0.606</td>
<td valign="top" align="center">0.426</td>
<td valign="top" align="center">0.597</td>
<td valign="top" align="center">0.666</td>
<td valign="top" align="center">0.442</td>
<td valign="top" align="center">0.162</td>
</tr>
<tr>
<td valign="top" align="left">XGB</td>
<td valign="top" align="center">0.659</td>
<td valign="top" align="center">0.242</td>
<td valign="top" align="center">0.894</td>
<td valign="top" align="center">0.646</td>
<td valign="top" align="center">0.596</td>
<td valign="top" align="center">0.354</td>
<td valign="top" align="center">0.604</td>
<td valign="top" align="center">0.700</td>
<td valign="top" align="center">0.352</td>
<td valign="top" align="center">0.181</td>
</tr>
<tr>
<td valign="top" align="left">kNN</td>
<td valign="top" align="center">0.933</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.788</td>
<td valign="top" align="center">0.789</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">0.878</td>
<td valign="top" align="center">0.966</td>
<td valign="top" align="center">0.879</td>
<td valign="top" align="center">0.780</td>
</tr>
<tr>
<td valign="top" align="left">CART</td>
<td valign="top" align="center">0.944</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.788</td>
<td valign="top" align="center">0.789</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">0.878</td>
<td valign="top" align="center">0.972</td>
<td valign="top" align="center">0.879</td>
<td valign="top" align="center">0.780</td>
</tr>
<tr>
<td valign="top" align="left">SVM</td>
<td valign="top" align="center">0.682</td>
<td valign="top" align="center">0.456</td>
<td valign="top" align="center">0.785</td>
<td valign="top" align="center">0.629</td>
<td valign="top" align="center">0.643</td>
<td valign="top" align="center">0.371</td>
<td valign="top" align="center">0.639</td>
<td valign="top" align="center">0.717</td>
<td valign="top" align="center">0.529</td>
<td valign="top" align="center">0.256</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="center">0.910</td>
<td valign="top" align="center">0.843</td>
<td valign="top" align="center">0.825</td>
<td valign="top" align="center">0.794</td>
<td valign="top" align="center">0.868</td>
<td valign="top" align="center">0.206</td>
<td valign="top" align="center">0.833</td>
<td valign="top" align="center">0.841</td>
<td valign="top" align="center">0.818</td>
<td valign="top" align="center">0.665</td>
</tr>
<tr>
<td valign="top" align="left">RF</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.997</td>
<td valign="top" align="center">0.929</td>
<td valign="top" align="center">0.919</td>
<td valign="top" align="center">0.998</td>
<td valign="top" align="center">0.081</td>
<td valign="top" align="center">0.960</td>
<td valign="top" align="center">0.995</td>
<td valign="top" align="center">0.956</td>
<td valign="top" align="center">0.922</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>AUC, area under the receiver operating characteristic curve; NPV, negative predictive value; FDR, false discovery rate; AP, average precision; LR, logistic regression; XGB, XGBoost; kNN, k-nearest neighbor; CART, classification and regression trees; SVM, support vector machine; NN, neural network; RF, random forest.</p>
</table-wrap-foot>
</table-wrap>
<p>The accuracy of the other models was below 0.90. RF achieved the highest AUC (0.991), followed by CART (0.944), whereas LR had the lowest (0.623). The RF model also showed the highest sensitivity, specificity, precision, NPV, accuracy, AP, F1 score, and Matthews correlation coefficient, as well as the lowest FDR.</p>
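<p>Unlike the threshold-based metrics, AUC and AP are computed from the predicted probabilities. As a minimal sketch with hypothetical helper names (not the authors' implementation), AUC can be computed as the probability that a randomly chosen LNM-positive case receives a higher score than a randomly chosen LNM-negative case, and AP as the mean of the precision values at the rank of each positive case. The pairwise AUC loop is quadratic in the number of cases and the AP function assumes no tied scores, so library implementations are preferable for a dataset of this size.</p>
<preformat preformat-type="code">
```python
def roc_auc(y_true, scores):
    """Probability that a random positive case outscores a random negative case (ties count 1/2)."""
    positives = [s for s, t in zip(scores, y_true) if t == 1]
    negatives = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(y_true, scores):
    """Mean precision at the rank of each positive case (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Hypothetical toy labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f"AUC = {roc_auc(y_true, scores):.3f}")  # AUC = 0.750
print(f"AP = {average_precision(y_true, scores):.3f}")  # AP = 0.833
```
</preformat>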
</sec>
<sec id="s3_4">
<title>Feature Importance Comparisons between Algorithms</title>
<p>We quantified the importance of each variable for predicting LNM using permutation importance scores in each model (<xref ref-type="fig" rid="f3">
<bold>Figure 3</bold>
</xref>). In most models, tumor grade, depth of tumor invasion, and age contributed substantially to the prediction of LNM in early CRC. Tumor size was the most frequently top-ranked predictor, ranking first in four of the six models shown.</p>
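<p>Permutation importance measures how much a model's performance drops when the values of a single predictor are randomly shuffled, which breaks that predictor's association with the outcome while leaving the other predictors intact. The following is a minimal model-agnostic sketch; it assumes accuracy as the performance metric and feature rows stored as Python lists, and the function name, toy model, and data are hypothetical rather than taken from this study.</p>
<preformat preformat-type="code">
```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each predictor column is shuffled.

    `predict` maps a list of feature rows (lists) to a list of predicted labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        predictions = predict(rows)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by its shuffled value.
            permuted = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model that uses only the first predictor.
def toy_predict(rows):
    return [int(row[0] > 0.5) for row in rows]


X = [[i % 2, 0.3] for i in range(20)]
y = [i % 2 for i in range(20)]
print(permutation_importance(toy_predict, X, y))
```
</preformat>
<p>In the toy example, the model relies only on the first predictor, so shuffling the second (constant) predictor leaves accuracy unchanged and its importance is exactly zero.</p>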
<fig id="f3" position="float">
<label>Figure 3</label>
<caption>
<p>Factor importance of the developed models. <bold>(A&#x2013;F)</bold> Bar graphs showing the relative importance of the predictors in each model.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-11-614398-g003.tif"/>
</fig>
</sec>
</sec>
<sec id="s4" sec-type="discussion">
<title>Discussion</title>
<p>In this study, we established a novel predictive model that combines eight clinicopathologic parameters to predict LNM in early CRC, using seven ML models. To the best of our knowledge, this is the first large-scale study to develop a predictive model for LNM by combining readily available clinical and pathological data in patients with early CRC. Clinicians often find it difficult to identify the patients who will benefit from additional surgery after local endoscopic resection.</p>
<p>Currently, in clinical practice, risk stratification of these patients is usually performed by histopathologists, who carefully analyze the specimen to determine the risk of LNM, because CT has a limited capacity to identify LNM accurately (<xref ref-type="bibr" rid="B23">23</xref>).</p>
<p>In previous studies, the pathological factors with the strongest independent predictive value for LNM in early CRC were tumor type, poor histological differentiation, and the depth of submucosal invasion (<xref ref-type="bibr" rid="B24">24</xref>&#x2013;<xref ref-type="bibr" rid="B27">27</xref>). However, high interobserver variability in pathological assessment limits their clinical usefulness, so each factor should be interpreted with caution as a univariate marker when deciding whether to proceed with surgery (<xref ref-type="bibr" rid="B28">28</xref>, <xref ref-type="bibr" rid="B29">29</xref>). A multivariable risk model combining histopathological and clinical data may therefore reduce the inaccuracies associated with relying on individual subjective markers and help to define the optimal treatment strategy for early CRC.</p>
<p>With the recent rapid development of computer-aided technology, ML models have come to play an important role in cancer diagnosis and are increasingly used in medicine, reflecting a growing trend toward predictive medicine (<xref ref-type="bibr" rid="B30">30</xref>&#x2013;<xref ref-type="bibr" rid="B32">32</xref>). Here, we developed ML models using simple clinicopathological parameters from a large dataset, and these models showed high predictive ability for LNM in patients with early CRC.</p>
<p>To date, a few ML models for predicting metastasis in patients with early CRC have been developed and evaluated (<xref ref-type="bibr" rid="B33">33</xref>&#x2013;<xref ref-type="bibr" rid="B36">36</xref>). Ichimasa et&#xa0;al. developed an SVM model using 45 clinicopathologic factors to predict LNM in patients with early CRC and reported that this artificial intelligence model significantly reduced unnecessary additional surgery after endoscopic resection of T1 CRC without LNM compared with the current guidelines (<xref ref-type="bibr" rid="B33">33</xref>). Another Japanese study reported a deep learning model for predicting LNM from pathology images with cytokeratin immunohistochemistry in early CRC (<xref ref-type="bibr" rid="B34">34</xref>). However, these studies were retrospective and were based on single-center data or small numbers of patients. Because the rate of metastasis in early CRC is low, such datasets contain only a limited number of events, which may have constrained the performance of the ML algorithms; their reported predictive performance ranged from 0.821 to 0.913, lower than that of our RF model. A&#xa0;recent Chinese study also presented a prediction model for LNM that incorporated both a radiomics signature, which combines multiple individual CT imaging features, and several clinical factors using multivariable logistic regression analysis (<xref ref-type="bibr" rid="B35">35</xref>). Although this is an interesting approach, the validity of the model is not guaranteed given the heterogeneity in CT image quality between facilities, and its accuracy of approximately 78% is lower than that of the predictive model we constructed. Lastly, Kudo et&#xa0;al. also employed deep-learning-based modeling to predict LNM in T1 CRC (<xref ref-type="bibr" rid="B36">36</xref>). However, they used only an NN model, with a smaller sample size than in our study, and in cases treated by endoscopic resection they assessed LNM using CT imaging alone because pathologic confirmation was not available.</p>
<p>The reason why the RF model outperformed the other ML algorithms is not easily explained. One possible explanation is that RF models generally show a substantial improvement over linear methods and may outperform kernel-based and neural network models when the data contain many categorical variables and some outliers, as is typical of large retrospective cohort data. However, to build robust prognostic models for LNM in early CRC, other variables, such as gene expression and histologic image data, may be needed in addition to clinicopathological variables.</p>
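<p>To illustrate the mechanism usually credited for this behavior, the sketch below builds a small random forest in plain Python: each tree is grown on a bootstrap sample of the data, each split considers only a random subset of predictors and is chosen by Gini impurity, and the forest predicts by majority vote. The default of 61 trees mirrors the number reported under Tuning of Parameters, but the depth limit, the square-root rule for the predictor subset, and all function names and toy data are assumptions for illustration only; this is not the implementation used in this study.</p>
<preformat preformat-type="code">
```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a classification tree; internal nodes are (feature, threshold, left, right)."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: majority class
    best = None
    # Only a random subset of predictors is considered at each split.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in set(row[f] for row in X):
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not right or len(right) == len(X):
                continue  # a split must send cases to both sides
            right_set = set(right)
            left = [i for i in range(len(X)) if i not in right_set]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(X)
            if best is None or best[0] > score:
                best = (score, f, t, left, right)
    if best is None:
        return majority(y)  # no valid split among the sampled predictors
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_feats, rng),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_feats, rng),
    )


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = right if row[f] > t else left
    return node


def random_forest(X, y, n_trees=61, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))  # square-root rule (assumption)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) cases with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_feats, rng))
    return trees


def predict_forest(trees, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in trees])


# Hypothetical toy data: the label depends only on the first predictor.
X = [[i / 10, j / 10] for i in range(10) for j in range(10)]
y = [int(row[0] > 0.45) for row in X]
forest = random_forest(X, y)
print(predict_forest(forest, [0.0, 0.5]), predict_forest(forest, [0.9, 0.5]))
```
</preformat>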
<p>In our study, we investigated the variable importance of the developed predictive models, as it could be useful for clinical decision-making. Our findings indicated that tumor size was the most important factor for predicting the presence of LNM in early CRC. The prognostic value of tumor size in CRC has long been studied, but no consensus has been reached. Zhang et&#xa0;al. and Kornprat et&#xa0;al. demonstrated a significant association between tumor size and metastasis in CRC (<xref ref-type="bibr" rid="B37">37</xref>, <xref ref-type="bibr" rid="B38">38</xref>), whereas Miller et&#xa0;al. found no prognostic significance of tumor size in CRC (<xref ref-type="bibr" rid="B39">39</xref>). Furthermore, its potential prognostic role in patients with early CRC has not been well investigated. To our knowledge, this is the largest study to date to evaluate the prognostic value of tumor size in early CRC, and it provides statistical evidence for further prospective studies. Nevertheless, the current study has several limitations. First, because the SEER database is a nationwide program, several diagnostic criteria, such as histological grade and verification of tumor location, might be assessed subjectively, which could cause systematic bias. Second, detailed histopathological data associated with metastasis, such as lymphovascular invasion, tumor budding, and the precise depth of tumor invasion, were not available. These data require further assessment to improve the performance of our ML algorithms. Third, our study population consisted predominantly of white patients; thus, the findings may not be generalizable to other racial populations. Finally, the data were imbalanced between patients with and without LNM, reflecting the low rate of LNM in early CRC. Therefore, the parameters had to be further optimized during tuning to avoid overfitting. To further improve the accuracy of the established model, more clinical data should be collected and the parameters further optimized in subsequent studies.</p>
<p>In conclusion, we established and compared seven models to predict LNM in early CRC using clinical and histopathological features that are readily available in routine practice. The RF model, a simple and reproducible predictive model, showed the highest predictive power among the models. Tumor size was the most important predictor of LNM in early CRC. Therefore, patients with tumors larger than 3&#xa0;cm who are identified as high-risk by the model may require careful consideration for additional surgical treatment. However, because of the limitations inherent in studies based on observational data, these findings should be confirmed in prospective clinical trials.</p>
</sec>
<sec id="s5">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: <uri xlink:href="https://seer.cancer.gov">https://seer.cancer.gov</uri>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>MK designed the study. JA, JC and HS analyzed and interpreted the data and wrote the manuscript. JJ and JY supervised the project and revised the paper. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>This research was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), which is funded by the Korean Ministry of Science, ICT and Future Planning [grant number NRF-2019R1C1C1003524].</p>
</sec>
<sec id="s8" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bray</surname> <given-names>F</given-names>
</name>
<name>
<surname>Ferlay</surname> <given-names>J</given-names>
</name>
<name>
<surname>Soerjomataram</surname> <given-names>I</given-names>
</name>
<name>
<surname>Siegel</surname> <given-names>RL</given-names>
</name>
<name>
<surname>Torre</surname> <given-names>LA</given-names>
</name>
<name>
<surname>Jemal</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries</article-title>. <source>CA Cancer J Clin</source> (<year>2018</year>) <volume>68</volume>:<fpage>394</fpage>&#x2013;<lpage>424</lpage>. doi: <pub-id pub-id-type="doi">10.3322/caac.21492</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ferlay</surname> <given-names>J</given-names>
</name>
<name>
<surname>Shin</surname> <given-names>HR</given-names>
</name>
<name>
<surname>Bray</surname> <given-names>F</given-names>
</name>
<name>
<surname>Forman</surname> <given-names>D</given-names>
</name>
<name>
<surname>Mathers</surname> <given-names>C</given-names>
</name>
<name>
<surname>Parkin</surname> <given-names>DM</given-names>
</name>
</person-group>. <article-title>Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008</article-title>. <source>Int J Cancer</source> (<year>2010</year>) <volume>127</volume>:<page-range>2893&#x2013;917</page-range>. doi: <pub-id pub-id-type="doi">10.1002/ijc.25516</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Logan</surname> <given-names>RF</given-names>
</name>
<name>
<surname>Patnick</surname> <given-names>J</given-names>
</name>
<name>
<surname>Nickerson</surname> <given-names>C</given-names>
</name>
<name>
<surname>Coleman</surname> <given-names>L</given-names>
</name>
<name>
<surname>Rutter</surname> <given-names>MD</given-names>
</name>
<name>
<surname>Von Wagner</surname> <given-names>C</given-names>
</name>
<etal/>
</person-group>. <article-title>Outcomes of the Bowel Cancer Screening Programme (BCSP) in England after the first 1 million tests</article-title>. <source>Gut</source> (<year>2012</year>) <volume>61</volume>:<page-range>1439&#x2013;46</page-range>. doi: <pub-id pub-id-type="doi">10.1136/gutjnl-2011-300843</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Inadomi</surname> <given-names>JM</given-names>
</name>
</person-group>. <article-title>Screening for Colorectal Neoplasia</article-title>. <source>N Engl J Med</source> (<year>2017</year>) <volume>376</volume>:<page-range>149&#x2013;56</page-range>. doi: <pub-id pub-id-type="doi">10.1056/NEJMcp1512286</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seitz</surname> <given-names>U</given-names>
</name>
<name>
<surname>Bohnacker</surname> <given-names>S</given-names>
</name>
<name>
<surname>Seewald</surname> <given-names>S</given-names>
</name>
<name>
<surname>Thonke</surname> <given-names>F</given-names>
</name>
<name>
<surname>Brand</surname> <given-names>B</given-names>
</name>
<name>
<surname>Braiutigam</surname> <given-names>T</given-names>
</name>
<etal/>
</person-group>. <article-title>Is endoscopic polypectomy an adequate therapy for malignant colorectal adenomas? Presentation of 114 patients and review of the literature</article-title>. <source>Dis Colon Rectum</source> (<year>2004</year>) <volume>47</volume>:<fpage>1789</fpage>&#x2013;<lpage>96</lpage>; discussion 1796&#x2013;7. doi: <pub-id pub-id-type="doi">10.1007/s10350-004-0680-2</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kawamura</surname> <given-names>YJ</given-names>
</name>
<name>
<surname>Sugamata</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Yoshino</surname> <given-names>K</given-names>
</name>
<name>
<surname>Abo</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Nara</surname> <given-names>S</given-names>
</name>
<name>
<surname>Sumita</surname> <given-names>T</given-names>
</name>
<etal/>
</person-group>. <article-title>Endoscopic resection for submucosally invasive colorectal cancer: is it feasible?</article-title> <source>Surg Endosc</source> (<year>1999</year>) <volume>13</volume>:<page-range>224&#x2013;7</page-range>. doi: <pub-id pub-id-type="doi">10.1007/s004649900949</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kashida</surname> <given-names>H</given-names>
</name>
<name>
<surname>Kudo</surname> <given-names>SE</given-names>
</name>
</person-group>. <article-title>Early colorectal cancer: concept, diagnosis, and management</article-title>. <source>Int J Clin Oncol</source> (<year>2006</year>) <volume>11</volume>:<fpage>1</fpage>&#x2013;<lpage>8</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10147-005-0550-5</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ricciardi</surname> <given-names>R</given-names>
</name>
<name>
<surname>Madoff</surname> <given-names>RD</given-names>
</name>
<name>
<surname>Rothenberger</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Baxter</surname> <given-names>NN</given-names>
</name>
</person-group>. <article-title>Population-based analyses of lymph node metastases in colorectal cancer</article-title>. <source>Clin Gastroenterol Hepatol</source> (<year>2006</year>) <volume>4</volume>:<page-range>1522&#x2013;7</page-range>. doi: <pub-id pub-id-type="doi">10.1016/j.cgh.2006.07.016</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tominaga</surname> <given-names>K</given-names>
</name>
<name>
<surname>Nakanishi</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Nimura</surname> <given-names>S</given-names>
</name>
<name>
<surname>Yoshimura</surname> <given-names>K</given-names>
</name>
<name>
<surname>Sakai</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Shimoda</surname> <given-names>T</given-names>
</name>
</person-group>. <article-title>Predictive histopathologic factors for lymph node metastasis in patients with nonpedunculated submucosal invasive colorectal carcinoma</article-title>. <source>Dis Colon Rectum</source> (<year>2005</year>) <volume>48</volume>:<fpage>92</fpage>&#x2013;<lpage>100</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10350-004-0751-4</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bayar</surname> <given-names>S</given-names>
</name>
<name>
<surname>Saxena</surname> <given-names>R</given-names>
</name>
<name>
<surname>Emir</surname> <given-names>B</given-names>
</name>
<name>
<surname>Salem</surname> <given-names>RR</given-names>
</name>
</person-group>. <article-title>Venous invasion may predict lymph node metastasis in early rectal cancer</article-title>. <source>Eur J Surg Oncol</source> (<year>2002</year>) <volume>28</volume>:<page-range>413&#x2013;7</page-range>. doi: <pub-id pub-id-type="doi">10.1053/ejso.2002.1254</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Daly</surname> <given-names>MC</given-names>
</name>
<name>
<surname>Paquette</surname> <given-names>IM</given-names>
</name>
</person-group>. <article-title>Surveillance, Epidemiology, and End Results (SEER) and SEER-Medicare Databases: Use in Clinical Research for Improving Colorectal Cancer Outcomes</article-title>. <source>Clin Colon Rectal Surg</source> (<year>2019</year>) <volume>32</volume>:<page-range>61&#x2013;8</page-range>. doi: <pub-id pub-id-type="doi">10.1055/s-0038-1673355</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weiss</surname> <given-names>JM</given-names>
</name>
<name>
<surname>Pfau</surname> <given-names>PR</given-names>
</name>
<name>
<surname>O&#x2019;Connor</surname> <given-names>ES</given-names>
</name>
<name>
<surname>King</surname> <given-names>J</given-names>
</name>
<name>
<surname>Loconte</surname> <given-names>N</given-names>
</name>
<name>
<surname>Kennedy</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Mortality by stage for right- versus left-sided colon cancer: analysis of surveillance, epidemiology, and end results&#x2013;Medicare data</article-title>. <source>J Clin Oncol</source> (<year>2011</year>) <volume>29</volume>:<page-range>4401&#x2013;9</page-range>. doi: <pub-id pub-id-type="doi">10.1200/JCO.2011.36.4414</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nagtegaal</surname> <given-names>ID</given-names>
</name>
<name>
<surname>Odze</surname> <given-names>RD</given-names>
</name>
<name>
<surname>Klimstra</surname> <given-names>D</given-names>
</name>
<name>
<surname>Paradis</surname> <given-names>V</given-names>
</name>
<name>
<surname>Rugge</surname> <given-names>M</given-names>
</name>
<name>
<surname>Schirmacher</surname> <given-names>P</given-names>
</name>
<etal/>
</person-group>. <article-title>The 2019 WHO classification of tumours of the digestive system</article-title>. <source>Histopathology</source> (<year>2020</year>) <volume>76</volume>:<page-range>182&#x2013;8</page-range>. doi: <pub-id pub-id-type="doi">10.1111/his.13975</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Menard</surname> <given-names>S</given-names>
</name>
</person-group>. <source>Applied logistic regression analysis</source>. <publisher-name>SAGE Publications</publisher-name> (<year>2002</year>). doi: <pub-id pub-id-type="doi">10.4135/9781412983433</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Nigrin</surname> <given-names>A</given-names>
</name>
</person-group>. <source>Neural networks for pattern recognition</source>. <publisher-name>MIT press ACM SIGART Bulletin</publisher-name> (<year>1993</year>). doi: <pub-id pub-id-type="doi">10.1145/182053.1064827</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cortes</surname> <given-names>C</given-names>
</name>
<name>
<surname>Vapnik</surname> <given-names>V</given-names>
</name>
</person-group>. <article-title>Support-vector networks</article-title>. <source>Mach Learn</source> (<year>1995</year>) <volume>20</volume>:<page-range>273&#x2013;97</page-range>. doi: <pub-id pub-id-type="doi">10.1007/BF00994018</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breiman</surname> <given-names>L</given-names>
</name>
</person-group>. <article-title>Random forests</article-title>. <source>Mach Learn</source> (<year>2001</year>) <volume>45</volume>:<fpage>5</fpage>&#x2013;<lpage>32</lpage>. doi: <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>H</given-names>
</name>
<name>
<surname>Guo</surname> <given-names>H</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Research on type 2 diabetes mellitus precise prediction models based on XGBoost algorithm</article-title>. <source>Chin J Lab Diagn</source> (<year>2018</year>) <volume>22</volume>:<page-range>408&#x2013;12</page-range>. doi: <pub-id pub-id-type="doi">10.3969/j.issn.1007-4287.2018.03.008</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marshall</surname> <given-names>RJ</given-names>
</name>
</person-group>. <article-title>The use of classification and regression trees in clinical epidemiology</article-title>. <source>J Clin Epidemiol</source> (<year>2001</year>) <volume>54</volume>:<page-range>603&#x2013;9</page-range>. doi: <pub-id pub-id-type="doi">10.1016/S0895-4356(00)00344-9</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altman</surname> <given-names>NS</given-names>
</name>
</person-group>. <article-title>An introduction to kernel and nearest-neighbor nonparametric regression</article-title>. <source>Am Stat</source> (<year>1992</year>) <volume>46</volume>:<page-range>175&#x2013;85</page-range>. doi: <pub-id pub-id-type="doi">10.1080/00031305.1992.10475879</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lema&#xee;tre</surname> <given-names>G</given-names>
</name>
<name>
<surname>Nogueira</surname> <given-names>F</given-names>
</name>
<name>
<surname>Aridas</surname> <given-names>CK</given-names>
</name>
</person-group>. <article-title>Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning</article-title>. <source>J Mach Learn Res</source> (<year>2017</year>) <volume>18</volume>:<page-range>559&#x2013;63</page-range>. arXiv:1609.06570</citation>
</ref>
<ref id="B22">
<label>22</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bewick</surname> <given-names>V</given-names>
</name>
<name>
<surname>Cheek</surname> <given-names>L</given-names>
</name>
<name>
<surname>Ball</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Statistics review 13: receiver operating characteristic curves</article-title>. <source>Crit Care</source> (<year>2004</year>) <volume>8</volume>:<fpage>508</fpage>. doi: <pub-id pub-id-type="doi">10.1186/cc3000</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dighe</surname> <given-names>S</given-names>
</name>
<name>
<surname>Purkayastha</surname> <given-names>S</given-names>
</name>
<name>
<surname>Swift</surname> <given-names>I</given-names>
</name>
<name>
<surname>Tekkis</surname> <given-names>PP</given-names>
</name>
<name>
<surname>Darzi</surname> <given-names>A</given-names>
</name>
<name>
<surname>A&#x2019;Hern</surname> <given-names>R</given-names>
</name>
<etal/>
</person-group>. <article-title>Diagnostic precision of CT in local staging of colon cancers: a meta-analysis</article-title>. <source>Clin Radiol</source> (<year>2010</year>) <volume>65</volume>:<page-range>708&#x2013;19</page-range>. doi: <pub-id pub-id-type="doi">10.1016/j.crad.2010.01.024</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname> <given-names>F</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>J</given-names>
</name>
<name>
<surname>Lou</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Di</surname> <given-names>M</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>F</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>H</given-names>
</name>
<etal/>
</person-group>. <article-title>Micropapillary component in colorectal carcinoma is associated with lymph node metastasis in T1 and T2 Stages and decreased survival time in TNM stages I and II</article-title>. <source>Am J Surg Pathol</source> (<year>2009</year>) <volume>33</volume>:<page-range>1287&#x2013;92</page-range>. doi: <pub-id pub-id-type="doi">10.1097/PAS.0b013e3181a5387b</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bosch</surname> <given-names>SL</given-names>
</name>
<name>
<surname>Teerenstra</surname> <given-names>S</given-names>
</name>
<name>
<surname>De Wilt</surname> <given-names>JH</given-names>
</name>
<name>
<surname>Cunningham</surname> <given-names>C</given-names>
</name>
<name>
<surname>Nagtegaal</surname> <given-names>ID</given-names>
</name>
</person-group>. <article-title>Predicting lymph node metastasis in pT1 colorectal cancer: a systematic review of risk factors providing rationale for therapy decisions</article-title>. <source>Endoscopy</source> (<year>2013</year>) <volume>45</volume>:<page-range>827&#x2013;41</page-range>. doi: <pub-id pub-id-type="doi">10.1055/s-0033-1344238</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Glasgow</surname> <given-names>SC</given-names>
</name>
<name>
<surname>Bleier</surname> <given-names>JI</given-names>
</name>
<name>
<surname>Burgart</surname> <given-names>LJ</given-names>
</name>
<name>
<surname>Finne</surname> <given-names>CO</given-names>
</name>
<name>
<surname>Lowry</surname> <given-names>AC</given-names>
</name>
</person-group>. <article-title>Meta-analysis of histopathological features of primary colorectal cancers that predict lymph node metastases</article-title>. <source>J Gastrointest Surg</source> (<year>2012</year>) <volume>16</volume>:<page-range>1019&#x2013;28</page-range>. doi: <pub-id pub-id-type="doi">10.1007/s11605-012-1827-4</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname> <given-names>Z-Q</given-names>
</name>
<name>
<surname>Ma</surname> <given-names>S</given-names>
</name>
<name>
<surname>Zhou</surname> <given-names>Q-B</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>S-X</given-names>
</name>
<name>
<surname>Chang</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Zeng</surname> <given-names>X-Y</given-names>
</name>
<etal/>
</person-group>. <article-title>Prognostic value of lymph node metastasis in patients with T1-stage colorectal cancer from multiple centers in China</article-title>. <source>World J Gastroenterol</source> (<year>2017</year>) <volume>23</volume>:<fpage>8582</fpage>. doi: <pub-id pub-id-type="doi">10.3748/wjg.v23.i48.8582</pub-id>
</citation>
</ref>
<ref id="B28">
<label>28</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brockmoeller</surname> <given-names>SF</given-names>
</name>
<name>
<surname>West</surname> <given-names>NP</given-names>
</name>
</person-group>. <article-title>Predicting systemic spread in early colorectal cancer: Can we do better?</article-title> <source>World J Gastroenterol</source> (<year>2019</year>) <volume>25</volume>:<page-range>2887&#x2013;97</page-range>. doi: <pub-id pub-id-type="doi">10.3748/wjg.v25.i23.2887</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harris</surname> <given-names>EI</given-names>
</name>
<name>
<surname>Lewin</surname> <given-names>DN</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>HL</given-names>
</name>
<name>
<surname>Lauwers</surname> <given-names>GY</given-names>
</name>
<name>
<surname>Srivastava</surname> <given-names>A</given-names>
</name>
<name>
<surname>Shyr</surname> <given-names>Y</given-names>
</name>
<etal/>
</person-group>. <article-title>Lymphovascular invasion in colorectal cancer: an interobserver variability study</article-title>. <source>Am J Surg Pathol</source> (<year>2008</year>) <volume>32</volume>:<fpage>1816</fpage>. doi: <pub-id pub-id-type="doi">10.1097/PAS.0b013e3181816083</pub-id>
</citation>
</ref>
<ref id="B30">
<label>30</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kourou</surname> <given-names>K</given-names>
</name>
<name>
<surname>Exarchos</surname> <given-names>TP</given-names>
</name>
<name>
<surname>Exarchos</surname> <given-names>KP</given-names>
</name>
<name>
<surname>Karamouzis</surname> <given-names>MV</given-names>
</name>
<name>
<surname>Fotiadis</surname> <given-names>DI</given-names>
</name>
</person-group>. <article-title>Machine learning applications in cancer prognosis and prediction</article-title>. <source>Comput Struct Biotechnol J</source> (<year>2015</year>) <volume>13</volume>:<fpage>8</fpage>&#x2013;<lpage>17</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.csbj.2014.11.005</pub-id>
</citation>
</ref>
<ref id="B31">
<label>31</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Statnikov</surname> <given-names>A</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>L</given-names>
</name>
<name>
<surname>Aliferis</surname> <given-names>CF</given-names>
</name>
</person-group>. <article-title>A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification</article-title>. <source>BMC Bioinf</source> (<year>2008</year>) <volume>9</volume>:<fpage>319</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-9-319</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiao</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>J</given-names>
</name>
<name>
<surname>Lin</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Zhao</surname> <given-names>X</given-names>
</name>
</person-group>. <article-title>A deep learning-based multi-model ensemble method for cancer prediction</article-title>. <source>Comput Methods Programs Biomed</source> (<year>2018</year>) <volume>153</volume>:<fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cmpb.2017.09.005</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ichimasa</surname> <given-names>K</given-names>
</name>
<name>
<surname>Kudo</surname> <given-names>S-E</given-names>
</name>
<name>
<surname>Mori</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Misawa</surname> <given-names>M</given-names>
</name>
<name>
<surname>Matsudaira</surname> <given-names>S</given-names>
</name>
<name>
<surname>Kouyama</surname> <given-names>Y</given-names>
</name>
<etal/>
</person-group>. <article-title>Artificial intelligence may help in predicting the need for additional surgery after endoscopic resection of T1 colorectal cancer</article-title>. <source>Endoscopy</source> (<year>2018</year>) <volume>50</volume>:<page-range>230&#x2013;40</page-range>. doi: <pub-id pub-id-type="doi">10.1055/s-0043-122385</pub-id>
</citation>
</ref>
<ref id="B34">
<label>34</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Takamatsu</surname> <given-names>M</given-names>
</name>
<name>
<surname>Yamamoto</surname> <given-names>N</given-names>
</name>
<name>
<surname>Kawachi</surname> <given-names>H</given-names>
</name>
<name>
<surname>Chino</surname> <given-names>A</given-names>
</name>
<name>
<surname>Saito</surname> <given-names>S</given-names>
</name>
<name>
<surname>Ueno</surname> <given-names>M</given-names>
</name>
<etal/>
</person-group>. <article-title>Prediction of early colorectal cancer metastasis by machine learning using digital slide images</article-title>. <source>Comput Methods Programs Biomed</source> (<year>2019</year>) <volume>178</volume>:<page-range>155&#x2013;61</page-range>. doi: <pub-id pub-id-type="doi">10.1016/j.cmpb.2019.06.022</pub-id>
</citation>
</ref>
<ref id="B35">
<label>35</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname> <given-names>YQ</given-names>
</name>
<name>
<surname>Liang</surname> <given-names>CH</given-names>
</name>
<name>
<surname>He</surname> <given-names>L</given-names>
</name>
<name>
<surname>Tian</surname> <given-names>J</given-names>
</name>
<name>
<surname>Liang</surname> <given-names>CS</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>X</given-names>
</name>
<etal/>
</person-group>. <article-title>Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer</article-title>. <source>J Clin Oncol</source> (<year>2016</year>) <volume>34</volume>:<page-range>2157&#x2013;64</page-range>. doi: <pub-id pub-id-type="doi">10.1200/JCO.2015.65.9128</pub-id>
</citation>
</ref>
<ref id="B36">
<label>36</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kudo</surname> <given-names>SE</given-names>
</name>
<name>
<surname>Ichimasa</surname> <given-names>K</given-names>
</name>
<name>
<surname>Villard</surname> <given-names>B</given-names>
</name>
<name>
<surname>Mori</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Misawa</surname> <given-names>M</given-names>
</name>
<name>
<surname>Saito</surname> <given-names>S</given-names>
</name>
<etal/>
</person-group>. <article-title>Artificial intelligence system to determine risk of T1 colorectal cancer metastasis to lymph node</article-title>. <source>Gastroenterology</source> (<year>2020</year>) <volume>160</volume>:<page-range>1075&#x2013;84</page-range>. doi: <pub-id pub-id-type="doi">10.1053/j.gastro.2020.09.027</pub-id>
</citation>
</ref>
<ref id="B37">
<label>37</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>H</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>C-S</given-names>
</name>
<name>
<surname>Cong</surname> <given-names>J-C</given-names>
</name>
<name>
<surname>Qiao</surname> <given-names>L</given-names>
</name>
<name>
<surname>Hasegawa</surname> <given-names>T</given-names>
</name>
<name>
<surname>Takashima</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Clinicopathological characteristics of advanced colorectal cancer 30 mm or smaller in diameter</article-title>. <source>Chin Med Sci J</source> (<year>2007</year>) <volume>22</volume>:<fpage>98</fpage>&#x2013;<lpage>103</lpage>.</citation>
</ref>
<ref id="B38">
<label>38</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kornprat</surname> <given-names>P</given-names>
</name>
<name>
<surname>Pollheimer</surname> <given-names>MJ</given-names>
</name>
<name>
<surname>Lindtner</surname> <given-names>RA</given-names>
</name>
<name>
<surname>Schlemmer</surname> <given-names>A</given-names>
</name>
<name>
<surname>Rehak</surname> <given-names>P</given-names>
</name>
<name>
<surname>Langner</surname> <given-names>C</given-names>
</name>
</person-group>. <article-title>Value of tumor size as a prognostic variable in colorectal cancer: a critical reappraisal</article-title>. <source>Am J Clin Oncol</source> (<year>2011</year>) <volume>34</volume>:<page-range>43&#x2013;9</page-range>. doi: <pub-id pub-id-type="doi">10.1097/COC.0b013e3181cae8dd</pub-id>
</citation>
</ref>
<ref id="B39">
<label>39</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miller</surname> <given-names>W</given-names>
</name>
<name>
<surname>Ota</surname> <given-names>D</given-names>
</name>
<name>
<surname>Giacco</surname> <given-names>G</given-names>
</name>
<name>
<surname>Guinee</surname> <given-names>V</given-names>
</name>
<name>
<surname>Irimura</surname> <given-names>T</given-names>
</name>
<name>
<surname>Nicolson</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Absence of a relationship of size of primary colon carcinoma with metastasis and survival</article-title>. <source>Clin Exp Metastasis</source> (<year>1985</year>) <volume>3</volume>:<page-range>189&#x2013;96</page-range>. doi: <pub-id pub-id-type="doi">10.1007/BF01786762</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>