Application of Machine Learning Algorithms to Predict Central Lymph Node Metastasis in T1-T2, Non-invasive, and Clinically Node Negative Papillary Thyroid Carcinoma

Purpose: While there are no clear indications of whether central lymph node dissection is necessary in patients with T1-T2, non-invasive, clinically uninvolved central neck lymph nodes papillary thyroid carcinoma (PTC), this study seeks to develop and validate models for predicting the risk of central lymph node metastasis (CLNM) in these patients based on machine learning algorithms. Methods: This is a retrospective study comprising 1,271 patients with T1-T2 stage, non-invasive, and clinically node negative (cN0) PTC who underwent surgery at the Department of Endocrine and Breast Surgery of The First Affiliated Hospital of Chongqing Medical University from February 1, 2016, to December 31, 2018. We applied six machine learning (ML) algorithms, including Logistic Regression (LR), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Decision Tree (DT), and Neural Network (NNET), coupled with preoperative clinical characteristics and intraoperative information to develop prediction models for CLNM. Among all the samples, 70% were randomly selected to train the models while the remaining 30% were used for validation. Indices like the area under the receiver operating characteristic (AUROC), sensitivity, specificity, and accuracy were calculated to test the models' performance. Results: The results showed that ~51.3% (652 out of 1,271) of the patients had pN1 disease. In multivariate logistic regression analyses, gender, tumor size and location, multifocality, age, and Delphian lymph node status were all independent predictors of CLNM. In predicting CLNM, six ML algorithms posted AUROC of 0.70–0.75, with the extreme gradient boosting (XGBoost) model standing out, registering 0.75. Thus, we employed the best-performing ML algorithm model and uploaded the results to a self-made online risk calculator to estimate an individual's probability of CLNM (https://jin63.shinyapps.io/ML_CLNM/). Conclusions: With the incorporation of preoperative and intraoperative risk factors, ML algorithms can achieve acceptable prediction of CLNM with Xgboost model performing the best. Our online risk calculator based on ML algorithm may help determine the optimal extent of initial surgical treatment for patients with T1-T2 stage, non-invasive, and clinically node negative PTC.


INTRODUCTION
Papillary thyroid carcinoma (PTC) is one of the most common type of endocrine malignancies with a favorable prognosis (1,2). Nevertheless, central lymph node metastasis (CLNM), the first station of metastasis, occurs in 30-90% of patients following their first surgery and is correlated with an increased risk of local recurrence (3,4).
The clinical community has reached a general consensus that central lymph node dissection (CLND) for therapeutic purposes is appropriate in PTC patients with suspected cervical lymph node metastasis (LNM) (5). By contrast, however, there is a growing controversy over the role of prophylactic central lymph node dissection (pCLND) due to the lack of randomized controlled data (6)(7)(8). Generally speaking, pCLND is not recommended for a subset of patients with small (T1 or T2), non-invasive, clinically node-negative (cN0) PTC according to the 2015 American Thyroid Association (ATA) guidelines (9), whereas the Japanese Society of Thyroid Surgery and the Chinese Thyroid Association both strongly recommend routine pCLND for cN0 PTC patients in order to stage disease and prevent recurrence. While an incomplete nodal resection in the first surgery may lead to disease recurrence and a second operation (10), it is also important to avoid unnecessary CLND in view of surgical complications such as hypoparathyroidism and recurrent laryngeal nerve injury. Ideal treatment decisionmaking should be based upon individual patients rather than "one size fits all" approach recommended by guidelines. This highlights the importance of accurate prediction of CLNM occurrence with a more personalized therapeutical schedule.
Machine learning (ML), as a novel type of artificial intelligence (AI), is starting to be widely applied to health-care data analysis (11,12). By capitalizing on the robust prediction ability of ML algorithms, it may be possible to develop prediction tools which in some cases outperform traditional statistical modeling, and thus giving better prediction of CLNM status. Unfortunately, no current studies have trained ML algorithms to predict CLNM in this subset of PTC.
Hence, the purpose of this study is to develop ML-based models using preoperative and intraoperative clinicopathological characteristics to predict the likelihood of CLNM for individualized treatment and to obtain the best ML algorithms for online CLNM prediction in PTC.

Study Population
We retrospectively retrieved the data of in-patients who underwent thyroid surgery at the Department of Endocrine and Breast Surgery of the First Affiliated Hospital of Chongqing Medical University from December 2016 to December 2018.

Data Collection
Criteria for inclusion were to be a PTC patient with a tumor size no larger than 40 mm (T1-T2), a non-invasive tumor, and no evidence for lymph nodes metastases (cN0) based on ultrasound (US) data. Tumor size was classified according to the 8th edition of American Joint Committee on Cancer (AJCC) Staging Standards. Criteria for exclusion were distant metastasis, previous thyroid surgery, or incomplete information. This study was approved by the local institutional ethics committee board. Demographic and clinicopathological characteristics data were collected as follows: gender, age, tumor size, tumor location, chronic lymphocytic thyroiditis (CLT), multifocality, bilaterality, and the presence of LNM.

Surgical Strategy
At our institution, it is customary to perform pCLND for PTC patients and the detailed surgical procedures were described in previous articles (13,14). Soft tissues in the prelaryngeal and pretracheal regions were removed and marked as the Delphian (Figure 1) and pretracheal LNs, respectively. Those two subgroups were sent for intraoperative frozen section examination. Then, we proceeded to perform the thyroid lobectomy and ipsilateral paratracheal LN dissection and the paratracheal LN was also sent for frozen section examination. Lastly, all surgical specimens were sent for post-operative histopathologic evaluation. The Delphian lymph node (DLN) was not taken into account in the calculation of central compartment lymph nodes.

Statistical Analyses
The Fisher's exact test and Student's t-test were used for discrete and continuous parameters, respectively. For the independent risk factors of CLNM, a multivariable logistic regression analysis with backward stepwise selection was used to calculate the odds ratios (ORs) with 95% confidence intervals (CIs).
ML algorithm is characterized by its extraordinary performance better than traditional regression approaches in predicting outcomes within large data bases (15)(16)(17). In this study, we randomly split our dataset into two groups, namely the training sets (70%) for ML model development and the validation sets (30%) for performance evaluation and we repeated this random splitting until the patient data were equally distributed in both sets (Supplementary Table 1). We developed six types of ML algorithms to model our data: Logistic regression (LR), Gradient boosting machine (GBM), Extreme gradient boosting (XGBoost), Random forest (RF), Decision tree (DT), and Neural network (NNET). In the training process, tuning was considered for ML-based models to avoid overfitting and the best hyper-parameter for ML models was 5-fold cross-validation. Then the ML algorithms were further trained by using the R software to predict the risk of CLNM and we evaluated the predictive ability of each ML classifier, with the same hyper-parameter, in validation sets where the area under the receiver operating characteristic (AUROC) value, and the corresponding sensitivity, specificity, as well as overall accuracy of ML algorithms were all calculated. In the comparison of ML algorithms' performance, the closer to 1 the AUC was, the better the classification model performed. Afterwards, based on the best-performing model, we created an online risk calculator that can make predictions with newly entered PTC patient data, and thus making the risk of CLNM in those patients easily accessible to clinicians. A total of 100 independent training simulation results were used to evaluate the variable importance of each CLNM-predicting ML model. All statistical analyses were performed by using R software, version 3.4.1 (R Foundation for Statistical Computing, Vienna, Austria). The R packages "caret, " "e1071, " "random-forest, " "nnet, " "gbm, " "rpart, " "GLM, " "pROC" were used for ML algorithms and "shiny" package for web application. A two tailed P < 0.05 was deemed statistically significant.

Demographics Features
The clinicopathological characteristics of 1,271 PTC patients with T1-T2, non-invasive, clinically node-negative disease were summarized ( Table 1). Of the 1,271 eligible patients, the average age was 42.15 ± 10.49 years (range 18-80 years). The ratio of male to female patients was 1:2.7. The mean tumor size was 9.92 mm (median = 8 mm). Eight hundred and ninety seven patients (70.6%) had papillary micro-carcinomas. Central lymph node metastases were positive in 652 (51.3%) cases.

Univariate and Multivariate Logistic Regression Analyses of CLNM
In univariable analysis, tumor size, gender, age, multifocality, bilateral lesions, and DLN status were all significantly associated with the occurrence of CLNM in overall population (P < 0.001), whereas there was no significant difference between CLNM-positive and CLNM-negative patients in terms of their tumor location or CLT status. In multivariable logistic regression analysis (

Performance of Machine Learning Algorithms
Comparisons of the performance of prediction among the six ML algorithms models in validation sets are detailed in Table 3 and Figure 1. It turned out that the XGBoost model demonstrated the highest performance of predicting CLNM, whose AUROC was 0.750, sensitivity 0.667, specificity 0.674, and accuracy 0.670 in validation sets. Accordingly, we chose the XGBoost model as the final prediction model.

Relative Importance of Variables in Machine Learning Algorithms
The relative importance of variables in each CLNM-predicting ML algorithm is shown in Figure 2. We can see there are general trends of evidence: although slight differences are shown in the importance of variables among those ML algorithms, factors including Delphian lymph node metastasis, tumor size, age, gender, multifocality rank top five without fail. On the contrary, variables like bilateral lesions, tumor location in middle or isthmus pole and CLT make little contribution to CLNM prediction. The importance of high-ranking variables in the XGBoost model is arranged as follows in a descending    order: Delphian lymph node metastasis, tumor size, age, gender, multifocality and tumor location.

Web-Based Calculator
An online calculator based on the best-performing model was established for clinicians to predict patients' risk of developing CLNM by simply inputing readily available preoperative and intraoperative clinicopathological variables (https://jin63. shinyapps.io/ML_CLNM/) (Figure 3).

DISCUSSION
In this study, we developed and validated multiple popular machine learning algorithms to predict CLNM in patients with T1-T2, non-invasive, cN0 PTC. A comparison of ML algorithms identified that the XGBoost model gave the greatest performance. To make the application of this model available, we further established an online calculator for estimating the individual probability of CLNM in this subset patients with PTC. This ML-based model may potentially guide intraoperative decision-making.
It is noteworthy that the 2015 ATA guidelines (9) asserted that "thyroidectomy without pCLND is adequate for small (T1 or T2), non-invasive, clinically node-negative PTC." Yet, the risk of metastatic lymph nodes among this subgroup is unequal and a "one-size fits all" approach may raise concerns that in the long run it would bring potentially disastrous consequences for patients exempted from pCLND. Our data demonstrate that up to 51% of patients with T1-T2, non-invasive, cN0 PTC harbored central lymph node metastases. Such a high incidence of regional lymph node involvement is similar to other findings (18)(19)(20) and indicates that thyroid cancer is predisposed to LNM and that preoperative ultrasound currently fails to detect a massive number of patients with clinically significant lymph nodal disease (21,22). Therefore, an accurate diagnosis of lymph node status carries much weight in helping clinicians determine the precise treatment for patients as well as informing the patients of prognoses and we advocate a selective approach to pCLND, particularly for cases with a high risk of CLNM.
Preoperative variables including larger tumors, younger age, male, multifocality, and tumor location in inferior portion are identified as the most important contributing predictors of CLNM-positive status by ML algorithms. The finding that younger age is highly predictive of CLNM in our research is similar to previous studies (23,24). In addition, multifocal PTCs have been shown to be prone to CLNM and our results are consistent with previous reports, suggesting that multifocality is a positive predictor of CLNM (25,26). It has been previously demonstrated by Thompson et al. (27) and Yang et al. (28) that larger tumors are significantly associated with an increased risk of nodal spread while we have found that rates of lymph node involvement surge in tumor sizes > 20 mm, compared with those in tumor sizes of 10-20 mm and < 10 mm (81.2 vs. 62.3 and 45.3%). Bilateral lesions are related with CLNM in the univariate analysis, but show insignificance in multivariate analyses after adjustment of confounders. All results have been confirmed in ML algorithms. Our study suggests that males are frequently found to be more susceptible to CLNM, which is supported by findings of previous studies (12,29).
Nevertheless, the aforementioned factors in previous studies are mainly based on preoperative information and are still insufficient to achieve a reliable prediction. Besides, few studies have evaluated the predictive values of intraoperative factors. At our institution, lymph nodes in central compartment are classified as DLN, pretracheal and paratracheal nodes, respectively, and then routinely sent for frozen section examination separately. It was revealed in our previous study that the status of DLN based on frozen section examination was an independent predictor of CLNM and associated with poor prognostic features (14). And our findings of the present study further proves it, showing that 86.3% of DLN-positive patients have CLMN, compared with 43.9% of DLN-negative patients. The DLN status, in particular, is the strongest predictor in nearly all analytical approaches. Thus, we recommend routine intraoperative frozen section examination of DLN not only because the dissection of DLN can be performed safely without additional complications, but more importantly, it is a critical variable predicting further nodal metastases and aids in determining the extent of LN dissection. As intraoperative frozen section examination plays an essential role in immediate assessment of nodal status during an operation (30)(31)(32), it appears to be more promising in accurately predicting risks of LNM in subregion of central compartment when compared with preoperative evaluations alone.
Compared with studies attempting to predict the risk of central compartment lymph node metastases in PTC (12,27,28,33,34), our work has several strengths. First, few studies have ever focused on the subgroup of patients who suffer from clinically low-risk PTC. In fact, we found that a massive number of patients harbor clinically significant lymph node metastases which have not been detected by pre-oprative ultrasound. Furthermore, while ML approaches have shown unparalleled diagnostic performance in differentiating between benign and malignant thyroid nodules in recent reports (35,36), there is, however, little research in the available literature on applying ML algorithms to lymph node metastases in PTC. To the best of our knowledge, this is the very first study to develop a prediction model using ML algorithms for real-time risk evaluation of CLNM with easy-to-use clinical data and fortunately, our model shows a great predictive power, which distinguishes itself from linear models adopted by previous researches. Finally, in order to make this ML-based model easy to use, we established an online application based on it, which is now available for clinicians to facilitate individualized surgical treatment by calculating the risk for each patient: (https://jin63.shinyapps.io/ML_CLNM/). For instance, if a patient is identified to have a high probability of CLNM during surgery, then pCLND may be considered despite contradiction to the current ATA guidelines.
This study, however, also has limitations. First, the nature of a retrospective study might have resulted in selection bias. Second, the ML algorithm model we established, to some extent, was confined to one single institution, which might restrict its generalizability pending further validation in real-world scenarios. Third, predictve value was not high enough because the information in our current clinical database is to a certain degree limited.

CONCLUSIONS
We developed and validated ML algorithms for individualized prediction of CLNM in T1-T2 stage, non-invasive, and clinically node negative PTC patients by utilizing readily available preoperative variables and intraoperative frozen section examination. The ML-based prediction model can accurately identify whether patients are at high-risk of CLNM and its accompanying online risk calculator can serve as an easy-touse tool for clinicians to make precise surgical decisions. In the future, our goal is to further integrate imaging, molecular and genetic data to improve our model performance in the realm of personalized medicine and more studies covering wider populations are also warranted for further validation.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The First Affiliated Hospital of Chongqing Medical University. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
JZhu, ZD, and XS conceived and designed the research. JZhu, JZhe, HR, RH, ZD, and XS prepared the manuscript. JZhu, JZhe, and DW collected and analyzed the data. JZhu, JZhe, and LL performed the statistical analysis and interpreted the results. All authors have read and approved the final manuscript.

FUNDING
This work was supported by the Chongqing Science and Technology Committee (cstc2017shmsA1035).