Machine learning algorithms for identifying contralateral central lymph node metastasis in unilateral cN0 papillary thyroid cancer

Purpose: The incidence of thyroid cancer is rising rapidly, and surgery is its most important treatment. For patients with unilateral cN0 papillary thyroid cancer, whether to dissect the contralateral central lymph nodes is still under debate. Here, we aim to provide a machine learning-based model that predicts contralateral central lymph node metastasis from demographic and clinical data.
Methods: A total of 2225 patients with unilateral cN0 papillary thyroid cancer from Wuhan Union Hospital were retrospectively studied. Clinical and pathological features were compared between patients with and without contralateral central lymph node metastasis. Six machine learning models were constructed from these patients and compared using accuracy, sensitivity, specificity, area under the receiver operating characteristic curve and decision curve analysis. The selected model was then verified using data from the Differentiated Thyroid Cancer in China study. All statistical analysis and model construction were performed in R software.
Results: Male gender, maximum diameter larger than 1cm, multifocality, ipsilateral central lymph node metastasis and age younger than 55 years were independent risk factors for contralateral central lymph node metastasis. The random forest model performed better than the others and was verified in the external validation cohort. A web calculator was constructed.
Conclusions: Gender, maximum diameter, multifocality, ipsilateral central lymph node metastasis and age should be considered when deciding on contralateral central lymph node dissection. The web calculator based on the random forest model may be helpful in clinical decision-making.


Introduction
Thyroid cancer now ranks as the 7th most common cancer globally, with an age-standardized incidence rate of 9.1 per 100 000 people in 2022 (1). Papillary thyroid cancer (PTC) constitutes the majority of thyroid cancer cases and contributes significantly to this growth (2). Surgery (including thyroidectomy and cervical lymph node dissection), TSH suppression therapy and radioiodine therapy are the core treatments for PTC. After comprehensive treatment, the majority of patients have a favorable prognosis, but more than 10% of patients still suffer relapse (3,4).
Lymph node metastasis has been regarded as a prognostic factor in PTC, serving as a predictor of higher mortality and recurrence rates (5-7). Currently, the extent of lymph node dissection is decided mainly by tumor size and clinical lymph node metastasis. However, some lymph node metastases go undetected by existing methods, especially central lymph node metastasis (CLNM), because of the deep location and small size of these nodes (8). The reported rate of occult lymph node metastasis ranges from about 24% to 82% (9). Occult lymph node metastasis may progress and lead to completion thyroidectomy, potentially impairing patients' quality of life.
The incidence of contralateral central lymph node metastasis (CCLNM) in PTC patients with clinically negative lymph nodes (cN0) is reported to range from 3.9% to 30.6%. Male gender, age < 45 years, lymphovascular invasion, extrathyroidal invasion, ipsilateral CLNM, multifocality, tumor size and tumor location are predictors of CCLNM (10-12). For cN0 patients, the necessity and extent of prophylactic central lymph node dissection (pCLND) remain subjects of ongoing debate. Selective pCLND (13) and full-extent pCLND (14) have each been recommended by different studies. Other researchers have suggested that ipsilateral CLND should be performed routinely and that the decision to perform contralateral CLND should be based on intraoperative frozen-section pathology (15,16), given the higher prevalence of complications such as permanent hypoparathyroidism in patients receiving bilateral CLND than in those undergoing only ipsilateral CLND (17-21). Some studies have also attempted to identify high-risk patients and proposed a "tailored" treatment (18), in which pCLND is performed only in high-risk patients. However, not all factors carry the same weight in predicting CCLNM, and a simple "yes or no" judgement is insufficient for clinical decision-making. Consequently, we attempted to construct a more precise model to predict CCLNM.
Machine learning (ML), a rapidly evolving field in big data analysis, offers a more sophisticated approach to establishing associations between input data and outcomes across various data types, and can thereby provide more accurate predictions than traditional methods (22). Since its emergence, researchers have explored the potential of machine learning to revolutionize medicine (23), and it is now applied extensively in medical research and practice. Many studies have used classification tasks to assist in diagnosis and to predict prognosis (24, 25).
Here, we developed a machine learning model to predict CCLNM in unilateral cN0 PTC patients, evaluated its performance, and created an online calculator for easy assessment of CCLNM probability.
We retrospectively collected the following data for the enrolled patients: age, gender, body mass index (BMI), multifocality, maximum diameter, extrathyroidal extension (ETE), the number of ipsilateral central lymph nodes dissected (No. of ICLND) and the number of those with metastasis (No. of ICLNM). Age at diagnosis was divided into two groups: younger than 55 years and 55 years or older. According to WHO BMI criteria, patients were divided into underweight (BMI <18.0), normal weight (18.0<=BMI <25.0), and overweight (BMI >=25.0). Maximum diameter was divided into three groups: <=1cm, >1 and <=4cm, and >4cm. Ratio of ICLNM (RICLNM) was calculated as follows:
RICLNM = No. of ICLNM / No. of ICLND
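The grouping rules above can be illustrated with a short sketch. It is written in Python rather than the R used in the study, and it assumes that RICLNM is the number of metastatic ipsilateral central lymph nodes divided by the number dissected. The function names, group labels and the handling of a zero denominator are hypothetical choices made for illustration, not details taken from the study.

```python
from typing import Optional


def age_group(age_years: float) -> str:
    """Two age groups: younger than 55 years, and 55 years or older."""
    return "<55" if age_years < 55 else ">=55"


def bmi_group(bmi: float) -> str:
    """BMI groups using the cut-offs stated in the text (18.0 and 25.0)."""
    if bmi < 18.0:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    return "overweight"


def diameter_group(diameter_cm: float) -> str:
    """Three maximum-diameter groups: <=1cm, >1 and <=4cm, >4cm."""
    if diameter_cm <= 1.0:
        return "<=1cm"
    if diameter_cm <= 4.0:
        return ">1 and <=4cm"
    return ">4cm"


def riclnm(n_iclnm: int, n_iclnd: int) -> Optional[float]:
    """Ratio of metastatic to dissected ipsilateral central lymph nodes.

    Assumption: when no nodes were dissected the ratio is undefined,
    so None is returned (the study does not say how this case was handled).
    """
    if n_iclnm < 0 or n_iclnd < 0 or n_iclnm > n_iclnd:
        raise ValueError("counts must satisfy 0 <= n_iclnm <= n_iclnd")
    if n_iclnd == 0:
        return None
    return n_iclnm / n_iclnd


# Hypothetical patient: 2 metastatic nodes among 8 dissected nodes.
example_ratio = riclnm(2, 8)  # 0.25
```

Returning None for an undefined ratio keeps such patients visible for a separate decision instead of silently coding them as 0.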

Development and comparison of ML-based models
Patients were divided into two groups according to the presence or absence of CCLNM, and their baseline information was compared. To further analyse the risk factors for CCLNM, we performed univariate and multivariate logistic regression analysis.
For construction and validation of the ML models, we randomly split the data from Wuhan Union Hospital (WHUH) into a training cohort (80%) and a validation cohort (20%). Six popular classification ML models were developed using all seven features, namely K-nearest neighbor (KNN), decision tree (DT), support vector machine (SVM), extreme gradient boosting (XGBoost), logistic regression (LR), and random forest (RF).
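A random 80/20 split of this kind can be sketched as follows. This is a Python illustration under stated assumptions, not the R procedure used in the study: the function name, the seed and the use of simple shuffling without stratification are all hypothetical, since the text does not describe how the split was drawn.

```python
import random
from typing import List, Tuple


def split_cohort(n_patients: int,
                 train_fraction: float = 0.8,
                 seed: int = 2024) -> Tuple[List[int], List[int]]:
    """Randomly assign patient indices to training and validation cohorts."""
    indices = list(range(n_patients))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = round(n_patients * train_fraction)
    return indices[:n_train], indices[n_train:]


# With 2225 patients this yields 1780 training and 445 validation indices.
train_idx, valid_idx = split_cohort(2225)
```

Splitting indices rather than records keeps the sketch independent of how the patient data are stored.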
Multidimensional evaluation was used to assess the performance of the models, including accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, false positive rate, and false negative rate. For accuracy, AUC, sensitivity and specificity, the closer the value was to 1, the better the model performed, whereas for false positive rate and false negative rate, the closer to 0 the better. To assess the clinical benefits of the models, decision curve analysis (DCA) was conducted. DCA shows the net benefit of a treatment at a given threshold probability, that is, the benefit of treating true patients minus the harm of treating non-patients (27).
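The threshold-based indicators and the net benefit used in DCA can all be derived from a confusion matrix, as the Python sketch below shows. The net benefit uses the standard decision-curve definition, TP/n - FP/n x pt/(1 - pt), where pt is the threshold probability; the text does not print this formula, so its use here is an assumption. The class and function names and the example counts are hypothetical and are not results from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # CCLNM present, predicted present
    fp: int  # CCLNM absent, predicted present
    tn: int  # CCLNM absent, predicted absent
    fn: int  # CCLNM present, predicted absent

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def classification_metrics(c: Confusion) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and the two error rates."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / c.total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
        "false_negative_rate": 1.0 - sensitivity,
    }


def net_benefit(c: Confusion, threshold: float) -> float:
    """Net benefit at one threshold probability (standard DCA definition).

    The confusion matrix must have been built by classifying patients as
    positive when their predicted probability reaches `threshold`.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1.0 - threshold)  # harm of one false positive
    return c.tp / c.total - (c.fp / c.total) * weight


# Hypothetical counts for illustration only.
example = Confusion(tp=30, fp=20, tn=130, fn=20)
example_metrics = classification_metrics(example)
example_net_benefit = net_benefit(example, threshold=0.2)
```

A decision curve is obtained by repeating the net benefit calculation over a range of thresholds, rebuilding the confusion matrix at each one.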
To interpret the models, we used feature importance to evaluate the contribution of each variable. The importance of a feature was obtained as the increase in the model's prediction error after the values of that feature were permuted.
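The idea of measuring importance as the rise in prediction error after shuffling one feature can be sketched in a few lines. The Python code below is a generic illustration under stated assumptions: it uses the misclassification rate as the error measure and averages over repeated shuffles. Neither choice is stated in the text, and all names and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def error_rate(predict: Callable[[Row], int],
               rows: Sequence[Row],
               labels: Sequence[int]) -> float:
    """Fraction of rows whose predicted class differs from the label."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def permutation_importance(predict: Callable[[Row], int],
                           rows: Sequence[Row],
                           labels: Sequence[int],
                           feature_index: int,
                           n_repeats: int = 20,
                           seed: int = 0) -> float:
    """Mean increase in error after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)
    increases = []
    for _ in range(n_repeats):
        column = [row[feature_index] for row in rows]
        rng.shuffle(column)  # break the link between feature and outcome
        permuted = [row[:feature_index] + [value] + row[feature_index + 1:]
                    for row, value in zip(rows, column)]
        increases.append(error_rate(predict, permuted, labels) - baseline)
    return sum(increases) / n_repeats


# Toy model that only looks at feature 0; feature 1 is ignored.
toy_rows = [[float(i % 2), float((i * 7) % 5)] for i in range(40)]
toy_labels = [int(row[0] > 0.5) for row in toy_rows]


def toy_predict(row: Row) -> int:
    return int(row[0] > 0.5)


importance_used = permutation_importance(toy_predict, toy_rows, toy_labels, 0)
importance_unused = permutation_importance(toy_predict, toy_rows, toy_labels, 1)
```

In the toy example the ignored feature has an importance of exactly zero, because shuffling it cannot change any prediction.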

Validation of the models
After choosing the best-performing model, we validated it first on the internal validation cohort and then on the external cohort. The assessment indicators were the same as those used for model comparison. A confusion matrix was used to show the difference between the true and predicted status. A calibration curve was used to evaluate the agreement between observation and prediction.
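A calibration curve compares the mean predicted probability with the observed event rate within groups of patients. The Python sketch below shows one common way to compute its points, assuming equal-width probability bins. The text does not state the binning scheme used, so the number of bins and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_points(probabilities: Sequence[float],
                       outcomes: Sequence[int],
                       n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed rate, count) per bin.

    Bins are equal-width intervals on [0, 1]; empty bins are skipped.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((p, y))
    points = []
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        points.append((mean_predicted, observed_rate, len(members)))
    return points


# Hypothetical predictions: points near the diagonal indicate good agreement.
example_points = calibration_points([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
```

Plotting the first value of each point against the second gives the calibration curve; a well-calibrated model lies close to the diagonal.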

Statistical analysis and web construction
All statistical analysis was performed in R software (The R Foundation for Statistical Computing). The chi-square test and Student's t-test were used for categorical and continuous data, respectively. Univariate and multivariate logistic regression analyses were performed to calculate odds ratios (ORs) with 95% confidence intervals (CIs). A P value <0.05 was considered statistically significant. The R package 'shiny' was used to construct the web calculator.
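For a single binary factor, an odds ratio and its 95% confidence interval can be computed directly from a 2x2 table, which is numerically equivalent to a univariate logistic regression with a Wald interval. The Python sketch below illustrates that special case only. It does not reproduce the multivariate models, and the function name and example counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(exposed_cases: int,
                  exposed_controls: int,
                  unexposed_cases: int,
                  unexposed_controls: int,
                  z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio with a Wald 95% confidence interval from a 2x2 table."""
    cells = (exposed_cases, exposed_controls,
             unexposed_cases, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases)
    # Standard error of log(OR): square root of the summed reciprocals.
    standard_error = math.sqrt(sum(1.0 / cell for cell in cells))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 100 exposed and 20 of 100 unexposed have CCLNM.
example_or, example_lower, example_upper = odds_ratio_ci(40, 60, 20, 80)
```

An interval that excludes 1 corresponds to a P value below 0.05 for that factor under the same Wald approximation.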

Demographic and clinicopathological characteristics
A total of 10816 patients underwent thyroid surgery at WHUH from 2009 to 2020, of whom 2225 were included in this retrospective study (Figure 1). The univariate and multivariate analyses of factors predicting CCLNM are shown in Figure 2A and Supplementary Table 1.
Correlation analysis (Figure 2B) showed that maximum diameter and RICLNM had a strong correlation (>0.03), while there was no significant correlation between any other pair of features. Considering their clinical significance, we included all of the above factors in the ML models.
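A pairwise correlation analysis of this kind computes one coefficient for every pair of features. The Python sketch below uses the Pearson coefficient as an assumption, since the text does not state which coefficient was used. The function name and the example values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: maximum diameter (mm) and RICLNM for five patients.
diameters = [5.0, 8.0, 12.0, 20.0, 35.0]
ratios = [0.0, 0.1, 0.25, 0.2, 0.5]
example_r = pearson_r(diameters, ratios)
```

Applying the function to every pair of features fills the correlation matrix that a figure such as Figure 2B would display.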

Performance of machine learning algorithms
Using age, gender, BMI, maximum diameter, multifocality, ETE, and ICLNM, predictive models for CCLNM were developed based on 6 algorithms, namely KNN, DT, SVM, XGBoost, LR, and RF. To compare the predictive value of ICLNM and RICLNM, we built another 6 models using age, gender, BMI, maximum diameter, multifocality, ETE, and RICLNM. Supplementary Table 2 details the 12 models. Their performance on the training cohort is compared in Figure 3A, and the ROC curves are shown in Figure 3B. All of the models had excellent accuracy, AUC and specificity, with LR having the highest accuracy (0.801) and AUC (0.786). When RICLNM was used to build the models, better accuracy, AUC and sensitivity were achieved, with slightly lower specificity. DCA was performed to evaluate the clinical utility of these models (Figure 3C). RF and KNN showed clearly higher net benefits than the others. To aid understanding of the models, the relative importance of the features is shown in Figures 3D, E. Interestingly, RICLNM showed great importance in all 6 models built with it, whereas in the models built with ICLNM, ICLNM was not always the most important feature.
Owing to its strong performance in accuracy, AUC and DCA, RF using RICLNM was chosen as the candidate model for predicting CCLNM.

Predictive performance of RF on internal validation cohort
The validation cohort was used to test the predictive performance of the RF model, which was similar to that on the training cohort (accuracy 0.807, AUC 0.793, sensitivity 0.355, specificity 0.955). The confusion matrix and ROC are shown in Figures 4A, B, respectively. The calibration curve (Figure 4C) demonstrated good agreement between prediction and observation.

Predictive performance of RF on external validation cohort
Figure 5 shows the selection of the external validation cohort. Table 2 summarizes the demographic and clinicopathological features of the 409 selected patients from the Differentiated Thyroid Cancer in China (DTCC) cohort. The CCLNM rate was higher than in our center (34.5% vs. 21.1%), and patients in the DTCC cohort had a larger maximum diameter (15.21 mm vs. 9.10 mm), a higher ETE rate (21.52% vs. 3.60%), and a higher ratio of ICLNM (0.39 vs. 0.21). The accuracy, AUC, sensitivity and specificity of RF on the DTCC cohort were 0.731, 0.755, 0.532 and 0.836, respectively. Figures 4D-F show the confusion matrix, ROC and calibration curve of RF on the DTCC cohort.

Web calculator
To make it convenient to calculate the CCLNM probability in clinical practice, we established an online calculator based on the RF model (https://cclnm.shinyapps.io/CCLNMAPP/). Clinicians can predict the CCLNM risk by simply entering 7 variables (Figure 6).

Discussion
In this study, we developed and compared models based on 6 popular machine learning algorithms to predict CCLNM in cN0 unilateral PTC patients, using demographic and clinicopathologic features from multicenter clinical data. The RF algorithm was selected for further validation and web calculator construction because of its outstanding performance in terms of accuracy, AUC and DCA. Both internal and external validation of the model were performed and showed satisfactory results, indicating its potential for widespread application. A web calculator was constructed to facilitate estimation of the probability of CCLNM in cN0 unilateral PTC patients.
This model helps to identify CCLNM patients using demographic and clinicopathologic features that are easy to obtain before and during surgery. Our study had a large population (1780 patients in the training cohort, 445 in the internal validation cohort, and 409 in the external validation cohort), which supports more precise prediction. In addition, to reduce selection bias, we conducted external validation using multicenter clinical data from nine different hospitals. With its promising performance, our model showed robustness and potential for broad application in CCLNM prediction. Most previous studies only identified some risk factors for CCLNM in cN0 PTC patients, which is not sufficient for clinical decision-making. We not only analyzed the predictive value of several factors but also constructed a prediction model, and the use of machine learning makes the prediction more accurate.
The prevalence of CCLNM in our study was 21.1%, which was within the range reported by previous studies (8.13%-34.3%) (11,28,29). The variation across studies may be attributed to differences in patient populations and surgical criteria. We found that age <55 years, male gender, tumor size > 1cm, multifocality, and ICLNM were risk factors for CCLNM in cN0 PTC patients, while ETE showed no significant predictive value, in line with previous studies (10).
We investigated several demographic factors, including age, gender and BMI. It is widely accepted that older age is associated with poorer prognosis in thyroid cancer patients (30), but in our study and in previous studies younger age was identified as a predictor of CCLNM (10). Studies of total CLNM yielded similar results (31). Although PTC is more prevalent in women than in men, male gender has been reported to be a risk factor for poorer prognosis and worse response to treatment. The underlying molecular mechanisms remain unclear, but estrogen and androgen may play a role (32). Although obesity is reported to increase thyroid cancer incidence (33), findings on its role in lymph node metastasis are contradictory (34,35) and differ between regions (36). The relationship between obesity and lymph node metastasis, and the mechanism underlying it, require further study.
Some of the clinicopathologic features were useful for CCLNM prediction in this work. Our results indicated that in cN0 unilateral PTC patients, multifocality was important for predicting CCLNM. Multifocality is related to advanced disease and indicates a higher rate of recurrence, and thus patients with more than one lesion should receive more aggressive treatment (37). The 2015 ATA guideline took tumor size into consideration when determining whether to perform pCLND (2). As in many other studies (11, 29, 38-41), tumor size was a risk factor for CCLNM in our cohort. The relationship between ETE and thyroid cancer prognosis is controversial, and recent studies have further divided ETE into minimal and extensive ETE based on the extent involved, which differ in clinicopathological features such as lymph node metastasis and in prognosis (42, 43). As for CCLNM, previous studies have also reported contradictory results (29,38,40). Our analysis revealed that ETE was significant in univariate logistic regression but not in multivariate analysis, indicating that it was not an independent risk factor. However, we did not classify ETE into minimal and extensive groups, which may have affected the outcome.
Similar to previous studies (28, 29, 38-40), ICLNM exhibited great predictive value for CCLNM. Despite the presence of "skip" metastasis, most metastasis occurs ipsilaterally first. Interestingly, when the models were developed using RICLNM instead of ICLNM, sensitivity and performance in DCA increased dramatically, with only a slight decrease in specificity, and in those models RICLNM had the greatest importance among all variables. Most previous studies focused only on the existence of ICLNM, and only Zhou and Qin (11) included the number of ICLNM in their analysis. Because the extent of dissection varies, the absolute number of metastatic lymph nodes may not reflect the true situation. Studies have demonstrated the significance of the metastatic lymph node ratio (MLNR) in PTC prognosis (44-46), but it has rarely been used to predict CCLNM so far. Our study showed that RICLNM was a stronger predictor of CCLNM than the existence of ICLNM. However, when only very few suspicious lymph nodes are dissected, the ratio will be either extremely large or extremely small. The relationship between the extent of lymph node dissection and the clinical value of MLNR requires further study.
There are some limitations to our study. First, its retrospective nature introduced unavoidable bias. Prospective studies are required to verify the accuracy and clinical benefit of the model. Second, although we externally validated the model using the DTCC cohort, which includes data from nine centers, all patients were from China. Its application in other populations needs further validation. Third, all histopathological features in the study, including maximum diameter, multifocality, ETE and ICLNM, were postoperative results, which are difficult to acquire preoperatively with current detection methods, although rapid intraoperative frozen-section examination can provide the required characteristics. Fourth, since clinical classification is operator-dependent, the judgement of cN0 is not entirely objective and consistent. More accurate imaging methods may solve this problem in the future.
In conclusion, we presented an ML-based model to predict CCLNM probability in cN0 unilateral PTC patients, validated it in internal and external cohorts, and developed an easy-to-use web calculator based on it.

FIGURE 2 Feature selection. (A) Forest plot of the univariate and multivariate analysis of factors in predicting CCLNM. (B) Correlation analysis of each pair of factors. CCLNM, contralateral central lymph node metastasis; BMI, body mass index; ETE, extrathyroidal extension; ICLNM, ipsilateral central lymph node metastasis; RICLNM, ratio of ICLNM.

FIGURE 4 Performance of RF built with RICLNM on the internal and external validation cohorts. Confusion matrix (A), ROC (B) and calibration curve (C) of the internal validation cohort. Confusion matrix (D), ROC (E) and calibration curve (F) of the external validation cohort. CCLNM, contralateral central lymph node metastasis.

FIGURE 5 Flowchart of patient selection of the external validation cohort. PTC, papillary thyroid carcinoma; US, ultrasound.

TABLE 1
Demographic and clinicopathologic features of the WHUH patients grouped by CCLNM.

TABLE 2
Demographic and clinicopathologic features of the DTCC patients grouped by CCLNM. a CCLNM, contralateral central lymph node metastasis; b BMI, body mass index; c ETE, extrathyroidal extension; d ICLNM, ipsilateral central lymph node metastasis; e RICLNM, ratio of ICLNM.