Computed Tomography-Based Radiomics Model to Predict Central Cervical Lymph Node Metastases in Papillary Thyroid Carcinoma: A Multicenter Study

Objectives This study aimed to develop a computed tomography (CT)-based radiomics model to predict central lymph node metastases (CLNM) preoperatively in patients with papillary thyroid carcinoma (PTC). Methods In this retrospective study, 678 patients with PTC were enrolled from Yantai Yuhuangding Hot3spital (n=605) and the Affiliated Hospital of Binzhou Medical University (n=73) within August 2010 to December 2020. The patients were randomly divided into a training set (n=423), an internal test set (n=182), and an external test set (n=73). Radiomics features of each patient were extracted from preoperative plain scan and contrast-enhanced CT images (arterial and venous phases). One-way analysis of variance (ANOVA) and least absolute shrinkage and selection operator algorithm were used for feature selection. The K-nearest neighbor, logistics regression, decision tree, linear-support vector machine (linear-SVM), Gaussian-SVM, and polynomial-SVM algorithms were used to establish radiomics models for CLNM prediction. The clinical risk factors were selected by ANOVA and multivariate logistic regression. Incorporated with clinical risk factors, a combined radiomics model was established for the preoperative prediction of CLNM in patients with PTCs. The performance of the combined radiomics model was evaluated using the receiver operating characteristic (ROC) and calibration curves in the training and test sets. The clinical usefulness was evaluated through decision curve analysis (DCA). Results A total of 4227 radiomic features were extracted from the CT images of each patient, and 14 non-zero coefficient features associated with CLNM were selected. Four clinical variables (sex, age, tumor diameter, and CT-reported lymph node status) were significantly associated with CLNM. Linear-SVM led to the best prediction model, which incorporated radiomic features and clinical risk factors. Areas under the ROC curves of 0.747 (95% confidence interval [CI] 0.706–0.782), 0.710 (95% CI 0.634–0.786), and 0.764 (95% CI 0.654–0.875) were obtained in the training, internal, and external test sets, respectively. The linear-SVM algorithm also showed better sensitivity (0.702 [95% CI 0.600–0.790] vs. 0.477 [95% CI 0.409–0.545]) and accuracy (0.670 [95% CI 0.600–0.738] vs. 0.642 [95% CI 0.569–0.712]) than an experienced radiologist in the internal test set in the combined radiomics model. The calibration plot reflected a favorable agreement between the actual and estimated probabilities of CLNM. The DCA indicated the clinical usefulness of the combined radiomics model. Conclusion The combined radiomics model is a non-invasive preoperative tool that incorporates radiomic features and clinical risk factors to predict CLNM in patients with PTC.

Objectives: This study aimed to develop a computed tomography (CT)-based radiomics model to predict central lymph node metastases (CLNM) preoperatively in patients with papillary thyroid carcinoma (PTC).
Methods: In this retrospective study, 678 patients with PTC were enrolled from Yantai Yuhuangding Hot3spital (n=605) and the Affiliated Hospital of Binzhou Medical University (n=73) within August 2010 to December 2020. The patients were randomly divided into a training set (n=423), an internal test set (n=182), and an external test set (n=73). Radiomics features of each patient were extracted from preoperative plain scan and contrast-enhanced CT images (arterial and venous phases). One-way analysis of variance (ANOVA) and least absolute shrinkage and selection operator algorithm were used for feature selection. The K-nearest neighbor, logistics regression, decision tree, linearsupport vector machine (linear-SVM), Gaussian-SVM, and polynomial-SVM algorithms were used to establish radiomics models for CLNM prediction. The clinical risk factors were selected by ANOVA and multivariate logistic regression. Incorporated with clinical risk factors, a combined radiomics model was established for the preoperative prediction of CLNM in patients with PTCs. The performance of the combined radiomics model was evaluated using the receiver operating characteristic (ROC) and calibration curves in the training and test sets. The clinical usefulness was evaluated through decision curve analysis (DCA).
Results: A total of 4227 radiomic features were extracted from the CT images of each patient, and 14 non-zero coefficient features associated with CLNM were selected. Four clinical variables (sex, age, tumor diameter, and CT-reported lymph node status) were significantly associated with CLNM. Linear-SVM led to the best prediction model, which incorporated radiomic features and clinical risk factors. Areas under the ROC curves of INTRODUCTION Papillary thyroid carcinoma (PTC) is the predominant pathological type of thyroid malignancy (1), and its incidence has rapidly increased in the past decade (2,3). Although PTC is regarded as an indolent tumor, a portion of cancer cells metastasize to lymph nodes (LNs) around the thyroid gland, including central cervical lymph node metastases (CLNM) and lateral CLNM (4). CLNM has been reported in up to 60%-70% of patients and is an important risk factor for locoregional recurrence (5)(6)(7)(8)(9). CLNM is an independent risk factor for the prognosis of patients with PTC (10). Tumor cells usually occur first in the central neck (level VI) and then in the lateral neck (levels II, III, and IV) (11). LN metastases rarely occur first in the lateral neck compartments (12,13). Therefore, central cervical compartment has the highest risk of metastases (14). Central cervical LNs are the "sentinel" LNs of PTC neck LNs. Improving the diagnostic level of central cervical LNs will help reduce unnecessary LN dissection. However, how to identify patients with CLNM accurately remains a problem because of the lack of discriminate features. Preoperative diagnosis of CLNM plays an important role in the formulation of surgical plans (15)(16)(17). Thus, the accurate preoperative prediction of CLNM provides a reference for individualized treatment decision.
Currently, ultrasound (US) and contrast-enhanced computed tomography (CT) plays an important role in preoperative nodal staging and is a standard procedure in clinical practice (18,19). However, preoperative US can only detect 20%-31% of CLNM (20,21) and may only change the surgical procedure of 20% patients (22). Its efficacy in identifying malignant nodes is unsatisfactory (23). By contrast, CT has a variable sensitivity of 35%-77% and specificity of 70%-96% for the diagnosis of CLNM in patients with thyroid cancer (24,25). Meanwhile, the diagnostic efficacy of CT is still limited by the subjective judgment of different radiologists, which may lead to a considerable proportion of patients being under-staged or over-staged.
Radiomics have recently been introduced into medical imaging for preoperative diagnosis as computer-aided systems. It refers to the high-throughput extraction of large numbers of imaging features, thus converting medical images into mineable highdimensional data; the subsequent quantitative analysis of these data can be used to support surgical decision making (26,27). Radiomics-based methods were also proposed for the prediction of LN metastases in patients with PTC by converting radiological images into minable high-dimensional data (28)(29)(30). To our knowledge, some radiomics-based studies have focused on the preoperative prediction of CLNM in patients with PTC to date, but most of these studies are based on US images to develop radiomics models (31,32) and are single centered. Studies have yet to develop a radiomics-based model with CT images among multicenter to predict CLNM in patients with PTC.
Hence, the study aimed to develop a CT-based radiomics model that would incorporate radiomics features and clinical risk factors for the preoperative prediction of CLNM in patients with PTC and being tested in different centers. This comprehensive model may assist clinicians in selecting the most appropriate treatment strategies to achieve a better outcome.

Patients
This retrospective study was approved by the Ethics Committees of Yantai Yuhuangding Hospital and the Affiliated Hospital of Binzhou Medical University, exempted from informed consent in accordance with the Helsinki declaration. This analysis was performed on patients who underwent surgical treatment and were pathologically diagnosed with PTC in Yantai Yuhuangding Hospital (denote as Hospital 1) and the Affiliated Hospital of Binzhou Medical University (denote as Hospital 2) from August 2010 to December 2020. The inclusion criteria were as follows: (a) single lesion; (b) histologically confirmed with PTC; (c) underwent no preoperative anticancer treatment; (d) contrastenhanced CT was performed within 2 weeks before surgery; and (e) patients underwent neck dissection with ipsilateral lobectomy or total thyroidectomy and received pathological diagnosis of LNs. The exclusion criteria were as follows: (a) postoperative pathological examination showed concomitant non-PTC components in the lesion (such as atypical hyperplasia, follicular tumors, medullary carcinomas, undifferentiated carcinomas, and metastatic carcinomas); (b) biopsy before CT scan; (c) without preoperative thyroid CT scan; (d) the patient was suffering from other concomitant malignancies, such as lymphoma, breast cancer, or liver cancer; (e) the primary tumor was unclear on CT images; (f) the primary tumor had maximum diameter < 0.5 cm; (g) the primary tumor was difficult to segment because of nodular goiter or chronic lymphocytic thyroiditis; and (h) postoperative pathological examination showed multifocal PTC.
A total of 678 patients from two hospitals were included in this study. Patients in Hospital 1 (N=605) were divided into a training set (N=423) and an internal test set (N=182) at a ratio of 7:3 by using a completely random classification method. Patients in Hospital 2 were included as an external test set (N=73). Clinical data of patients were collected, including age, sex, CT primary tumor maximum diameter, location, and CT-reported LN status.
Preoperative CT findings on LN status were used as the CTreported LN status. In accordance with the National Comprehensive Cancer Network (version3.2020) Guidelines and some previous studies (19,23,33,34), metastatic lymph nodes (LN+) were considered when at least one of the following criteria presented: (a) LN ≥ 10 mm in the maximal short axis diameter; (b) round or irregular shape; (c) rough margin, fuzzy boundary, and/or invasion into adjacent tissues; (d) calcification or cystic and/or necrotic change; (e) strong enhancement (similar to or stronger than that of the pharyngeal mucosa); and (f) heterogeneous enhancement. On the basis of previous studies (35,36), suspicious LNs (LN suspicious ) were considered when the LNs did not meet the above criteria but were ≥5 mm in the maximal short axial diameter at cervical level VI. CTreported LN status was assessed by a radiologist (Dr. A) with 12 years of experience in thyroid radiology, and the results were verified by another radiologist (Dr. B) who had 22 years of subspecialty experience in thyroid radiology. The weighted kappa coefficient was used to evaluate the agreement between these two radiologists. For assessment of overall agreement, the mean k value was calculated from these pairs. Strength of k agreement was defined as follows: < 0.00, poor; 0.00-0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect (37). All disagreements were resolved through consultation.

Surgery Strategy
Patients received lobectomy and isthmectomy or total thyroidectomy depending on the clinical TNM stage. For patients with lateral cervical LN metastases, lateral cervical LN dissection was performed. The resected thyroid tissues were processed for pathological examination (including the determination of unifocal or multifocal PTC). The resected LNs were also subjected to pathological examination, and LN metastases were determined.

CT Images Acquisition
Before receiving surgery, all patients underwent contrastenhanced CT on thyroid with a 16-slice spiral CT scanner (Sensation 16; SIEMENS) or a 64-slice spiral CT scanner (Sensation 64; SIEMENS). After plain CT scanning, a dynamic contrast-enhanced CT scan was performed after intravenous administration of 80-100 mL nonionic contrast material (Iopamidol, 370 mg I/mL, Bracco, Milan, Italy) by using power injection at a rate of 3.5 mL/s followed by saline flush (20 mL). Arterial and venous phase images were obtained at 25 and 60 s, respectively. The slice thickness of the reconstructed image was 1.0 mm. Arterial phase, venous phase, and plain scan CT images were retrieved for image feature extraction. All lesion image parameters (primary tumor maximum diameter, location) were examined and assessed by radiologist Dr. A, and the results were verified by another radiologist Dr. B. The CT images of all target lesions were exported from the Picture Archiving and Communicating System with Digital Imaging and Communications in Medicine format and imported into Radcloud (Huiying Medical Technology Co., Ltd.).

CT Images Preprocessing and Segmentation
Al CT images were imported into Radcloud for image preprocessing, including image normalization, to reduce the difference in grayscale texture between images. The data were resampled to 1 mm*1 mm*1 mm to eliminate the difference in different image scales. Before feature selection, tumor segmentation was performed by manually delineating the volume of interest (VOI) along the tumor contour on each CT image slice using Radcloud. Radiologist Dr. A (12 years of experience) used Radcloud to manually delineate the VOI of the primary thyroid tumor as consistent as possible with the margins of the primary tumor lesion. This study did not delineate the LN region but the primary thyroid tumor because many metastatic LNs are extremely small and difficult to identify on the CT image. In addition, many LNs were unable to correspond to each other between CT image and pathology. For these reasons, this prediction model was based on the primary thyroid tumor. The primary tumor VOI region was evaluated by radiologist Dr. B (22 years of subspecialty experience) to ensure the accuracy of segmentation. Following the same guideline describing how to define the boundary of tumors, both doctors were aware of the diagnosis of thyroid cancer but were blinded to the clinical and histopathologic data. The schematic of image segmentation is shown in Figure 1.

Feature Extraction and Selection
Radiomic features (shape, first-order, texture features) were extracted from each VOI by Radcloud. All feature extraction methods were implemented in Radcloud. These features were divided into three groups, namely, first-order statistics, shape features, and texture features. Intra class correlation coefficients (ICCs) were utilized for evaluating the intra-and inter-observer agreement in terms of feature extraction. First, Dr. A and Dr. B randomly selected the images of 30 patients to evaluate the interclass reproducibility. After a month, Dr. A repeated the same procedure. The remaining VOI segmentation was performed by Dr. A. Features with ICCs greater than 0.75 indicated satisfactory reproducibility and were selected for radiomic features, and those with ICCs less than 0.75 were eliminated. After one-way analysis of variance (ANOVA), features with p <0.05 were selected. Then, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was used to select non-zero coefficients. The optimal penalization coefficient alpha was set by five-fold cross-validation, and radiomic features with nonzero coefficients within the training set were finally selected. To avoid overfitting, we ensured the feature selection, and the subsequent classifications were performed only on the training set rather than on the entire dataset.

Radiomics Model Construction
Basing the radiomic features that were selected from the training set, this study used six algorithms to construct the radiomics model for CLNM prediction: K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), linear-support vector machine (linear-SVM), Gaussian-SVM, and polynomial-SVM. Linear-SVM, Gaussian-SVM, and polynomial-SVM are SVM algorithms with different kernels, which may cause different results.

Clinical Independent Risk Factor Selection
The significant clinical risk factors were determined using ANOVA. Then, the significant clinical risk factors were integrated into the multivariate LR analysis of backwardstepwise selection. During this procedure, collinearity was considered and the factors with variance inflation factor (VIF) > 10 and p value > 0.05 were eliminated. Akaike information was used as the criterion when the smallest Akaike information criterion was reached, the stepwise procedure was stopped, and the final multivariate LR selected the independent clinical factors.

Construction of the Combined Radiomics Model
The six algorithms (KNN, LR, DT, linear-SVM, Gaussian-SVM, and polynomial-SVM) were used to construct combined radiomics models, which cooperate selected radiomics features and independent clinical risk factors, and the best combined model was selected.

Assessment and Test of Combined Radiomics Model Performance
The prediction performance of the combined radiomics models in the training and test sets was evaluated using the area under the receiver operator characteristic (ROC) curve (AUC). The Youden index method was used to identify the optimal cut-off value in ROC curves. Sensitivity, specificity, and accuracy values were determined by the cut-off value. Calibration of the model was assessed using Platt method (38). The calibration curve of the prediction model is an important index to evaluate the probability accuracy of a disease risk model to predict an individual outcome event in the future. A good degree of calibration indicates a high accuracy of the prediction model, whereas a poor degree of calibration suggests that the model may overestimate or underestimate the risk of disease. Good agreement between the true state of CLNM and the predicted probability based on the radiomics model was achieved when the calibration curves were close to the diagonal line. Decision curve analysis (DCA) was used to evaluate the impact of the radiomics model on the patients' benefit in predicting the CLNM of PTC to assist clinicians in making treatment decisions. Probability thresholds are important for the trade-off between accepting necessary clinical procedures and avoiding unnecessary ones. The curve with the highest gain is the best prediction method.

Statistical Analysis
All statistical tests were conducted in Python 3.6, R software 4.0.3, and SPSS 26. Scikit-Learn, a python library, was performed for the radiomic feature selection and model construction. The modules of "feature-selection", "linear-model", "svm", "neighbors", "tree", and "metrics" were used in feature selection and model construction. In R software, the "rms"

Demographics Features
The flow chart of patient screening is shown in Figure 2.

Feature Selection and Radiomics Model Construction
A total of 4227 radiomic features were extracted from the arterial phase, venous phase, and plain scan CT images of each patient. The ICCs ranged from 0.864 to 0.989 in the intra-observer and from 0.832 to 0.957 in the inter-observer. This result indicated that feature extraction within and between observers had good repeatability. Finally, 14 non-zero coefficient features ( Table 2) associated with CLNM prediction were selected after ANOVA and LASSO (Figures 3A, B) Figure 6A), 0.709 (95% CI 0.634-0.786) in the internal test set ( Figure 6B), and 0.764 (95%CI 0.654-0.875) in the external test set ( Figure 6C). In the calibration curve ( Figure 6D), the gray line represents perfect prediction, and the dotted line represents the calibration curve of the combined radiomics model. The calibration curve showed good agreement between the true state of central LN metastases and the predicted probability based on radiomics. The results of DCA are shown in Figure 7. When the threshold probability ranged from 0.2 to 0.7 in the internal test set and 0.1 to 0.9 in the external test set, the combined radiomics model to predict CLNM provides more net benefit than the "treat all" or "treat none" scheme. Therefore, this combined radiomics model constructed by linear-SVM showed excellent performance in discrimination, calibration, and clinical usefulness.

DISCUSSION
Due to there's still controversial in surgery strategies for thyroid carcinoma, preoperative prediction of thyroid cancer and LN status is of vital importance for the precise treatment of thyroid     carcinoma (39). The judgement of LN status is the core factor for routine central LN dissection (40)(41)(42)(43). Thus, noninvasive LN assessment tool is warranted. In this multicenter retrospective study, a CT-based comprehensive radiomics model including clinical risk factors and radiomic features was developed for the pretreatment prediction of CLNM in patients with PTC.
The linear-SVM model demonstrated the highest performance of predicting CLNM, with an AUC of 0.764 in the external test set. Thus, the linear-SVM algorithm was chosen as the best and compared with experienced radiologist's performance. The application of the combined radiomics model provides a new approach for establishing prediction models with multiple  characteristics. This individualized model has satisfied the requirements of precision medicine development and can be used by doctors and patients easily.
Many studies have been conducted on this topic (44)(45)(46)(47)(48). Zhu et al. (46) applied six machine learning algorithms coupled with preoperative clinical characteristics and intraoperative information to develop prediction models for CLNM. Yang et al. (47) developed a multicenter nomogram with internal and external validation set C-index values of 0.854 and 0.825, respectively. However, their internal validation set was through a bootstrapping analysis, which might cause the overfitting of the model. In addition, these studies solely used clinical information to construct predictive models. By contrast, our multicenter research included an independent internal test set and an external test set and provided additional radiomics evidence in predicting the CLNM of PTC. Compared with US signature-based nomograms (45,49,50), our CT-based combined radiomics model, given the high clarity of CT images in identifying LN metastases, incorporated the CTreported LN status, clinical risk factors, and radiomics features, which could be obtained through a non-invasive method before surgery. This combined radiomics model showed acceptable performance in predicting LN metastases.
This study offers other notable advantages. First, six combined radiomics models were constructed by machine , and external test set (C). The y-axis measures the net benefit. The x-axis shows the corresponding risk threshold. The grey line represents the assumption that all lesions were central lymph node metastases. The black straight line represents the assumption that all nodules were non-CLNM. If the threshold probability was more than 30%, using the combined model (black line) to predict CLNM added more benefit than the Rad-score model (red line) and clinical model (blue line) in the internal test set; If the threshold probability was more than 10%, using the combined model (black line) to predict CLNM added more benefit than the Rad-score model (red line) and clinical model (blue line) in the external test set.
learning algorithms (KNN, LR, DT, linear-SVM, Gaussian-SVM, and polynomial-SVM). The reliability and accuracy of the models can be better demonstrated by comparing the performance of these models. Linear-SVM showed the highest value among these six algorithms. The linear-SVM classifier is implemented specifically for massive levels of data and features (51). Relative to the KNN, LR, Gaussian-SVM, polynomial-SVM, and DT, linear-SVM has faster training and classification speed because it is well suited for highdimensional features (52). In the present study, linear-SVM showed the highest classification accuracy (0.709), which is higher than the recognition accuracy of the KNN, LR, Gaussian-SVM, polynomial-SVM, and DT algorithms. Hence, the linear-SVM has prominent advantages over the other five algorithms in this study. Second, some preprocessing techniques were applied prior to feature extraction to improve feature discrimination. Third, this study is a multicenter study, and CT images were extracted from different data centers and different machines, which improved the generalization ability of the combined radiomics model. This study has several limitations. First, our sample size was modest and only carried out in China, especially for the testing set, which may limit the performance of our predictive modelling. A large-scale, cross-race study is needed to improve the prediction efficiency of the model. Second, although the manual segmentation method has achieved satisfactory interand intra-observer reproducibility of radiomics feature extraction, over 90% of features have good reproducibility (53). However, the automated method of image segmentation may have high stability. Third, tumor diameters < 0.5 cm were not included because they could not be reliably identified and segmented on CT images. In the future, we will use smaller slice gaps and more advanced CT software to improve the detection of smaller tumors. Fourth, this study only analyzed single lesion tumor in patients with PTC instead of multiple tumors to avoid selection bias. Thus, the generalization ability of the model is limited. Fifth, the radiomic features were extracted from the primary tumor instead of LNs, which may affect the accuracy of the model. Moreover, only the conventional machine learning radiomics method was used in this study. Some studies have shown that deep learning methods have a certain value in predicting LN metastases. We will continue our research with deep learning methods in the future.
In summary, our combined radiomics model is a noninvasive predictive tool that combines the radiomic features of CT images with independent clinical risk factors and shows good application prospects. A large sample size, cross-race study should be carried out to optimize the model and improve the efficiency of the model in subsequent studies.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
JL implemented the conceptualization, methodology, data curation, and writing original draft preparation. XW implemented literature searching, and manuscript writing. NM Identified the radiological characteristics of PTC and estimated and adjusted the accuracy of VOIs. GZ implemented VOI segmentation, and literature searching. HZ implemented data analysis and figures making. YM and CJ implemented visualization, and investigation. JM implemented writing-reviewing and editing. XS conducted the design, quality control, and data interpretation. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the "Taishan Scholar" Project (No. ts20190991).