Prediction of cervical lymph node metastasis in solitary papillary thyroid carcinoma based on ultrasound radiomics analysis

Objective To assess the utility of predictive models using ultrasound radiomic features to predict cervical lymph node metastasis (CLNM) in patients with solitary papillary thyroid carcinoma (PTC). Methods A total of 570 PTC patients were included (456 in the training set and 114 in the testing set). Pyradiomics was used to extract radiomic features from preoperative ultrasound images. After dimensionality reduction and feature selection, we developed radiomics models using various machine learning algorithms. Univariate and multivariate logistic regressions were conducted to identify independent risk factors for CLNM, and clinical models were established using these risk factors. Finally, we integrated the radiomics and clinical models to create a combined nomogram. We plotted ROC curves to assess diagnostic performance and used calibration curves to evaluate agreement between predicted and observed probabilities. Results A total of 1561 radiomics features were extracted from preoperative ultrasound images. After dimensionality reduction and feature selection, 16 radiomics features were retained. Among the radiomics models, the logistic regression (LR) model exhibited the best predictive performance. Univariate and multivariate logistic regression showed that patient age, tumor size, gender, suspicious cervical lymph node metastasis, and capsule contact were independent predictors of CLNM (all P < 0.05). Among the clinical models, the LR model likewise demonstrated favorable diagnostic performance. The combined model showed superior diagnostic efficacy, with an AUC of 0.758 (95% CI: 0.712-0.803) in the training set and 0.759 (95% CI: 0.669-0.849) in the testing set. In the training set, the AUC of the nomogram was higher than that of the clinical and radiomics models (P = 0.027 and 0.002, respectively). In the testing set, the AUC of the nomogram was also higher than that of the radiomics model (P = 0.012); however, there was no statistically significant difference between the nomogram and the clinical model (P = 0.928). The calibration curve indicated a good fit of the combined model. Conclusion Ultrasound radiomics offers a quantitative and objective method for predicting CLNM in PTC patients. Nonetheless, clinical indicators remain irreplaceable.


Introduction
Thyroid cancer is the most prevalent malignancy of the endocrine system. Among its histological subtypes, papillary thyroid carcinoma (PTC) predominates, accounting for approximately 90% of reported thyroid cancer cases. The prognosis for most PTC patients is favorable, with a 10-year survival rate of up to 98%. However, cervical lymph node metastasis (CLNM) is common in PTC patients and closely correlates with postoperative disease recurrence and survival prognosis (1,2). Accurate preoperative prediction of CLNM therefore holds significant clinical value, as it provides crucial guidance for selecting appropriate treatment strategies.
Preoperative CT examination has been a common approach for assessing the presence of CLNM in PTC patients (3). However, its diagnostic sensitivity remains limited, especially in detecting subclinical lymph node metastasis, with a sensitivity of only 60%. Additionally, it has relatively modest specificity and raises concerns about radiation exposure (4,5). The primary benefit of preoperative CT lies in its ability to provide detailed information on the dimensions, location, and characteristics of the primary thyroid tumor. Notably, it also facilitates the assessment of extrathyroidal extension (ETE), aiding in determining the extent of surgical resection and guiding appropriate surgical interventions (6). Currently, ultrasound examination serves as the primary method for preoperatively diagnosing CLNM in PTC patients. Conventional ultrasound (CUS) examination can determine the presence of CLNM by systematically scanning the cervical lymph nodes according to anatomical regions and assessing changes in their echogenicity, internal components, calcification, and Color Doppler flow imaging (CDFI) patterns (7). However, CUS has relatively low sensitivity in diagnosing central compartment CLNM, and its diagnostic effectiveness lies mainly in detecting lateral CLNM (8,9). Previous studies have explored the feasibility of predicting CLNM based on ultrasound characteristics and clinical features of the tumor in PTC patients. Nevertheless, predictive models constructed solely on preoperative clinical and ultrasound parameters tend to have limited effectiveness (10).
Radiomics is a quantitative approach to medical imaging that aims to uncover tumor patterns and characteristics imperceptible to the naked eye by automatically extracting latent data features from medical images, thereby supporting the precise diagnosis and treatment of tumors (11). Previous studies have shown that preoperative CT and MRI radiomics have significant value in assessing CLNM in PTC patients (12,13). However, relatively few studies have addressed the role of ultrasound radiomics in evaluating CLNM in PTC patients (14). The objective of this study is to examine and validate the effectiveness of different predictive models incorporating ultrasound radiomics features in predicting CLNM among patients diagnosed with solitary PTC.

Patients
With approval from our institutional ethics committee, we conducted a retrospective study. Owing to the retrospective nature of the study, the requirement for written informed consent was waived. The study population consisted of patients who underwent CUS and contrast-enhanced ultrasound (CEUS) examinations at our hospital between January 2017 and December 2022 and were subsequently confirmed to have PTC on surgical pathology. The inclusion criteria were as follows: (1) patients who underwent preoperative CUS and CEUS at our hospital; (2) patients who eventually underwent surgery and were pathologically confirmed to have PTC; (3) patients with solitary nodules on both CUS and CEUS examinations. The exclusion criteria were: (1) mismatch between the nodule on ultrasound and pathology examinations; (2) patients who did not undergo cervical lymph node dissection. The flowchart of patient enrollment is shown in Figure 1. Finally, a total of 570 patients were included in the study. Among them, there were 148 male and 422 female patients, with a mean age of 43.66 ± 11.90 years (range 17-80 years); the mean nodule size was 9.13 ± 6.01 mm (range 2.37-49.20 mm). According to the timing of recruitment, all patients were divided into a training set (January 2017 - May 2020) and a testing set (June 2021 - December 2022), with 456 patients in the training set and 114 patients in the testing set.

Ultrasound examination
All thyroid ultrasound images were acquired using a single ultrasound diagnostic system, the Aplio 500 (Toshiba, Japan). During thyroid ultrasound examinations, we used high-frequency linear array transducers with a frequency range of 7-14 MHz. To ensure clear image quality, we set the gain at 84 dB and adjusted the depth and time-gain compensation appropriately. Patients were in the supine position with the neck hyperextended and slightly rotated to one side, keeping the neck straight. Examinations started with CUS, followed by CEUS. Patients were required to sign informed consent before CEUS examinations.
In CUS, we first performed grayscale ultrasonography to scan the entire thyroid, paying close attention to the thyroid's size, echogenicity, and the presence of nodules or masses. Upon detection of a suspicious nodule, more detailed ultrasonographic characterization was undertaken, documenting the nodule's location, number, size, echogenicity, margins, shape, presence of calcifications, relationship with the capsule, and capsule integrity. CDFI was also used to evaluate the vascularity within and surrounding the nodule. In addition to examining the thyroid and nodules, we performed comprehensive ultrasound scans of the entire neck region to look for suspicious cervical lymph node metastasis (SCLNM). Lymph nodes were assessed for size, echogenicity, hilum, presence of calcifications, cystic degeneration, and vascular flow patterns. We stored the following static images of thyroid and cervical lymph node characteristics: longitudinal and transverse grayscale images displaying the maximum diameters of nodules or lymph nodes, CDFI images, and images with typical imaging features such as liquefaction and calcification. We also stored dynamic images, including those depicting the nodule's relationship with the thyroid capsule in transverse view and those showing the continuity of the thyroid capsule.
After the CUS examination, CEUS mode was activated. We used a low mechanical index of 0.001 for the CEUS examination. We used dual imaging mode to simultaneously display the tumor location and the CEUS pattern. An intravenous catheter (20G) was inserted into the patient's elbow vein, and 2 ml of contrast agent suspension was used. The ultrasound contrast agent was SonoVue (Bracco, Italy). The contrast agent was administered via bolus injection, followed by flushing with 10 ml of normal saline. Upon contrast injection, we started the timer on the ultrasound machine and recorded a video to document the dynamic contrast perfusion process for 1 minute.

Ultrasound image and clinical data analysis
The ultrasound imaging features included tumor location, size, margin, shape, aspect ratio, calcification pattern, capsule contact, loss of capsule continuity, CDFI pattern, SCLNM, perfusion rate, homogeneity, enhanced intensity, and discontinuous capsule enhancement. The ultrasound imaging indicators are explained in detail in the Supplementary Material. All ultrasound images were independently assessed by two senior ultrasound specialists, each with over 5 years of experience in thyroid CEUS examinations, who were blinded to the patients' CLNM status. The final assessment for each indicator was reached through consensus between these two assessors. In cases of disagreement, a third physician with over 10 years of experience in thyroid CUS examinations reviewed the patient's images, and the third physician's assessment was taken as final.
Clinical data were retrieved from the hospital information system of our institution. The clinical data included the patient's age, gender, Hashimoto's thyroiditis, surgical procedures performed, postoperative pathology results, and presence of CLNM. Solitary PTC was defined on the basis of the final surgical pathology: a PTC focus was confirmed within the patient's thyroid, and only one such focus was identified.

Ultrasound image segmentation
The workflow of the radiomics analysis is shown in Figure 2. For radiomic analysis, grayscale ultrasound images (maximum diameter of the tumor on longitudinal view) were acquired in jpg format before region of interest (ROI) delineation and converted to nii.gz format. Ultrasound image preprocessing included resampling and normalization. Two radiologists with 5 and 7 years of experience in thyroid ultrasound examinations, blinded to CLNM status, used the ITK-SNAP software (version 3.8.0, http://www.itksnap.org/) to delineate tumor ROIs from the images. The delineation was performed tightly along tumor margins. To ensure reliability and consistency of radiomics features, we randomly selected ultrasound images from 50 patients after one month and had the first radiologist re-delineate the ROIs. The intraclass correlation coefficient (ICC) was used to assess intra- and inter-observer consistency. Features with an ICC greater than 0.9 were considered to have good consistency and were included in the radiomics feature analysis.
Radiomics features were extracted using the pyradiomics software (http://pyradiomics.readthedocs.io). These features can be categorized into three groups: (I) shape features, (II) intensity features, and (III) texture features. Shape features describe the three-dimensional shape characteristics of the tumor. Intensity features describe first-order statistical distributions of voxel intensities within the tumor. Texture features describe patterns in intensity, encompassing second- and higher-order spatial distributions of intensities. For texture feature extraction, several methods were used: gray level co-occurrence matrix, gray level run length matrix, gray level size zone matrix, neighborhood gray-tone difference matrix, and gray level dependence matrix. After feature extraction, all radiomics features were normalized. Then, feature selection was performed using the t-test or Mann-Whitney U test, retaining only radiomics features with a p-value < 0.05. Subsequently, Spearman's rank correlation coefficients were computed between features, and for any pair of features with a correlation greater than 0.9, only one was kept. To reduce feature dimensionality while maintaining descriptive capability, the least absolute shrinkage and selection operator (LASSO) was employed. Through 10-fold cross-validation, the λ value that minimized the cross-validation error was chosen, and the features retained with non-zero coefficients were used for model building. Next, a linear combination of the retained features was constructed, and a radiomics score was generated for each patient based on the model coefficients. All feature selection steps were performed on the training set, and the resulting features were applied to the test set.
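The Spearman redundancy filter described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's actual code: the function names, the toy feature values, and the greedy rule of keeping the first feature of each highly correlated pair are all assumptions, since the text does not state which member of a pair was retained.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_redundant(features, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature only if its |rho| with
    every already-kept feature does not exceed the threshold.
    `features` maps a feature name to its training-set values."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy features, for illustration only.
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.1, 6.3, 8.0, 11.0],  # monotone in f1, so rho = 1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to f1
}
selected = drop_redundant(toy)
print(selected)
```

In keeping with the text, such a filter would be fitted on the training set only, and the resulting list of feature names would then be applied unchanged to the test set.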

Radiomics signature establishment
The establishment of the radiomics model involved the following steps. First, after LASSO feature selection, we input the selected features into different machine learning (ML) models, including logistic regression (LR), support vector machine (SVM), random forest, K-nearest neighbors (KNN), ExtraTrees, XGBoost, LightGBM, and multilayer perceptron (MLP), to build models for predicting the risk of CLNM. We used 10-fold cross-validation to derive the final radiomics signature. In evaluating model performance, we compared the diagnostic efficacy of the various models using metrics such as the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).

FIGURE 2 Schematic diagram of ultrasound radiomics analysis and model development. The radiomics approach is employed to extract features within the region of interest of thyroid tumors in ultrasound images. Valuable features are acquired through dimensionality reduction for machine learning to establish radiomics models. Univariate and multivariate analyses are conducted to investigate independent risk factors for cervical lymph node metastasis in papillary thyroid carcinoma, and clinical models are developed using machine learning techniques. Subsequently, a nomogram is generated based on the optimal results obtained from the radiomics and clinical models. The diagnostic performances of the different models are assessed and compared. LR, Logistic Regression; SVM, Support Vector Machine; KNN, K-Nearest Neighbors; RF, Random Forest; MLP, Multilayer Perceptron.

Construction of clinical signature
To construct the clinical model, we first performed univariate logistic regression analysis on the clinical features, followed by multivariate logistic regression analysis on the features with statistical significance, to obtain the final predictors for establishing the clinical signature. In developing the clinical signature, we used the same ML models as for the radiomics signature to ensure methodological consistency.

Development of combined model
After integrating the radiomics signature with the clinical signature, we established a combined model and visualized it as a nomogram. We selected the optimal model results for constructing the combined model. On the training and test sets, we calculated a series of metrics, including AUC, accuracy, sensitivity, specificity, PPV, and NPV, to assess the model's predictive performance. The AUC values were compared between the combined model and the clinical and radiomics models in both the training and testing datasets. To assess the agreement between the models' predicted outcomes and actual observations, we also plotted calibration curves.
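For reference, the threshold-based metrics listed above follow directly from the four cells of a confusion matrix. The sketch below is illustrative only: the function names, the 0.5 probability cutoff, and the toy labels and probabilities are assumptions, as the text does not report the cutoff used.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN after thresholding predicted probabilities.
    The 0.5 default is an assumed cutoff, not one stated in the study."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard definitions; assumes every denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical toy data: 1 = CLNM present, 0 = CLNM absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7]
counts = confusion_counts(y_true, y_prob)
metrics = diagnostic_metrics(*counts)
print(counts, metrics)
```

Unlike the AUC, all five of these quantities depend on the chosen cutoff, so values from different models are comparable only when the cutoff rule is the same.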

Statistical analysis
Statistical analysis was performed using the Python programming language (version 3.5.6). Continuous variables were described as mean and standard deviation or as median and interquartile range, and compared between groups with the t-test or Mann-Whitney U test. Categorical variables were described as frequency or percentage and analyzed with the chi-square test or Fisher's exact test. For selection of clinical indicators, univariate and stepwise multivariate analyses were used. DeLong tests were employed to compare the AUC values among different models. A two-tailed P value of < 0.05 was considered statistically significant.
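The DeLong test builds on the rank-based (Mann-Whitney) estimate of the AUC: the proportion of positive-negative case pairs in which the positive case receives the higher score, with ties counted as one half. A minimal sketch of that estimate is given below; it does not implement the DeLong variance or the test itself, and the function name and toy scores are assumptions for illustration.

```python
def auc_mann_whitney(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting tied scores as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 3 CLNM-positive and 4 CLNM-negative cases.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.4, 0.3, 0.1]
auc = auc_mann_whitney(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a sort-based version would be preferable for much larger samples.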

Patients' clinical and ultrasound imaging characteristics
This study included 570 patients (148 males and 422 females).

Clinical prediction model results
Using the independent risk factors for CLNM, clinical signatures were built for the training and testing sets using ML models (Table 3). The results showed that the RandomForest, ExtraTrees, and XGBoost models exhibited overfitting. The AUC values of the LR and MLP models were similar on the training set; however, the AUC value of the LR model was higher than that of the MLP model on the testing set. Therefore, we ultimately chose the LR model as the optimal model, with an AUC of 0.757, sensitivity of 0.897, specificity of 0.622, accuracy of 0.711, PPV of 0.547, and NPV of 0.920.

Ultrasound imaging radiomics prediction model results
A total of 1561 radiomics features were extracted from each patient's ultrasound image. After feature selection, 16 features with non-zero coefficients were finally chosen for ML modeling (Figure 3). Using ML approaches, 8 radiomics models were built and evaluated on the training and test sets (Table 4). Based on the AUC performance of the different ML models on the training and testing sets, we judged that the SVM, KNN, RandomForest, ExtraTrees, XGBoost, and LightGBM models exhibited overfitting. The AUC values of the LR and MLP models were similar on the training set; however, the AUC value of the LR model was higher than that of the MLP model on the testing set. Therefore, we ultimately chose the LR model as the optimal model, with a sensitivity of 0.513, specificity of 0.720, accuracy of 0.649, PPV of 0.488, and NPV of 0.740.

Construction and evaluation of the combined model
We used the optimal model results to construct a combined model, which we then visualized as a nomogram. To ensure consistent and objective assessment of the clinical utility of the models, LR was chosen for both the clinical and radiomics signatures. In the training dataset, the AUC value of the nomogram was greater than that of the clinical and radiomics models (P = 0.027 and 0.002, respectively). Moreover, there was no statistically significant difference in AUC between the clinical model and the radiomics model (P = 0.356). In the testing dataset, the AUC value of the nomogram was also greater than that of the radiomics model (P = 0.012). However, the difference between the nomogram and the clinical model was not statistically significant (P = 0.928). Furthermore, the AUC of the clinical model was higher than that of the radiomics model (P = 0.041). The nomogram demonstrated good agreement between predicted and observed values in both the training and testing sets (P values for the Hosmer-Lemeshow test were 0.280 and 0.051, respectively) (Figure 4).
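The Hosmer-Lemeshow test referred to above compares observed and expected event counts within groups of patients ordered by predicted probability; a P value above 0.05 gives no evidence of poor calibration. The sketch below is a minimal standard-library illustration, not the study's implementation: the choice of 10 equal-size groups, the function names, and the toy data are assumptions. With 10 groups the reference chi-square distribution has 8 degrees of freedom, for which the survival function has a closed form.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid for even df only:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, p_value) using equal-size groups of sorted
    predictions. Assumes len(y_true) >= groups and an even groups - 2."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected events in the group
        observed = sum(y for _, y in chunk)   # observed events in the group
        # Binomial variance of the group: n_g * mean_p * (1 - mean_p).
        variance = expected * (1 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Hypothetical toy data: 20 patients with increasing predicted risk.
y_prob = [i / 21 for i in range(1, 21)]
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
hl_stat, hl_p = hosmer_lemeshow(y_true, y_prob)
print(hl_stat, hl_p)
```

The test is sensitive to the number of groups and to sample size, which is one reason it is usually read alongside the calibration curve rather than in place of it.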

Discussion
The findings of this study demonstrate that the radiomics model based on ultrasound image features has limited value in preoperatively predicting CLNM for patients with PTC. However, combining the radiomics features with clinical data can improve predictive performance. This result may offer valuable insights for tailoring personalized treatment strategies for PTC patients in clinical practice.
This study revealed an association between age and CLNM in PTC patients, indicating that younger patients were more prone to developing CLNM. This is consistent with a recent study that also showed a negative correlation between age and CLNM, despite different age group cutoffs (15). Our study also revealed a higher CLNM detection rate in males, which is consistent with the observations of Zhu et al. (16). The negative correlation in females may be related to hormonal and reproductive factors (17). Additionally, there was a positive correlation between tumor size and CLNM: larger tumors had a greater tendency to develop CLNM, consistent with prior research findings (18,19). This study identified tumor contact with the capsule as an independent risk factor for CLNM. In agreement with Wang et al. (20), we also emphasize the association between the tumor-capsule relationship and CLNM. Seong et al. found that a distance from the capsule <1.9 mm was associated with CLNM in PTC patients (21). In contrast to their quantitative method, we opted for the assessment of capsule contact as the indicator, aiming to facilitate a more practical clinical application of our prediction model. Moreover, consistent with previous studies, the preoperative detection of SCLNM on ultrasound was also identified as a risk factor for CLNM (22,23). Despite the significant value of preoperative clinical characteristics in predicting CLNM, the evaluation and analysis of certain ultrasound features are limited by subjectivity. Hence, this study aimed to explore more objective, automated, and accurate preoperative CLNM prediction approaches to overcome these limitations.
Radiomics is a methodology that extracts a large number of quantitative features from medical images using data characterization algorithms. This method transforms digital medical images into high-dimensional data that are imperceptible to the human eye, extracting meaningful information hidden in the images that may have value for decision support, personalized medicine, and predictive modeling. This study highlights the substantial significance of texture features within the tumor region of PTC in predicting CLNM. Following a dimensionality reduction analysis of radiomics features extracted from the tumor's ROI, a total of 16 radiomics features were ultimately retained and employed for machine learning modeling. Among these features, 75% (12/16) are texture features, which assess the contrast of the grayscale distribution, the consistency and repeatability of the texture, the complexity and disorder of the texture, and the linear correlation of grayscale values between pixels and their neighboring pixels in the tumor. The remaining 25% (4/16) are first-order features. These first-order features evaluate tumor heterogeneity by examining variations in pixel intensity within the tumor region. They encompass the mean, median, standard deviation, maximum, and minimum pixel intensities in the tumor, providing insights into the distribution and fundamental properties of pixel intensities in the image, including overall brightness, contrast, and uniformity. This finding was consistent with the results reported by Park et al. (24). These features delineated the intensity and distribution of gray levels within tumors, which could potentially be associated with alterations in the structure and density of tumor cells. Radiomics features can reflect underlying pathologic alterations, thereby providing important evidence for preoperative prediction of CLNM in PTC patients.
In this study, we performed ML modeling using radiomics features from ultrasound images after LASSO feature selection. However, logistic regression (LR) was eventually chosen as the model. While the majority of ML models, including SVM, KNN, random forest, ExtraTrees, XGBoost, and LightGBM, demonstrated strong performance on the training set, their AUCs declined markedly on the testing set, indicating overfitting. In comparison, the LR model maintained a relatively high AUC on the testing set. Similarly, in building the clinical signature, we also used various ML models, among which the RandomForest, ExtraTrees, XGBoost, and LightGBM models showed overfitting. Ultimately, we selected the model that demonstrated superior performance on the testing set as the clinical signature, which was again the LR model. In exploring the integration of clinical and radiomics models, we combined the optimal results of the clinical model with those of the radiomics model to create a combined model. The combined model achieved the highest AUC, followed by the clinical model and then the radiomics model. In this study, the AUC value of the radiomics model was similar to that of the clinical model in the training set but lower in the testing set. This suggests that, in clinical practice, radiomics technology cannot replace traditional clinical indicators. We also found that the combined model showed higher diagnostic performance than the standalone radiomics model in both the training and testing sets. Compared with the clinical model, however, the combined model was superior only in the training set. In the testing set, their diagnostic performance was similar, indicating the crucial value of clinical indicators in predicting CLNM in PTC patients. Nevertheless, one of the primary advantages of radiomic methods is their ability to provide an objective assessment (25). In clinical practice, it is essential to combine ultrasound radiomics features with clinical characteristics to optimize the preoperative prediction of CLNM in patients with PTC.
This study has some limitations. First, it was a single-center retrospective study, which may introduce selection bias. Thus, the results need to be verified in larger, multicenter, prospective studies. Second, evaluation of clinical ultrasound features has a certain degree of subjectivity. Although two experienced physicians performed independent assessments and a consistency evaluation was carried out in this study, subjectivity may still affect the diagnostic performance of the clinical and combined models. Third, this study only analyzed radiomics features of the primary thyroid tumor ultrasound images, without in-depth analysis of radiomics features of the lymph nodes. In future studies, we plan to construct multi-modal combined models that incorporate lymph node radiomics features and thereby further improve model prediction performance. Finally, tumor ROI delineation in this study was performed manually, which is inefficient. In future studies, we will try automatic delineation approaches to reduce human intervention and improve research efficiency and objectivity.

Conclusion
Ultrasound radiomics technology provides a quantitative and objective means for predicting CLNM in PTC patients. However, the value of traditional clinical indicators remains irreplaceable, underscoring the need for their combined use in clinical practice.

FIGURE 1 Patient inclusion and exclusion flowchart. This flowchart delineates the patient inclusion and exclusion criteria for this study of individuals diagnosed with solitary PTC at our institution. PTC, Papillary Thyroid Carcinoma; CUS, Conventional Ultrasound; CEUS, Contrast-enhanced Ultrasound.

FIGURE 4 Diagnostic performance of different predicting models. (A) The performance of the various models in the training set. The AUC value of the nomogram was superior to those of both the clinical and the radiomics models. Additionally, the clinical model demonstrated a similar AUC value compared to the radiomics model. (B) The performance of the models in the testing set. Both the nomogram and the clinical model exhibited higher AUC values than the radiomics model. (C, D) The calibration curves for the three models in the training and testing datasets, respectively. "*" represents the statistical P-value when comparing the AUCs of the nomogram and the radiomics model. "#" indicates the statistical P-value when comparing the AUCs of the nomogram and the clinical model. "&" indicates the statistical P-value when comparing the AUCs of the clinical model and the radiomics model. AUC, area under the curve.

TABLE 1
Analysis of clinical data and ultrasonographic features in training and testing sets of patients with PTC.

* Indicates P<0.05 between the CLNM (-) and the CLNM (+) group. † Indicates P<0.05 between the training set and the testing set. SCLNM, suspicious cervical lymph node metastasis; CDFI, Color Doppler flow imaging.

TABLE 4
Diagnostic performance of different ultrasound radiomics models in the training and test sets.