Multi-modal ultrasound multistage classification of PTC cervical lymph node metastasis via DualSwinThyroid

Objective: This study aims to predict cervical lymph node metastasis in papillary thyroid carcinoma (PTC) patients with high accuracy. To achieve this, we introduce a novel deep learning model, DualSwinThyroid, which uses multi-modal ultrasound imaging data for prediction. Materials and methods: In this retrospective study, we assembled a dataset of 3652 multi-modal ultrasound images from 299 PTC patients. The newly developed DualSwinThyroid model integrates several ultrasound modalities with clinical data. We assessed the model's performance on a separate test set, comparing it with established machine learning models and previous deep learning approaches. Results: DualSwinThyroid achieved an AUC of 0.924 and an accuracy of 96.3% on the test set. The model efficiently processed multi-modal data, identifying features indicative of lymph node metastasis in thyroid nodule ultrasound images. It offers a three-tier classification in which each level corresponds to a specific surgical strategy for PTC treatment. Conclusion: DualSwinThyroid, a deep learning model built on multi-modal ultrasound radiomics, effectively estimates the degree of cervical lymph node metastasis in PTC patients. It also supports early, precise identification of high-risk groups and timely intervention, thereby improving the selection of surgical approaches in managing PTC patients.


Introduction
Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer, constituting 85-90% of malignant thyroid tumors. Although PTC progresses slowly and generally has a favorable prognosis, cervical lymph node metastasis can significantly increase the risk of recurrence and distant metastasis, and with it the risk of death. The 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer emphasized that the number of cervical lymph node metastases is a crucial factor in assessing the recurrence risk of thyroid cancer (1). An increase in the number of lymph node metastases corresponds to a poorer clinical outcome for the patient, with a consequent reduction in the 5-year survival rate (2-4).
In medical imaging, two principal ultrasonography approaches are used to predict cervical lymph node metastasis in thyroid cancer. The first examines the primary tumor, while the second assesses suspicious lymph nodes. Assessing suspicious lymph nodes is the more direct way to gauge the aggressiveness of thyroid cancer, but the intricate anatomy of the thyroid gland, coupled with the limitations of imaging technology, makes preoperative ultrasonography of cervical lymph nodes a complex task in which suspicious nodes are frequently not identified promptly. Therefore, the bulk of research takes the pragmatic route of studying the primary tumor, investigating attributes closely linked to the spread of cancer to cervical lymph nodes (5,6).
Recent technological progress has fostered the rise of radiomics, which goes beyond traditional medical image analysis. Algorithms mine imaging data to reveal information that is not apparent on visual inspection, enabling a comprehensive evaluation of tumor heterogeneity. This provides a foundation for developing precise diagnostic and treatment models that support clinical decision-making. Deep learning is at the forefront of this development, having driven significant breakthroughs in computer vision within Artificial Intelligence (AI). It is extensively applied in medical imaging for tasks such as segmentation, localization, detection, and image fusion, improving diagnostic precision for pathological changes. Unlike traditional machine learning, which requires intensive image preprocessing and manual feature identification, deep learning takes raw pixel values from images as input and iteratively refines its models through training (7). However, to our knowledge, no research has yet employed multi-modal ultrasonic radiomics data from primary lesions to develop deep learning models for evaluating lymph node status (8,9).
This study presents DualSwinThyroid, a deep learning classification model designed for evaluating the invasiveness of thyroid nodules. The model's 'Dual' structure processes multi-modal data, and its 'Swin' element uses the Swin-Transformer (10) to analyze ultrasound images and extract features from them. 'Thyroid' in its name highlights the model's specific application to thyroid nodule assessment. DualSwinThyroid builds on the Swin-Transformer framework and is tuned not only to exploit its imaging strengths but also to integrate clinical and ultrasonic data characteristics. Such integration is intended to sharpen diagnostic accuracy and efficiency, supporting more targeted and evidence-based treatment plans for patients. Additionally, DualSwinThyroid goes beyond the conventional binary classification of nodules, introducing a three-tier categorization in which each tier corresponds to a specific therapeutic approach, providing a clear framework for complex clinical decisions among diverse treatment alternatives.

Patients
Approval for this retrospective study was obtained from the Ethics Committee of the First Hospital of Shanxi Medical University, and the informed consent requirement was waived. Data were gathered from patients who underwent thyroid ultrasonography and subsequent surgical treatment in this hospital from July 2021 to June 2023. Through a rigorous selection process guided by predefined inclusion and exclusion criteria, 299 patients were enrolled, encompassing 339 thyroid nodules captured in 3652 ultrasound images. The postoperative pathological findings served to divide the data into three classes: Class I denoted no lymph node metastasis, Class II included cases with up to five metastatic lymph nodes, and Class III involved cases with more than five metastatic lymph nodes. Figure 1 displays the types of data images collected.
During data analysis, stringent inclusion and exclusion criteria were applied. The inclusion criteria were: (1) patients who had a total or subtotal thyroidectomy with cervical lymph node dissection; (2) nodules with confirmed surgical pathological diagnoses of papillary thyroid carcinoma; (3) patients who underwent routine ultrasonography and elastography within two weeks before surgery, yielding clear, complete, and original DICOM images. The exclusion criteria were: (1) patients who received radiofrequency ablation, radiation therapy, or chemotherapy before surgery; (2) ultrasound images of the target tumor marred by artifacts; (3) patients with other malignant tumors; and (4) patients with prior thyroid surgery.
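The three-class labeling rule described above can be written as a short function. This is a minimal sketch: the function name `metastasis_class` is hypothetical, and the only thing taken from the text is the set of thresholds (zero nodes, up to five nodes, more than five nodes).

```python
def metastasis_class(n_metastatic_nodes: int) -> str:
    """Map a postoperative metastatic lymph node count to Class I, II, or III."""
    if n_metastatic_nodes < 0:
        raise ValueError("node count cannot be negative")
    if n_metastatic_nodes == 0:
        return "I"    # no lymph node metastasis
    if n_metastatic_nodes <= 5:
        return "II"   # one to five metastatic lymph nodes
    return "III"      # more than five metastatic lymph nodes


# Boundary cases: 0, 1, 5, and 6 metastatic nodes.
labels = [metastasis_class(n) for n in (0, 1, 5, 6)]
print(labels)  # ['I', 'II', 'II', 'III']
```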

Data collection
All ultrasound scans were performed using the Canon Aplio i800 Color Doppler Ultrasound Diagnostic Device, equipped with an i18LX5 linear wide-band probe with a frequency range of 5-18 MHz and real-time ultrasound elastography technology. During routine examinations, patients were positioned supine to expose their necks for scanning. The physician conducted a thorough examination of the thyroid's bilateral lobes and isthmus, focusing on capturing the echogenicity, dimensions, and vascular flow within the gland. Additionally, for each thyroid nodule, precise records were made of its location, size, composition, echogenicity, shape, margins, presence of hyper-echoic areas, and vascular flow characteristics.
In the ultrasonic elastography examination, the region of interest on the elastographic image was configured to include the entire thyroid lesion and adjacent normal tissue, typically extending 2-3 times beyond the nodule's size. Patients were asked to hold their breath during the procedure. The physician then positioned the probe perpendicularly on the skin, exerted steady pressure with minimal vibration, monitored the color patterns displayed on the elastographic images, and archived the pertinent images.

Data processing
For the purposes of this study, data from 299 patients were included and randomized into training and testing datasets at a 7:3 ratio. Each patient's images were assigned to the same dataset as the patient. The DualSwinThyroid model was trained on this dataset, and its performance was benchmarked against the Swin Transformer image processing model and the MLP clinical and ultrasound information classification model (11). The construction, training, and prediction of the models were carried out using Python (version 3.8.0). Statistical analysis of the data and calculations of relevant variables were performed using SPSS (version 26.0, IBM Corporation, Armonk, New York).
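A patient-level 7:3 split of the kind described above could be sketched as follows. The text does not say how the randomization was implemented, so the helper names (`split_by_patient`, `assign_images`), the seed, and the use of integer patient identifiers are all assumptions made for illustration. The point of the sketch is that images inherit their patient's assignment, so no patient contributes images to both sets.

```python
import random


def split_by_patient(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle unique patient IDs and split them into train and test sets."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return set(ids[:n_train]), set(ids[n_train:])


def assign_images(image_to_patient, train_ids):
    """Send each image to the set that its patient belongs to."""
    train_images, test_images = [], []
    for image, patient in image_to_patient.items():
        (train_images if patient in train_ids else test_images).append(image)
    return train_images, test_images


train_ids, test_ids = split_by_patient(range(299))
print(len(train_ids), len(test_ids))  # 209 90
```

With 299 patients, rounding 70% gives 209 training patients and 90 test patients, which matches the counts reported in the model training section.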

Statistical analysis
Normality and variance homogeneity tests were conducted for patient characteristics such as age and nodule size. Subsequently, the chi-squared test was used to evaluate differences in ultrasonic and clinical features across patient cohorts. Multivariate ordinal logistic regression analysis was applied to determine independent risk factors influencing the extent of PTC lymph node metastasis, with statistical significance established at a two-tailed P-value of less than 0.05. ROC curves were then constructed based on these identified independent risk factors.
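For readers unfamiliar with the chi-squared test used above, the statistic can be computed from a contingency table as in the following sketch. The study ran its analysis in SPSS; this Python function is only an illustration, and the counts in the example table are invented, not taken from the study.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table.

    Assumes every row and every column contains at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows are two patient groups, columns are Class I, II, III.
example_table = [[40, 30, 10],
                 [25, 35, 20]]
statistic, dof = chi_squared_statistic(example_table)
```

With two degrees of freedom, as in a 2 x 3 table, the statistic must exceed 5.991 for the difference to be significant at the 0.05 level.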

Model design and training
This study used data from 299 patients to train the model, with 209 allocated to the training set and 90 to the test set. Of the 3652 nodule images, 2556 were used for training and 1096 for testing, with the pathological outcomes serving as the training labels. The test data were not used during the model's training phase. The Adam optimizer was employed to train the model across 500 epochs, with a batch size of 16 and an initial learning rate of 0.0001. The computational work was performed on a platform equipped with an i7-13900F CPU and an RTX 4080TI GPU, and the network architecture was developed in PyTorch 2.0.0 with CUDA 11.8. For more information on the training process, please see Figure 2.
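The reported figures can be checked with a few lines of arithmetic, which also gives a rough sense of the training length. This sketch assumes that each image counts as one training sample and that the final partial batch of each epoch is kept. Neither assumption is stated in the text, so the step counts are indicative only.

```python
import math

# Values reported in the text.
TRAIN_PATIENTS, TEST_PATIENTS = 209, 90
TRAIN_IMAGES, TEST_IMAGES = 2556, 1096
BATCH_SIZE = 16
EPOCHS = 500
INITIAL_LEARNING_RATE = 1e-4

# The two splits add up to the full cohort and the full image set.
total_patients = TRAIN_PATIENTS + TEST_PATIENTS   # 299
total_images = TRAIN_IMAGES + TEST_IMAGES         # 3652

# Assumption: one image per sample, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)  # 160
total_steps = steps_per_epoch * EPOCHS                  # 80000
```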

DualSwinThyroid model
In this research, the DualSwinThyroid model serves as a deep learning instrument for evaluating thyroid nodule invasiveness and risk levels, as detailed in Figure 3. It uses the Swin-Transformer framework to process longitudinal-section, transverse-section, color Doppler, and elastographic ultrasound images. The model operates through three primary image processing stages to extract deep features and classify invasiveness. Once categorized, the data enter the Data Fusion block, where they are integrated with clinical data selected through univariate analysis (p-value <0.05). After normalization, the combined data pass through a fully connected layer with a ReLU activation function, proceed to a subsequent fully connected layer, and finally yield predictive probabilities for each category through the Softmax function. Notably, DualSwinThyroid can process multiple ultrasound images from the same nodule to produce a diagnostic prediction.
FIGURE 2 Model training process.
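The fusion step described above (concatenate, normalize, fully connected layer with ReLU, second fully connected layer, Softmax) can be illustrated with a small forward pass in plain Python. This is a sketch under several assumptions rather than the authors' implementation: the type of normalization is not specified in the text, so a simple standardization of the fused vector is used; the feature sizes and weights are toy values; and all function names are hypothetical.

```python
import math


def normalize(values, eps=1e-5):
    """Standardize a vector to zero mean and unit variance (assumed scheme)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_head(image_features, clinical_features, w1, b1, w2, b2):
    """Concatenate both inputs, normalize, then apply FC + ReLU, FC, Softmax."""
    fused = normalize(list(image_features) + list(clinical_features))
    hidden = relu(dense(fused, w1, b1))
    return softmax(dense(hidden, w2, b2))


# Toy example: 4 image features, 2 clinical features, 2 hidden units, 3 classes.
w1 = [[0.1, -0.2, 0.3, 0.0, 0.5, -0.1],
      [-0.3, 0.2, 0.1, 0.4, -0.2, 0.3]]
b1 = [0.0, 0.1]
w2 = [[0.5, -0.4], [0.1, 0.2], [-0.3, 0.6]]
b2 = [0.0, 0.0, 0.0]
probabilities = fusion_head([0.2, -0.1, 0.4, 0.3], [1.0, 0.0], w1, b1, w2, b2)
```

The output is one probability per class (Class I, II, III), and the three values sum to one.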

Single modality model
A Multi-Layer Perceptron (MLP) model with eight neurons was developed for the classification of clinical and ultrasound data, employing ReLU as the activation function and Softmax for classification in the output layer. Cross-entropy served as the loss function for optimization. In the training process, the model took clinical and ultrasound data as inputs, with the features selected based on univariate analyses that produced p-values less than 0.05.
For the classification of images, the Swin-Transformer model was trained, using Swin-base as the pre-trained model. The input image data were primarily drawn from Regions of Interest (ROI) delineated by physicians during detailed scans and ultrasonic elastography of the thyroid's bilateral lobes and isthmus.
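The cross-entropy loss mentioned above penalizes the model according to the probability it assigns to the true class. The following sketch shows the computation for a batch of Softmax outputs. The function name and the clamping constant `eps` are illustrative choices, and the probabilities in the example are invented.

```python
import math


def mean_cross_entropy(batch_probabilities, true_classes, eps=1e-12):
    """Average negative log-probability assigned to the true class."""
    losses = []
    for probabilities, true_class in zip(batch_probabilities, true_classes):
        p_true = max(probabilities[true_class], eps)  # avoid log(0)
        losses.append(-math.log(p_true))
    return sum(losses) / len(losses)


# Invented Softmax outputs for two samples over Class I, II, III (indices 0-2).
example_loss = mean_cross_entropy(
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [0, 2],
)
```

A confident correct prediction contributes a loss near zero, while a uniform prediction over three classes contributes ln 3, roughly 1.10.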

Evaluation metrics
After training each model, Receiver Operating Characteristic (ROC) curves were plotted, and Area Under the Curve (AUC) values were computed for performance evaluation. The models processed ultrasound images of thyroid nodules to predict the extent of metastasis. The test set was then used to further assess each model's predictive capabilities, including an examination of predicted outcomes and an evaluation of predictive accuracy. Curves showing the evolution of prediction accuracy and loss throughout training are shown in Figure 4.
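The text does not state how the AUC was computed for the three-class problem. One common option, shown here purely as an assumption, is to compute a one-vs-rest AUC for each class, using the fact that the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function names are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(probabilities, true_classes, n_classes=3):
    """One AUC per class, treating that class as positive and the rest as negative."""
    aucs = []
    for c in range(n_classes):
        scores = [p[c] for p in probabilities]
        labels = [1 if y == c else 0 for y in true_classes]
        aucs.append(binary_auc(scores, labels))
    return aucs
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of about a thousand images. A rank-based formula would be preferable for larger sets.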
confirming Papillary Thyroid Carcinoma (PTC), 3 cases for missing lymph node dissection records, and 68 cases for incomplete image data. Consequently, the study was narrowed down to 299 cases, which included 339 thyroid nodules and 3652 ultrasound images, along with 19 clinical and ultrasound features, as detailed in Figure 5.

Statistical analysis results
Univariate analysis assessed clinical and ultrasonic features linked to the degree of cervical lymph node metastasis in Papillary Thyroid Carcinoma (PTC), examining 339 nodules. These were divided into three groups by metastasis count: 158 nodules in Group I, 120 in Group II, and 61 in Group III. The analysis revealed statistically significant differences in various factors, including age, gender, nodule location 2, maximum diameter, boundary, homogeneity, longitudinal-transverse ratio, halo sign, type of calcification, calcification ratio, capsular invasion, blood flow signal, and ultrasound-detectable suspicious lymph nodes. Each factor had a P-value below 0.05; the values are presented in Table 1.
Multivariate analysis using ordered logistic regression determined that variables such as age over 45 years, a maximum nodule diameter of 1.0 cm or less, male gender, nodule position at the upper and lower poles, specific calcification types, and ultrasound-visible suspicious lymph nodes were statistically significant. Specifically, being over 45 years old, having nodules with a maximum diameter of 1.0 cm or less, microcalcification, coarse calcification, and a longitudinal-transverse ratio of 1 or less were identified as protective factors. In contrast, being male, nodules at the upper and lower poles, and suspicious lymph nodes on ultrasound were established as independent risk factors. These findings are elaborated in Table 2.

Model performance results
The development of the DualSwinThyroid model incorporated a 5-fold cross-validation approach to optimize hyperparameters. The model's predictive performance was thoroughly evaluated, tracking not just the loss curve but also deriving accuracy metrics from the test set. The ROC curve was plotted, and the AUC value was calculated, as shown in Figure 4 (Training Curve). Additionally, essential metrics such as sensitivity and specificity were analyzed.
In training the single-modality models, training accuracy and loss were also monitored, as shown in Figure 4. A significant observation was that the Swin-Transformer's classification results varied considerably with different image data types in the test set. Color Doppler ultrasound and elastography images yielded higher classification accuracy than transverse and longitudinal images, as shown in Figure 6 (Classification Performance).
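For a three-class model, sensitivity and specificity are usually reported per class by treating each class in turn as the positive class. The text does not say which convention was used, so the sketch below is an assumption, and the confusion matrix in the example is invented rather than taken from the study.

```python
def per_class_metrics(confusion):
    """Return overall accuracy and per-class sensitivity and specificity.

    confusion[i][j] counts cases of true class i predicted as class j.
    Assumes every class has at least one true case and one true non-case.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for c in range(n_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(n_classes)) - true_pos
        true_neg = total - true_pos - false_neg - false_pos
        metrics.append({
            "sensitivity": true_pos / (true_pos + false_neg),
            "specificity": true_neg / (true_neg + false_pos),
        })
    accuracy = sum(confusion[c][c] for c in range(n_classes)) / total
    return accuracy, metrics


# Invented counts: rows are true Class I, II, III; columns are predictions.
example_confusion = [[8, 1, 1],
                     [2, 7, 1],
                     [0, 1, 9]]
accuracy, metrics = per_class_metrics(example_confusion)
```

In this invented example the overall accuracy is 24/30, and Class I has a sensitivity of 8/10 and a specificity of 18/20.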

Discussion
In Papillary Thyroid Carcinoma (PTC), cervical lymph node metastasis is common, with an estimated 40-90% of PTC patients potentially experiencing such metastasis (12, 13). In 2015, the American Thyroid Association issued management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer, highlighting the degree of cervical lymph node metastasis as a significant indicator of thyroid cancer recurrence. To curb the progression of PTC, a crucial step is early identification of tumors with metastatic potential in clinically diagnosed PTC patients. Surgical removal is regarded as the primary treatment modality (14). For patients with suspected lymph node metastasis, prophylactic central or lateral neck dissection is recommended. When performing prophylactic lymph node dissection and total thyroidectomy, the increased risk of postoperative complications, especially hypoparathyroidism, must be considered. The study by Henry et al. (15) reported that central neck lymph node dissection could raise the risk of permanent hypoparathyroidism from 0% to 4%. However, research by Nixon et al. indicates that reoperation after PTC recurrence is relatively challenging, significantly increasing surgical complications and impairing patients' quality of life (16). Hence, early identification of cervical lymph node metastasis in PTC not only aids in selecting the appropriate surgical plan and scope and in reducing postoperative complications, but also helps minimize recurrence risk, avert secondary surgeries, and improve prognosis. Nonetheless, the sensitivity of ultrasonographic characteristics alone for indicating lymph node metastasis remains insufficient, because some lymph node metastases exhibit unremarkable ultrasonic features, a common scenario in clinical practice. Studies have reported a sensitivity of only 33% for ultrasound in detecting central lymph node metastasis (17). Preoperative neck ultrasonography is also subject to inter-observer variability, which limits the objectivity of the diagnostic outcome.
This study investigated the independent risk factors for cervical lymph node metastasis in Papillary Thyroid Carcinoma (PTC), analyzing 3652 multi-modal ultrasonographic images and data from 299 patients. The results underscore the need for increased attention to cervical lymph node metastasis and recurrence risk, particularly in male patients, those aged 45 or younger, and those with thyroid cancer nodules larger than 1 cm, nodules at the lower pole and center of the gland, presence of calcification, or suspicious lymph nodes detected by ultrasonography. To improve the accuracy of predicting cervical lymph node metastasis in PTC, the study introduced the DualSwinThyroid model, a deep learning tool combining ultrasonographic images with clinical data. The model achieved an AUC of 0.924 and an accuracy of 96.3%, together with commendable sensitivity and specificity, highlighting its potential as an effective non-invasive assessment tool for lymph node involvement in PTC. Model performance is detailed in Table 3.
Earlier research has typically relied on machine learning and statistical methods, analyzing image, clinical, and ultrasonic features in isolation and without an in-depth approach (18-23). Luchen Chang et al. employed deep learning, alongside ultrasonic and clinical data, to develop a composite nomogram for predicting central lymph node metastasis in PTC patients, using grayscale images for radiomics and six related features. However, the absence of multi-modal ultrasonic data and additional correlating factors slightly impeded the predictive accuracy (22). Fu Li et al. achieved promising results in forecasting cervical lymph node metastasis using conventional machine learning models (19). These models, however, sometimes fail to discern complex patterns in large datasets, particularly with fluctuating data distributions or noise, which can hinder their generalization. Additionally, they depend on manual feature processing, which can compromise performance if not done meticulously. Notably, most current studies focus merely on detecting the presence of cervical lymph node metastasis rather than its extent, which limits their value for choosing surgical methods and predicting patient outcomes. Details on comparative studies can be found in Table 4.
The DualSwinThyroid model introduces a refined thyroid nodule management method by categorizing nodules into three levels to enhance clinical decision-making accuracy. For Class I nodules without lymph node metastasis, unilateral lobectomy and isthmectomy are advised as the primary surgical path. In Class II cases with up to five metastatic lymph nodes, the model recommends total or near-total thyroidectomy and 'selective' cervical lymph node dissection, in conjunction with clinical judgment. The term 'selective' cervical lymph node dissection, as used here, refers to a process where the surgeon, integrating systematic preoperative evaluation with intraoperative biopsy pathology, determines the necessity of lymph node dissection and identifies the specific regions for dissection, thereby minimizing the risk of secondary surgical interventions.
FIGURE 6 The prediction accuracy of different data in the corresponding model. The evaluation indicates that multi-modal data performs optimally within the test set. Specifically, when considering single-modality data, the accuracy of Doppler ultrasound images and elastography images within the test set surpasses that of transverse and longitudinal images.
Throughout the development of our model, several key findings emerged. First, the accuracy of predictions using single-modality data alone was relatively low, with image data outperforming clinical and ultrasonic data, highlighting the importance of imaging in predictive models. Second, in the training dataset, accuracy was higher for transverse and longitudinal images than for color Doppler and elastography images. This pattern shifted in the testing phase, where color Doppler and elastography images achieved greater accuracy, revealing variable performance across imaging types during different phases. Third, integrating four types of images (transverse, longitudinal, color Doppler, and elastography) with clinical and ultrasonic data provided the most accurate predictions for individual cases. Adding more images did not improve the predictive performance but slightly reduced it. These findings have been instrumental in fine-tuning the model and determining the most effective imaging techniques to increase accuracy, and they point the way for future model improvements.
This study successfully developed a multi-modal ultrasound radiomics deep learning model for predicting cervical lymph node metastasis in PTC. It aims to leverage machine learning to identify features related to metastasis, thereby improving surgical decision-making in PTC. However, the work is retrospective and exploratory in nature. Moreover, its scope is limited by its single-institution design, which might not fully represent a wider patient population. Currently, the final pathological assessment of the extent of lymph node metastasis still relies on the thoroughness and accuracy of surgical removal. While the prediction model offers a potential foundation for clinical decision support, its benefits are yet to be confirmed in prospective clinical trials. Therefore, the anticipated advantages, such as reducing lymph node dissections, financial burden, and supporting emerging practitioners, are promising but need further validation in various clinical settings. Future efforts will focus on expanding data sources and rigorously validating the model's clinical utility in different institutions.

FIGURE 1
Different categories of image data.

FIGURE 5
Inclusion and exclusion diagram.

TABLE 1
Univariate analysis of factors associated with cervical lymph node metastasis in PTC.

TABLE 2
Ordered regression analysis of features associated with the extent of cervical lymph node metastasis in PTC.

TABLE 3
Performance comparison of models in this study.
*Bold font denotes the predictive performance of the optimal model in this study.
For Class III nodules with more than five metastatic lymph nodes, total or near-total thyroidectomy and extensive cervical lymph node dissection are advised, which is more extensive than standard elective central neck dissection, especially in cases of widespread metastasis. Physicians expand the scope of dissection based