- ¹Department of Ultrasound, The First Affiliated Hospital With Nanjing Medical University, Nanjing, China
- ²Department of Nuclear Medicine, The Shanghai Tenth Clinical Medical College with Nanjing Medical University, Shanghai, China
Background: Follicular thyroid carcinoma (FTC) is the second most common malignant thyroid tumor. Preoperative differentiation among follicular thyroid adenoma (FA), follicular tumor of uncertain malignant potential (FT-UMP), and FTC remains challenging using conventional ultrasound and fine-needle aspiration. This study aims to develop a machine learning model utilizing ultrasound radiomic features to improve risk stratification of follicular thyroid tumors.
Methods: A total of 277 patients with histopathologically confirmed follicular tumors (163 FA, 63 FT-UMP, 51 FTC) were included. Clinical and ultrasound features, along with radiomic features from intratumoral and peritumoral regions, were extracted from preoperative ultrasound images. Three machine learning algorithms, logistic regression (LR), support vector machine (SVM), and random forest (RF), were trained on four feature sets to construct four models: clinical-ultrasound (U), clinical-ultrasound with intratumoral radiomics (UI), clinical-ultrasound with peritumoral radiomics (UP), and clinical-ultrasound with combined intratumoral and peritumoral radiomics (UIP).
Results: The RF algorithm achieved the highest accuracy in the clinical-ultrasound model (test: 0.643) but overfitted markedly in the radiomics-based models. The SVM algorithm showed moderate performance. The LR algorithm delivered stable performance in the UP and UIP models, matching the highest test accuracy of 0.643. Specifically, the UP model showed improved micro-AUC, specificity, negative predictive value (NPV), and F1 score. The LR model exhibited high sensitivity but low specificity for benign nodules, and high specificity but low sensitivity for malignant nodules. All models performed poorly in identifying FT-UMP nodules.
Conclusion: Integrating peritumoral radiomic features with clinical-ultrasound features using logistic regression enhances the differentiation between benign and malignant follicular thyroid tumors.
1 Introduction
Follicular thyroid carcinoma (FTC) is the second most common thyroid malignant tumor, accounting for about 17% of all thyroid cancers (1). About 10–15% of FTC patients develop distant metastases to the lungs and bones, leading to poorer survival outcomes compared to papillary thyroid carcinoma (PTC) (2–4). Thus, early diagnosis of FTC is crucial for treatment and prognosis.
Ultrasound is the primary imaging modality for evaluating thyroid nodules, enabling classification based on sonographic features and assessment of malignant potential (5). The widely used TI-RADS classification systems are built on this principle, but they were established mainly from the ultrasound features of PTC and have limitations in the diagnosis of follicular tumors (6, 7). On ultrasonographic images, FTC shares some similarities with benign (follicular thyroid adenoma, FA) and low-risk (follicular tumor of uncertain malignant potential, FT-UMP) nodules (8). Even fine-needle aspiration (FNA) cannot reliably differentiate among these entities (9). Only complete removal of the lesion and examination for capsular infiltration can clearly determine the specific type of nodule (10, 11). In addition, the clinical management of low-risk nodules is controversial. The 2025 ATA Management Guidelines for Adult Patients with Differentiated Thyroid Cancer state that further treatment with completion thyroidectomy/lymphadenectomy and/or radioactive iodine therapy is not routinely advised for low-risk nodules such as FT-UMP. The optimal approach to monitoring these tumors should be determined according to the surgical approach, laboratory findings, and the patient’s wishes (12). Therefore, extracting more diagnostic information from ultrasound images to differentiate follicular tumors more accurately, and thereby reduce unnecessary surgeries and therapies, is a current focus of research. Previous studies (13–15) have mostly focused on differentiating FTA and FTC, with limited attention to low-risk follicular tumors. It is worth noting that FT-UMPs are defined as well-differentiated thyroid tumors with follicular architecture that are encapsulated or unencapsulated but well-circumscribed, in which invasion remains questionable after thorough sampling and exhaustive examination (16, 17).
That means FT-UMP is generally an indolent disease, but some patients still show distant recurrence. Due to its inherent uncertainties, there are certain challenges in determining the clinical follow-up period or making surgical decisions (18).
The aim of this study is to develop a hierarchical model based on ultrasound images using machine learning methods, in order to assess the risk stratification of follicular tumors and provide a reference for clinical decision-making.
2 Method
2.1 Patients
From January 2020 to September 2023, 277 patients with histopathologically confirmed follicular thyroid tumors (163 FTA, 63 FT-UMP, 51 FTC) were included. Preoperative neck ultrasound was performed by an operator with over 10 years of experience using an EPIQ5 system (Philips Healthcare) equipped with a 6–18 MHz linear transducer. The following sonographic features were recorded: nodule number (single, multiple); location (upper, middle, lower, isthmus); maximum diameter (≤1 cm, 1–4 cm, >4 cm); echogenicity (hyperechoic, isoechoic, hypoechoic); aspect ratio (<1, ≥1); composition (solid, predominantly solid, predominantly cystic); calcifications (absent, microcalcification, macrocalcification, rim calcification); shape (round-to-oval, irregular); margin (smooth, spiculated, unclear); halo (absent, regular complete, irregular interrupted); blood flow (grade 0: no signal; grade 1: partial clear signals; grade 2: moderate signals; grade 3: abundant signals); elasticity score (1–5). Ultrasound features were independently evaluated by two attending physicians with more than 5 years of experience in thyroid imaging who had no prior knowledge of the pathological findings. Features on which the two physicians disagreed were recorded after they reached consensus. Features with intraclass correlation coefficients (ICCs) greater than the reproducibility threshold of 0.9 in both intraobserver and interobserver assessments were selected for subsequent analysis.
2.2 Ultrasound radiomic analysis
2.2.1 Clinical and ultrasound features
Univariate and multivariate logistic analyses (ordered multinomial logistic regression) were conducted on clinical and ultrasound features. The response variable was an ordinal variable with three classes. Features with P < 0.05 were included in the subsequent analysis.
2.2.2 Radiomic features
ITK-SNAP 3.6.0 was used to manually delineate intratumoral regions of interest (ROIs) on DICOM images of the maximum cross section on grayscale US images, which was regarded as the intratumoral ROI. Peritumoral ROIs were generated by expanding the intratumoral boundary outward by 2 voxels. To appraise the feature reliability, reader 1 and reader 2 (with 5 years of experience in thyroid imaging) independently outlined ROIs on images from 30 randomly selected patients. Reader 2 repeated the same process within 1 week. The intra and interobserver reproducibility were assessed by the ICC, and an ICC greater than 0.9 indicated good consistency. Radiomics features were automatically extracted using an open-source Python package, PyRadiomics (https://pyradiomics.readthedocs.io/).
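The peritumoral expansion described above can be illustrated with a minimal NumPy sketch. It assumes that the intratumoral ROI is available as a 2D binary mask, that the expansion uses 4-connected dilation, and that the peritumoral ROI is the ring that remains after the tumor itself is removed. None of these details is stated in the text, and the function names are hypothetical.

```python
import numpy as np


def dilate(mask: np.ndarray, iterations: int) -> np.ndarray:
    """Grow a 2D binary mask by `iterations` pixels (4-connected neighbourhood)."""
    out = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(out, 1, constant_values=False)
        # A pixel is set if it, or any of its four direct neighbours, was set.
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:])
    return out


def peritumoral_ring(mask: np.ndarray, width: int = 2) -> np.ndarray:
    """Band of `width` pixels just outside the tumor mask (assumed ring definition)."""
    inside = mask.astype(bool)
    return dilate(inside, width) & ~inside


# Toy example: a 3x3 "tumor" in the middle of an 11x11 image.
tumor = np.zeros((11, 11), dtype=bool)
tumor[4:7, 4:7] = True
ring = peritumoral_ring(tumor, width=2)
```

In practice a library routine such as `scipy.ndimage.binary_dilation` would normally replace the hand-written `dilate`; the loop is spelled out here only to keep the example dependency-free.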
Feature values were normalized via Z-score transformation (Equation 3) (19). The normalization method is as follows:
$$z = \frac{\text{column} - \text{mean}}{\text{std}} \tag{3}$$
where column represents a certain column (feature) in the dataset, mean is the mean of this column, and std is the standard deviation of this column.
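A column-wise Z-score transformation of this kind can be sketched in a few lines of NumPy. The sketch assumes the population standard deviation (`ddof=0`) and guards against constant columns; the text specifies neither, and the function name is hypothetical.

```python
import numpy as np


def zscore_columns(features: np.ndarray) -> np.ndarray:
    """Standardize each column (feature) to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=0)   # assumption: population standard deviation
    std = np.where(std == 0, 1.0, std)   # leave constant columns at zero instead of dividing by 0
    return (features - mean) / std


# Toy feature matrix: 3 patients x 2 features (made-up values).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Z = zscore_columns(X)
```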
One-way ANOVA was conducted on all the normalized ultrasound radiomics features of the FTA, FT-UMP and FTC groups, and only those with P < 0.05 were retained. The Pearson correlation coefficient was used to calculate the correlation between features. To maximize the discriminative power of radiomics features, we employed a correlation-based greedy recursive backward elimination strategy for feature selection. This method iteratively identifies and removes the most redundant features based on the internal correlation structure. In each iteration, we computed absolute pairwise correlation coefficients among all remaining features and calculated the average correlation for each feature. The feature with the highest average correlation, which indicates the least unique information, was eliminated. This process was repeated until the maximum pairwise correlation among the remaining features fell below a predefined threshold (0.9). As a model-agnostic preprocessing approach, this method mitigates multicollinearity while preserving the most representative features, thereby enhancing model robustness and interpretability. By explicitly addressing feature redundancy, it provides a computationally efficient framework for feature subset selection.
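The elimination loop described above can be sketched as follows. This is an illustrative reading of the description, not the study's code: the function name is hypothetical, ties are broken by taking the first feature with the highest average correlation, and the toy data are synthetic.

```python
import numpy as np


def drop_correlated(X: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Return the column indices kept after greedy backward elimination."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)             # ignore self-correlation
        if corr.max() < threshold:
            break                               # stop once every pair is below the threshold
        mean_corr = corr.sum(axis=0) / (len(keep) - 1)
        keep.pop(int(np.argmax(mean_corr)))     # drop the most redundant feature
    return keep


# Synthetic example: columns 0 and 1 are near-duplicates, column 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b])
kept = drop_correlated(X, threshold=0.9)
```

Here one of the two near-duplicate columns is removed and the independent column is retained.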
The data set with the selected features was randomly divided into a training cohort and a test cohort at a ratio of 8:2. We then input the features into a multinomial logistic regression model with the L1 penalty of the least absolute shrinkage and selection operator (LASSO) to solve the three-class problem directly. The penalty coefficient λ of LASSO was optimized by 5-fold cross-validation. The features finally selected were included in the subsequent analysis.
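A minimal scikit-learn sketch of this split-and-select step is given below, under several assumptions that the text does not state: the data are synthetic, the split is stratified by class, features are standardized on the training cohort, and the `saga` solver is used. Note that scikit-learn tunes `C`, the inverse of the regularization strength, rather than λ itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the screened radiomic features (not study data).
X, y = make_classification(n_samples=277, n_features=30, n_informative=5,
                           n_classes=3, random_state=0)

# 8:2 split; stratifying by class is an assumption, the text only says "random".
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Standardize with training-cohort statistics only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# L1-penalized multinomial logistic regression with 5-fold CV over 10 values of C.
lasso = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="saga",
                             max_iter=5000, random_state=0)
lasso.fit(X_train_s, y_train)

# Keep features whose coefficient is non-zero for at least one class.
selected = np.flatnonzero(np.any(lasso.coef_ != 0, axis=0))
```

With 277 samples, `test_size=0.2` yields 221 training and 56 test cases.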
2.3 Model construction
These features (clinical, ultrasound, and radiomic) were then input into machine learning models (logistic regression, LR; support vector machine, SVM; random forest, RF) to construct intratumoral, peritumoral, and intratumoral-peritumoral dual-region ultrasound radiomics models. Stratified five-fold cross-validation was used to find the optimal parameters of the models and determine the final models.
We also applied the SHapley Additive exPlanations (SHAP) method (20) to enhance the transparency and explainability of the best-performing model by ranking features according to their contribution to model performance.
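The tuning step can be sketched with scikit-learn's `GridSearchCV` and `StratifiedKFold`. The hyperparameter grids, the accuracy scoring, and the synthetic data below are assumptions made for illustration; the text does not report which parameters were searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic three-class training data (not study data).
X, y = make_classification(n_samples=220, n_features=15, n_informative=5,
                           n_classes=3, random_state=0)

# Hypothetical search grids for the three algorithms named in the text.
candidates = {
    "LR": (LogisticRegression(max_iter=2000), {"C": [0.1, 1.0, 10.0]}),
    "SVM": (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_depth": [3, 5, None]}),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

best = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    best[name] = (search.best_params_, search.best_score_)
```

Including shallow `max_depth` values in the random forest grid is one common way to limit the train-test gap that tree ensembles can show on small cohorts.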
The workflow of this study is presented in Figure 1.
This study was approved by the Ethics Committee on October 25, 2023 (approval no. 2023-SR-691) and complied with the Declaration of Helsinki.
2.4 Statistical analysis
Statistical analyses were performed using IBM SPSS 26.0 (IBM, New York, USA). The normality of the data was verified via the Kolmogorov-Smirnov test. Continuous data conforming to a normal distribution are expressed as the mean ± standard deviation (SD) and were compared using one-way ANOVA; non-normal data are expressed as the median [interquartile range (IQR)] and were compared using the Kruskal-Wallis H test. Categorical data are expressed as numbers (%) and were compared using a chi-square test or Fisher’s exact test, as appropriate. A P < 0.05 represented a statistically significant difference.
3 Result
The clinical and ultrasound features in the training cohort and the test cohort (Table 1) were included in the univariate and multivariate logistic regression analyses. The results (Table 2, Figure 2) showed that shape, margin, shadow and elasticity score (P < 0.05) were independent risk factors. Machine learning models were constructed based on these factors.
Table 2. Univariate and multivariate logistic regression analysis results of clinical and ultrasound features.
A total of 1561 radiomic features were extracted for the tumor interior and the tumor periphery, including 14 shape-based features, 306 first-order statistics, and 1241 second-order features. Screening with the independent-sample t-test (P < 0.05) retained 129, 57, and 198 intratumoral, peritumoral, and intratumoral-peritumoral dual-region features, respectively. Filtering on the correlation coefficient then reduced these to 35, 24, and 59 features. Finally, the LASSO regression algorithm retained 14, 15, and 24 predictive features with non-zero coefficients, respectively. The coefficients and mean standard error across the 5-fold cross-validation are shown in Figure 3. The Rad-score is shown as follows, and Figure 4 shows the coefficient values of the final selected non-zero features. These features were then input into machine learning models to construct ultrasound radiomics models for the intratumoral, peritumoral, and intratumoral-peritumoral dual regions.
3.1 Clinical-ultrasound features model
As shown in Tables 3 and 4, the RF model achieved the highest accuracy (0.647 train, 0.643 test) and micro-AUC (0.805 train, 0.772 test), and also demonstrated the most stable ROC performance across nodule classifications (Figure 5). The SVM model had the lowest accuracy of the three models, with a slight decrease from training to test (0.593 train vs. 0.571 test). The accuracy of the LR model was moderate, but its micro-AUC was the lowest among the three models (0.661 train, 0.669 test).
All models showed high sensitivity (>0.9) but low specificity (<0.3) for benign nodules, and high specificity (>0.9) but low sensitivity (<0.4) for malignant nodules, while performance was poor in the low-risk group, with near-zero sensitivity. Among them, RF achieved the highest F1 scores for malignant nodules (0.459 train, 0.471 test).
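Per-class figures of this kind follow from a three-class confusion matrix by treating each class in turn as "positive" and the other two as "negative". The sketch below shows the arithmetic; the matrix is invented for illustration and is not taken from the study, rows are assumed to be actual classes and columns predicted classes, and the function name is hypothetical.

```python
import numpy as np


def one_vs_rest_metrics(cm: np.ndarray, k: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and F1 for class k against the rest."""
    tp = cm[k, k]
    fn = cm[k, :].sum() - tp      # class k cases predicted as another class
    fp = cm[:, k].sum() - tp      # other classes predicted as class k
    tn = cm.sum() - tp - fn - fp

    def ratio(num, den):
        return float(num) / float(den) if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Invented confusion matrix; class order: benign, low-risk, malignant.
cm = np.array([[40, 2, 3],
               [12, 1, 2],
               [5, 0, 5]])
benign = one_vs_rest_metrics(cm, 0)
malignant = one_vs_rest_metrics(cm, 2)
```

A classifier that assigns most cases to the benign class, as in this toy matrix, produces the same qualitative pattern: high benign sensitivity with low benign specificity, and the reverse for the malignant class.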
3.2 Ultrasound radiomics-based models
3.2.1 Clinical-ultrasound and intratumoral radiomics features model
As shown in Tables 3 and 4, the RF model achieved perfect scores on all metrics (accuracy = 1.0, AUC = 1.0, sensitivity = 1.0, specificity = 1.0) in the training cohort but showed a significant performance drop in the test cohort, indicating overfitting.
The test accuracy of the LR model in the UI model was the same as in the U model, and the same as that of the SVM model in the UI model (0.589). However, the test micro-AUC of the LR model was higher than that of the SVM model (0.733 vs. 0.681). The ROC curves of the LR model were also higher than those of the SVM model across nodule classifications (Figure 5).
All models were highly biased in assessing benign, low-risk, and malignant nodules, with sensitivity and specificity data close to 0 or 1.
3.2.2 Clinical-ultrasound and peritumoral radiomics features model
As shown in Tables 3 and 4, the RF model again showed clear overfitting.
The LR model showed slightly higher test accuracy than the SVM model (0.643 vs. 0.625), and both were higher than in the UI model. The same was true of the ROC curves (Figure 5).
Notably, in the malignant nodule group, although sensitivity remained low (0.3), the test-cohort AUC and F1 score were markedly higher in the UP model (LR: AUC 0.8, F1 0.4; SVM: AUC 0.796, F1 0.375) than in the U model (LR: AUC 0.297, F1 0.0; SVM: AUC 0.525, F1 0.0) or the UI model (LR: AUC 0.53, F1 0.0; SVM: AUC 0.433, F1 0.0).
3.2.3 Clinical-ultrasound and intratumoral-peritumoral dual-region radiomics features model
The test accuracy of the LR and SVM models in the UIP model matched that in the UP model (0.643 and 0.625, respectively), and the accuracy and ROC performance of the LR model were higher than those of the SVM model.
Despite its overfitting, the RF model’s test performance for the malignant group was the highest of all models (accuracy 0.929, AUC 0.833, sensitivity 0.60, NPV 0.92, F1 0.75), as shown in Tables 3 and 4 and Figure 5.
Figure 6 presents the confusion matrices for all models, showing the classification performance for the three tumor classes.
Figure 6. Confusion matrices showing classification performance for each tumor type. Abscissa: actual class; ordinate: predicted class. The numbers in the cells represent the number of patients predicted by the model.
3.3 Comparison of diagnostic efficacy between UP-LR model and TI-RADS
ACR TI-RADS was used to classify all included nodules as benign, low-risk, or malignant. We performed ROC curve analysis in three separate rounds (benign vs. low-risk and malignant; low-risk vs. benign and malignant; malignant vs. benign and low-risk). In each round, the target TI-RADS category and its corresponding pathological result were coded as 0, while all other categories and pathological types were combined and coded as 1. ROC curve analysis was then used to calculate the AUC and its confidence interval for that category, which was compared with that of the best-performing model (UP-LR), as shown in Table 5. In the diagnosis of benign nodules, the AUC of the UP-LR model was 0.705 (95% CI 0.550–0.860) and the AUC of TI-RADS was 0.525 (95% CI 0.525–0.659). In the diagnosis of low-risk nodules, the AUC of the UP-LR model was 0.454 (95% CI 0.272–0.637) and the AUC of TI-RADS was 0.461 (95% CI 0.38–0.543). In the diagnosis of malignant nodules, the AUC of the UP-LR model was 0.8 (95% CI 0.623–0.977) and the AUC of TI-RADS was 0.525 (95% CI 0.435–0.615).
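The three one-vs-rest rounds can be sketched as follows. For readability the sketch treats the target class as the positive label, which is a coding convention chosen here and differs from the 0/1 coding described above. The AUC is computed as the probability that a target case scores higher than a non-target case, with ties counted as one half. The labels and scores are made up, and confidence intervals are not computed.

```python
import numpy as np


def binary_auc(is_target: np.ndarray, score: np.ndarray) -> float:
    """AUC as the fraction of (target, non-target) pairs ranked correctly; ties count 0.5."""
    pos = score[is_target]
    neg = score[~is_target]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def one_vs_rest_auc(y: np.ndarray, proba: np.ndarray) -> list[float]:
    """One AUC per class: class k against all other classes."""
    return [binary_auc(y == k, proba[:, k]) for k in range(proba.shape[1])]


# Made-up labels (0 = benign, 1 = low-risk, 2 = malignant) and class scores.
y = np.array([0, 0, 0, 1, 1, 2, 2])
proba = np.array([[0.8, 0.1, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.5, 0.2, 0.3],
                  [0.6, 0.3, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.2, 0.6],
                  [0.4, 0.3, 0.3]])
aucs = one_vs_rest_auc(y, proba)
```

The same function applies to an ordinal score such as a TI-RADS category, where tied scores are common and the half-credit rule matters.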
3.4 SHapley Additive exPlanations of the best-performing model
As shown in Figure 7, for the three-category classification of thyroid nodules, the top-ranked features are almost exclusively peritumoral radiomics features, particularly wavelet-transformed first-order statistics (such as interquartile range and percentiles) and texture features (e.g., ngtdm_Strength). Traditional ultrasound features generally rank lower than the radiomics features. This highlights the importance and necessity of incorporating peritumoral regions into the analysis. From benign to low-risk to malignant nodules, as the invasiveness of the nodules increases, the values of these features tend to be higher and are more frequently associated with positive SHAP values.
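For a linear model such as logistic regression, SHAP values have a simple closed form on the log-odds scale when features are treated as independent: feature j contributes its coefficient times the deviation of its value from the background mean. The sketch below illustrates this attribution for a three-class model. The coefficients and data are randomly generated, the function name is hypothetical, and it is neither the `shap` library's API nor the study's code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))        # hypothetical standardized features (background data)
W = rng.normal(size=(3, 4))         # hypothetical per-class coefficients
b = np.array([0.2, -0.1, 0.0])      # hypothetical per-class intercepts


def linear_shap(x: np.ndarray, W: np.ndarray, b: np.ndarray,
                background: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-class SHAP values of a linear model, assuming independent features."""
    mu = background.mean(axis=0)
    phi = W * (x - mu)              # shape (n_classes, n_features)
    base = W @ mu + b               # expected model output per class
    return phi, base


x = X[0]
phi, base = linear_shap(x, W, b, X)
margin = W @ x + b                  # the model's log-odds style output for this case

# Global importance per class: mean absolute SHAP value over the background set.
importance = np.abs(W[None, :, :] * (X - X.mean(axis=0))[:, None, :]).mean(axis=0)
```

The attributions are additive: for each class, the base value plus the sum of that class's SHAP values reproduces the model output `margin`.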
4 Discussion
Preoperative diagnosis of thyroid follicular tumors is a research hotspot and a persistent challenge in clinical practice. However, to our knowledge, few articles focus on the differentiation of tumors classified as FA, FT-UMP, and FTC. The challenge lies in determining the presence of invasive growth, particularly capsular invasion and vascular invasion (16, 21). Most radiomics studies focus on the features within the tumor and ignore the key biological features available in the surrounding region (22, 23). Therefore, based on ultrasound images, we developed intratumoral and peritumoral radiomic models to explore better methods for distinguishing tumor classifications. Combining intratumoral and peritumoral radiomics, which captures the interplay between tumor biology and the microenvironment, may surpass methods involving only intratumoral features and could lead to novel applications in thyroid tumor radiomics (24, 25). Three machine learning models (LR, SVM and RF) were used to establish the U, UI, UP and UIP models from clinical and ultrasound features, intratumoral radiomics and peritumoral radiomics. The results show that the LR model in the UIP model has better stability and generalization ability.
While the overall accuracy is low, the UP-LR model shows promising discrimination for benign and malignant cases (AUC 0.705 and 0.8, respectively; Table 5), with the highest performance observed in malignant nodules.
Compared with the commonly used TI-RADS model, the diagnostic efficiency of this model for benign and malignant nodules is significantly higher. In diagnosing benign nodules, the model achieved a sensitivity of 1.00, so that all benign nodules were identified, but its low specificity led to the possibility of misdiagnosis. This means that if the model classifies a nodule as non-benign, the result is highly reliable (NPV = 1.0). For diagnosing malignant and low-risk nodules, the model exhibited extremely high specificity (both >0.95), especially for malignant nodules (PPV = 0.6, specificity = 0.957), which means that if the model classifies a nodule as malignant or low-risk, the result is highly reliable and further diagnostic confirmation is recommended. Notably, the AUC was 0.8 for malignant nodules and only 0.454 for low-risk nodules, which shows that the model performed poorly in identifying low-risk nodules. This may be related to the insufficient number of low-risk samples in the dataset and the significant overlap of features between low-risk nodules and both benign and malignant nodules. Indeed, incorporating the more challenging FT-UMP category into the three-class problem may lead to a decrease in the overall performance of the model (26). FT-UMP may partially overlap with FTA and FTC on imaging, increasing the difficulty of classification. FT-UMP is diagnostically challenging even for pathologists, which is also reflected in our models. Specifically, we suggest that this model can be used as a complementary tool after inconclusive FNA results (Bethesda III/IV) to help prioritize high-risk patients for surgery and reduce unnecessary interventions in low-risk cases.
The RF model demonstrated the best performance in the U model, with high accuracy and AUC. However, in the UI, UP, and UIP models, it exhibited significant overfitting. Although this model showed the highest diagnostic value for malignant nodules (AUC 0.833), the overfitting resulted in considerable instability when applied to unseen data. Therefore, it is not recommended as a primary screening tool but could be considered as a secondary confirmation tool for high-risk nodules. In contrast, the LR and SVM models exhibited better stability, though the overall efficacy of SVM was lower than that of LR. Considering the performance of all models, the LR model achieved higher AUC and accuracy in the test cohort, particularly in the UP model, which outperformed the UIP model in accuracy, specificity, PPV, sensitivity, and F1 score. We observed that incorporating intratumoral radiomic features did not improve model performance; instead, it led to a decline in certain key metrics. This suggests that intratumoral radiomic features may be redundant with peritumoral or clinical and ultrasound features, and that peritumoral radiomic features provide more critical discriminative information, which is consistent with some previous studies (24, 27).
As previously described, follicular tumors share a similar underlying structure. Histologically, they form follicular structures resembling the normal thyroid gland, albeit with varying degrees of differentiation and spatial arrangement. The key diagnostic criteria revolve around the presence of invasive growth—specifically, its type and extent. A diagnosis of FTC requires evidence of definite vascular invasion or extensive capsular invasion. In contrast, FT-UMP represents a borderline category where the pathologist observes suspicious or focal signs of invasion under the microscope, which are insufficient to meet the strict diagnostic thresholds for FTC (28). This pathological continuum explains the considerable challenge in distinguishing these entities in our radiomics study, as their essential difference lies in microscopic invasive behavior rather than macroscopic imaging findings. We aimed to capture more discriminative information regarding this behavior through the selection of peritumoral radiomics features.
When capsular or vascular invasion occurs in a thyroid nodule, remodeling of the extracellular matrix, increased local echogenicity heterogeneity, and ill-defined margins may contribute to elevated tissue heterogeneity in the peritumoral region. These changes are reflected in radiomic features such as InterquartileRange and Variance. The formation of neovascularization and tumor thrombi may correlate with high-intensity image traits, including the 90th Percentile, Maximum, and various HighGrayLevelEmphasis features. Meanwhile, attributes such as Strength may also indicate underlying tissue destruction and remodeling processes (5, 29).
These observations align with the results of our SHAP analysis. In the three-class classification of thyroid nodules, all peritumoral radiomics features were ranked as top contributors, outperforming traditional ultrasound characteristics. This reinforces the proposition that peritumoral radiomics features serve as the most powerful indicators in our model for stratifying tumor risk, capturing information beyond conventional visual assessment criteria. It further validates the value and necessity of our study design incorporating the peritumoral region. With increasing nodule aggressiveness, the values of these imaging features tend to rise and more frequently assume positive values—a pattern fully consistent with clinical understanding and supportive of the rationality of the model’s decision logic. Thereby, the model’s applicability and reliability in clinical practice are considerably enhanced.
Our study has certain limitations. First, the poorer performance in the low-risk category (FT-UMP) is a major limitation. We recommend that future studies integrate molecular markers, such as mutational analysis, with radiomics to improve the classification of critical categories such as FT-UMP. Second, the sample size was limited and all patients came from a single center, and the models showed overfitting. Therefore, the sample size should be expanded and multicenter studies conducted for external validation to further refine our models. Third, only gray-scale ultrasound images were used to extract radiomics features. Other studies have also extracted radiomics features from multimodal ultrasound images (contrast-enhanced ultrasound, elastography) to characterize lesions more comprehensively (30–32). To address these issues, further prospective studies could be conducted to improve prediction accuracy in our future work.
5 Conclusion
The logistic regression model incorporating clinical-ultrasound features along with peritumoral radiomic features achieved stable performance in distinguishing between benign and malignant follicular thyroid tumors. It offers a valuable reference for clinical decision-making in the management of follicular thyroid tumors.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.
Ethics statement
The studies involving humans were approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University (Jiangsu Provincial People’s Hospital) on October 25, 2023 (approval no. 2023-SR-691). This study complied with the Declaration of Helsinki. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.
Author contributions
YY: Conceptualization, Writing – original draft. XW: Data curation, Writing – original draft. HD: Formal Analysis, Writing – original draft. KC: Investigation, Resources, Writing – review & editing. FY: Methodology, Writing – review & editing.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834
2. Cihan BY, Koc A, and Tokmak TT. The role of radiotherapy in skull metastasis of thyroid follicular carcinoma. Klin Onkol. (2019) 32:300–2. doi: 10.14735/amko20192
3. Castellana M, Piccardo A, Virili C, Scappaticcio L, Grani G, Durante C, et al. Can ultrasound systems for risk stratification of thyroid nodules identify follicular carcinoma? Cancer Cytopathol. (2020) 128:250–9. doi: 10.1002/cncy.22235
4. Chow SM, Law SC, Au SK, Leung TW, Chan PT, Mendenhall WM, et al. Differentiated thyroid carcinoma: comparison between papillary and follicular carcinoma in a single institute. Head Neck. (2002) 24:670–7. doi: 10.1002/hed.10080
5. Zhan W, Cai X, Qi H, He H, Zhu D, Yang Y, et al. Intratumoral and peritumoral radiomics based on ultrasound for the differentiation of follicular thyroid neoplasm. Gland Surg. (2024) 13:1942–53. doi: 10.21037/gs-24-247
6. Lin Y, Lai S, Wang P, Li J, Chen Z, Wang L, et al. Performance of current ultrasound-based malignancy risk stratification systems for thyroid nodules in patients with follicular neoplasms. Eur Radiol. (2022) 32:3617–30. doi: 10.1007/s00330-021-08450-3
7. Li J, Li C, Zhou X, Huang J, Yang P, Cang Y, et al. US risk stratification system for follicular thyroid neoplasms. Radiology. (2023) 309:e230949. doi: 10.1148/radiol.230949
8. Yuan Y, Shu H, Li L, Wu L, and Yu F. A new scoring system for risk stratification of thyroid tumors. BMC Med Imaging. (2025) 25:114. doi: 10.1186/s12880-025-01633-0
9. Mohebbi A, Abdi A, Mohammadzadeh S, Rad MG, and Mohammadi A. Impact of needle gauge selection on sample adequacy in ultrasound-guided thyroid fine-needle aspiration: A systematic review and meta-analysis. Acad Radiol. (2025) 32:7309–19. doi: 10.1016/j.acra.2025.10.014
10. Savala R, Dey P, and Gupta N. Artificial neural network model to distinguish follicular adenoma from follicular carcinoma on fine needle aspiration of thyroid. Diagn Cytopathol. (2018) 46:244–9. doi: 10.1002/dc.23880
11. Baloch ZW, Seethala RR, Faquin WC, Papotti MG, Basolo F, Fadda G, et al. Noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP): A changing paradigm in thyroid surgical pathology and implications for thyroid cytopathology. Cancer Cytopathol. (2016) 124:616–20. doi: 10.1002/cncy.21744
12. Ringel MD, Sosa JA, Baloch Z, Bischoff L, Bloom G, Brent GA, et al. 2025 American thyroid association management guidelines for adult patients with differentiated thyroid cancer. Thyroid. (2025) 35:841–985. doi: 10.1177/10507256251363120
13. Seo JK, Kim YJ, Kim KG, Shin I, Shin JH, and Kwak JY. Differentiation of the follicular neoplasm on the gray-scale US by image selection subsampling along with the marginal outline using convolutional neural network. BioMed Res Int. (2017) 2017:3098293. doi: 10.1155/2017/3098293
14. Lin AC, Liu Z, Lee J, Ranvier GF, Taye A, Owen R, et al. Generating a multimodal artificial intelligence model to differentiate benign and malignant follicular neoplasms of the thyroid: A proof-of-concept study. Surgery. (2024) 175:121–7. doi: 10.1016/j.surg.2023.06.053
15. Zheng Y, Zhang Y, Lu K, Wang J, Li L, Xu D, et al. Diagnostic value of an interpretable machine learning model based on clinical ultrasound features for follicular thyroid carcinoma. Quant Imaging Med Surg. (2024) 14:6311–24. doi: 10.21037/qims-24-601
16. Baloch ZW, Asa SL, Barletta JA, Ghossein RA, Juhlin CC, Jung CK, et al. Overview of the 2022 WHO classification of thyroid neoplasms. Endocr Pathol. (2022) 33:27–63. doi: 10.1007/s12022-022-09707-3
17. Zylka A, Dobruch-Sobczak K, Piotrzkowska-Wroblewska H, Jedrzejczyk M, Goralski P, Galczynski J, et al. Ultrasound and cytopathological characteristics of thyroid tumours of uncertain malignant potential - from diagnosis to treatment. Endokrynol Pol. (2024) 75:170–8. doi: 10.5603/ep.98488
18. Jensen CB, Saucke MC, Francis DO, Voils CI, and Pitt SC. From overdiagnosis to overtreatment of low-risk thyroid cancer: A thematic analysis of attitudes and beliefs of endocrinologists, surgeons, and patients. Thyroid. (2020) 30:696–703. doi: 10.1089/thy.2019.0587
19. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. (2017) 77:e104–e7. doi: 10.1158/0008-5472.CAN-17-0339
21. Kakudo K, Bai Y, Liu Z, Li Y, Ito Y, and Ozaki T. Classification of thyroid follicular cell tumors: with special reference to borderline lesions. Endocr J. (2012) 59:1–12. doi: 10.1507/endocrj.EJ11-0184
22. Deng Y, Zeng Q, Zhao Y, Hu Z, Zhan C, Guo L, et al. Model based on ultrasound radiomics and machine learning to preoperative differentiation of follicular thyroid neoplasm. J Ultrasound Med. (2025) 44:567–79. doi: 10.1002/jum.16620
23. Zhang F, Mei F, Chen W, and Zhang Y. Role of ultrasound and ultrasound-based prediction model in differentiating follicular thyroid carcinoma from follicular thyroid adenoma. J Ultrasound Med. (2024) 43:1389–99. doi: 10.1002/jum.16461
24. Fu Y, Mei F, Shi L, Ma Y, Liang H, Huang L, et al. Intra- and peritumoral radiomics based on ultrasound images for preoperative differentiation of follicular thyroid adenoma, carcinoma, and follicular tumor with uncertain malignant potential. Ultrasound Med Biol. (2025) 51:1217–26. doi: 10.1016/j.ultrasmedbio.2025.04.005
25. Ren JY, Lv WZ, Wang L, Zhang W, Ma YY, Huang YZ, et al. Dual-modal radiomics nomogram based on contrast-enhanced ultrasound to improve differential diagnostic accuracy and reduce unnecessary biopsy rate in ACR TI-RADS 4–5 thyroid nodules. Cancer Imaging. (2024) 24:17. doi: 10.1186/s40644-024-00661-3
26. Thomas AM, Lin AC, Deng G, Xu Y, Ranvier GF, Taye A, et al. A proof-of-concept investigation into predicting follicular carcinoma on ultrasound using topological data analysis and radiomics. Imaging. (2025) 17. doi: 10.1556/1647.2025.00256
27. Zhu X, Li J, Li H, Wang K, Zhang J, Meng J, et al. Intranodular and perinodular ultrasound radiomics distinguishes benign and malignant thyroid nodules: a multicenter study. Gland Surg. (2024) 13:15. doi: 10.21037/gs-24-416
28. LiVolsi VA and Baloch ZW. Follicular-patterned tumors of the thyroid: the battle of benign vs. malignant vs. so-called uncertain. Endocr Pathol. (2011) 22:184–9. doi: 10.1007/s12022-011-9183-6
29. Sun R, Limkin EJ, Dercle L, Reuze S, Zacharaki EI, Chargari C, et al. Computational medical imaging (radiomics) and potential for immuno-oncology. Cancer Radiother. (2017) 21:648–54. doi: 10.1055/s-0037-1604115
30. Zhao Q, Guo S, Zhang Y, Zhou J, and Zhou P. Multimodal ultrasound radiomics model combined with clinical model for differentiating follicular thyroid adenoma from carcinoma. BMC Med Imaging. (2025) 25:152. doi: 10.1186/s12880-025-01685-2
31. Zhang XY, Zhang D, Zhou W, Wang ZY, Zhang CX, Li J, et al. Predicting lymph node metastasis in papillary thyroid carcinoma: radiomics using two types of ultrasound elastography. Cancer Imaging. (2025) 25:13. doi: 10.1186/s40644-025-00832-w
Keywords: follicular thyroid tumors, machine learning, peritumoral radiomics, risk stratification, ultrasound
Citation: Yuan Y, Wang X, Deng H, Cao K and Yu F (2025) Ultrasound radiomics-based machine learning models for risk stratification of follicular thyroid tumors. Front. Oncol. 15:1707586. doi: 10.3389/fonc.2025.1707586
Received: 17 September 2025; Revised: 22 November 2025; Accepted: 26 November 2025;
Published: 05 January 2026.
Edited by:
Hsiang-Chen Wang, National Chung Cheng University, Taiwan
Reviewed by:
Ziye Yan, Zhejiang Normal University, China
Jiahui Chen, Thyroid Head and Neck Cancer Foundation, United States
Andrew M. Thomas, The University of Iowa, United States
Copyright © 2026 Yuan, Wang, Deng, Cao and Yu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Fei Yu, yufeitougao@163.com
Xinyue Wang1