Comparison of Different Risk-Stratification Systems for the Diagnosis of Benign and Malignant Thyroid Nodules

Introduction: To compare the efficacy of four different ultrasound-based risk-stratification systems in assessing the malignancy risk of thyroid nodules in the Chinese population. Methods: We retrospectively reviewed the digital ultrasound images of 1,568 patients (1,612 thyroid nodules) who underwent surgery in our hospital between January 2012 and December 2017. All thyroid nodules were pathologically identified as malignant or benign. We evaluated the following ultrasound characteristics: size, location, composition, echogenicity, shape, margins, calcification or echogenic foci, and extrathyroidal extension. Each nodule was categorized using four risk-stratification systems: the American Thyroid Association (ATA) classification, the Thyroid Imaging, Reporting, and Data System (TIRADS) of the American College of Radiology (ACR-TIRADS), the European Thyroid Association TIRADS (EU-TIRADS), and the TIRADS developed by Kwak et al. (Kwak-TIRADS). The diagnostic performance of each risk-stratification system relative to the pathological results was analyzed. We used receiver operating characteristic (ROC) curves to identify the cutoff values that yielded optimal sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC). Results: Of the 1,612 nodules, 839 (52.0%) were benign and 773 (48.0%) were malignant. The areas under the ROC curve (AUCs) of the ACR-TIRADS, EU-TIRADS, Kwak-TIRADS, and ATA classification were 0.879, 0.872, 0.896, and 0.869, respectively. The Kwak-TIRADS had the best SEN, NPV, ACC, and AUC, while the ACR-TIRADS had the best SPE and PPV. Conclusion: All four risk-stratification systems had good diagnostic performance (AUCs > 0.86). Considering its high SEN, NPV, ACC, and AUC, we believe that the Kwak-TIRADS may be the most effective risk-stratification system in the Chinese population.


INTRODUCTION
Thyroid nodules are very common, with ultrasound detection rates of 50-60% (1). However, only 5-15% of these nodules are malignant (2). Ultrasonography is the primary modality used for imaging thyroid nodules, as it is readily accessible, noninvasive, and cost-effective (3). Ultrasound-guided fine-needle aspiration cytology (US-FNAC) is the most effective and practical technique for determining whether a thyroid nodule is malignant or whether surgery is required to establish a definitive diagnosis (4). Because the imaging features of thyroid nodules are complex, several risk-stratification systems have been developed to standardize the diagnostic procedure. Commonly used systems include the ATA classification, the ACR-TIRADS, the EU-TIRADS, and the Kwak-TIRADS. More than 90% of thyroid carcinomas are well-differentiated papillary thyroid carcinomas, which have low malignant potential, a good prognosis, and excellent 5-year survival rates of 95-97% (5,6). Therefore, early diagnosis is particularly important in the management of thyroid nodules. Advances in ultrasound technology, such as elastography and contrast-enhanced ultrasonography, are increasing the diagnostic accuracy of ultrasonography in patients with thyroid nodules (7,8). However, because it is widely available, conventional ultrasonography remains the most widely used tool for detecting thyroid nodules. The purpose of this study was to compare the efficacy of the four risk-stratification systems used in our research center in determining the malignancy risk of thyroid nodules. These systems are the ATA classification, the ACR-TIRADS, the EU-TIRADS, and the Kwak-TIRADS. Our findings will help provide a theoretical basis for selecting the optimal risk-stratification system.

Ethics and Consent
This retrospective study was approved by the institutional review board of our hospital. Informed consent was waived for this retrospective review.

Patients
This study involved all patients with thyroid nodules who underwent surgery in our hospital between January 2012 and December 2017. Patients were eligible for inclusion if they were 18-80 years of age and had nodules measuring more than 5 mm in diameter, as nodules measuring <5 mm have no clinical significance (5). US-FNAC was introduced in our hospital in February 2015. Thus, from that date onward, the indication for thyroid surgery was based on US-FNAC findings. Before that date, surgery was considered indicated in two situations: for nodules that showed at least two ultrasound features highly suggestive of malignancy, and for nodules that appeared benign but were associated with clinical symptoms.
Patients with histories of invasive procedures, such as ablation or FNA, those without complete ultrasonographic data, and those with any mismatch between the ultrasound images and the pathological results were excluded from this study.

Conventional Ultrasonography
Real-time ultrasound examinations were performed using the iU22 device (Philips Medical Systems, Bothell, WA, USA; 5-12 MHz linear probe) or the S3000 device (Siemens Medical Solutions, Mountain View, CA, USA; 5-14 MHz linear probe) by five radiologists with more than 7 years of experience each in thyroid ultrasonography. Ultrasonography was performed with the patient in a supine position and the neck slightly extended. The probe was placed on the surface of the neck with slight pressure. The entire thyroid gland was scanned first to determine the echo structure of the thyroid parenchyma. When nodules were detected, they were placed in the center of the screen for analysis. Machine settings such as gain, depth, focus, and dynamic range were adjusted as necessary to achieve high-quality ultrasonographic images. The ultrasound data were recorded and stored for further analysis.

Image Evaluation
Two radiologists (YS and ML) independently reviewed, analyzed, and classified the imaging data. Neither had participated in image acquisition, and they had 11 and 15 years of experience in thyroid ultrasonography, respectively. They were blinded to the patients' medical information, including previous imaging and pathological results. A basic consensus on the lexicon for the four guidelines had previously been reached. It covered imaging characteristics such as location, composition, echogenicity, shape, margin, calcification or echogenic foci, and neck lymph nodes (3,9-11).
The locations were divided into right lobe, left lobe, and isthmus. Specific descriptions were used to ensure that each nodule seen on ultrasound could be matched with the corresponding surgical and pathological specimen. Each nodule was described as being located in the upper, middle, or lower third of the thyroid gland. It was also described as close to the anterior capsule, in the middle of the gland, or close to the posterior capsule. The composition was described as cystic or almost completely cystic, spongiform, mixed cystic and solid, or solid or almost completely solid. Echogenicity was determined relative to the surrounding thyroid tissue and was described as anechoic, hyperechoic, isoechoic, hypoechoic, or very hypoechoic (lower echogenicity than that of the adjacent strap muscles). The shape was classified as wider-than-tall or taller-than-wide. Margins were classified as smooth, ill-defined, lobulated or irregular, or showing extrathyroidal extension. Calcification or echogenic foci were classified into five categories: none or large comet-tail artifacts (V-shaped, >1 mm, in cystic components); macrocalcification (>1 mm); peripheral (rim) calcification; or punctate echogenic foci or microcalcification. The lymph node status was defined as normal or metastatic.

Statistical Analysis
SPSS software (version 19.0, SPSS Inc., Chicago, IL, USA) and MedCalc software (version 15.8, Mariakerke, Belgium) were used for statistical analysis. Continuous variables were expressed as mean ± standard deviation or as ranges. Categorical data were compared using the chi-square test or Fisher's exact test, while continuous variables were compared using the independent-samples t-test. Receiver operating characteristic (ROC) curves were used to compare the diagnostic value of the four guidelines. The areas under the ROC curves (AUCs) of the four risk-stratification systems were calculated, and the systems were compared using the Cochran Q test and the z-test. The best cutoff values were obtained from the ROC analyses. The corresponding sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC) were then calculated. A two-sided P value < 0.05 was considered to indicate statistical significance.
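The cutoff-dependent indices listed above all come from a 2×2 table of test result against pathology. The Python sketch below shows one way to compute them for an ordinal system, where each nodule's TIRADS category is coded as an integer. It relies on assumptions that the text does not state:

- a nodule is called positive when its category is at or above the cutoff;
- the "best" cutoff is the one that maximizes the Youden index (SEN + SPE − 1);
- the AUC is the empirical (Mann-Whitney) area, with ties counted as one half.

The function names are hypothetical. This is not the SPSS or MedCalc procedure used in the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    cutoff: int
    sen: float  # TP / (TP + FN)
    spe: float  # TN / (TN + FP)
    ppv: float  # TP / (TP + FP)
    npv: float  # TN / (TN + FN)
    acc: float  # (TP + TN) / N


def metrics_at_cutoff(scores: Sequence[int], malignant: Sequence[bool], cutoff: int) -> Metrics:
    """Call a nodule positive (suspicious) when its category is >= cutoff (assumed rule)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, malignant):
        positive = s >= cutoff
        if positive and y:
            tp += 1
        elif positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")  # undefined when the denominator is empty

    return Metrics(cutoff, ratio(tp, tp + fn), ratio(tn, tn + fp),
                   ratio(tp, tp + fp), ratio(tn, tn + fn), (tp + tn) / len(scores))


def auc_mann_whitney(scores: Sequence[int], malignant: Sequence[bool]) -> float:
    """Empirical ROC area: P(malignant score > benign score) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, malignant) if y]
    neg = [s for s, y in zip(scores, malignant) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_cutoff_youden(scores: Sequence[int], malignant: Sequence[bool]) -> Metrics:
    """Try every observed category as a cutoff; keep the one maximizing SEN + SPE - 1."""
    candidates = [metrics_at_cutoff(scores, malignant, c) for c in sorted(set(scores))]
    return max(candidates, key=lambda m: m.sen + m.spe - 1)
```

Consider ten hypothetical nodules with categories `[2, 3, 3, 4, 5, 5, 3, 5, 3, 2]` and pathology `[False, False, False, True, True, True, False, True, True, False]`. For these, `best_cutoff_youden` selects category 4 (SEN 0.8, SPE 1.0), and `auc_mann_whitney` returns 0.94.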

General Characteristics
During the study period, a total of 1,634 patients with 1,687 thyroid nodules underwent surgery in our hospital. For 757 nodules, the indication for surgery was based on US-FNAC findings. After application of the selection criteria, 1,568 patients with 1,612 nodules were enrolled in this study. Of these patients, 1,156 were women (1,192 nodules) with a mean age of 51 ± 12 years (range, 18-80 years), and 412 were men (420 nodules) with a mean age of 49 ± 11 years (range, 18-78 years). Of the 1,612 thyroid nodules, 839 (52.05%) were diagnosed as benign on pathological examination (nodular goiter, 525; adenoma, 213; Hashimoto thyroiditis, 73; and subacute thyroiditis, 28). The other 773 (47.95%) nodules were diagnosed as malignant (papillary carcinoma, 738; follicular carcinoma, 23; medullary carcinoma, 10; and undifferentiated carcinoma, 2).

Ultrasonographic Predictors of Malignancy
The malignant nodules were significantly smaller than the benign nodules (13.58 ± 11.00 mm vs. 19.69 ± 11.57 mm; P < 0.001). In addition, patients with malignant nodules were significantly younger than those with benign nodules (48 ± 13 years vs. 53 ± 12 years; P < 0.001). No significant sex-related differences were observed between patients with benign and malignant nodules. The female-to-male ratios were similar in the benign (2.94, 626/213) and malignant (2.73, 566/207) groups (P = 0.479). Compared with the benign nodules, the malignant nodules were significantly more likely to show each of the following features (P < 0.05 for all; Table 1 and Figure 1): a solid or almost completely solid composition, hypoechoic or very hypoechoic echogenicity, a taller-than-wide shape, lobulated or irregular margins, extrathyroidal extension, microcalcifications, and lymph node metastasis.

Malignancy Risk Stratification
The risk of malignancy differed significantly among the four risk-stratification systems (P < 0.05; Table 2). For all guidelines except the ATA classification, the observed malignancy risk fell within the range recommended by the guideline. For the ATA classification, the observed risk was lower than the recommended range.

Diagnostic Cutoffs
The cutoff value of the ACR-TIRADS was TIRADS 5, whose SEN, SPE, PPV, NPV, ACC, and AUC were 88. […] (Table 3 and Figure 2). The Cochran Q test revealed significant differences among the four systems (Cochran Q = 150.29, P < 0.01). The AUC of the Kwak-TIRADS differed significantly from those of the other three systems (z-values: 3.405 vs. ACR-TIRADS, 5.748 vs. ATA, and 5.485 vs. EU-TIRADS; P < 0.01 for each). No significant differences were detected among the other three risk-stratification systems (Supplementary Table S1).
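Cochran's Q, reported above, tests whether several systems applied to the same nodules call the same proportion of nodules positive. The following is a minimal Python sketch of the statistic. It assumes that each row holds one nodule's cutoff-based calls, coded 0/1, with one column per system; the function name is hypothetical. Under the null hypothesis, Q approximately follows a chi-square distribution with k − 1 degrees of freedom, so a p-value can be obtained from, for example, `scipy.stats.chi2.sf(q, df)`.

```python
from typing import Sequence, Tuple


def cochran_q(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Cochran's Q for k related binary outcomes.

    table[i][j] is 1 if system j calls nodule i positive, else 0.
    Returns (Q, degrees of freedom = k - 1).
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]  # positives per system
    row_totals = [sum(row) for row in table]                        # positives per nodule
    n_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2)
    # Rows on which all systems agree contribute r * (k - r) = 0 to the denominator.
    denominator = k * n_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every nodule is called identically by all systems")
    return numerator / denominator, k - 1
```

Only the discordant nodules carry information in this test. When the systems disagree more often, Q grows relative to the chi-square reference.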

Diagnostic Efficacy According to Nodule Size
We divided the nodules into three groups based on their diameter: ≤10 mm, >10 mm but ≤20 mm, and >20 mm. For nodules ≤10 mm in diameter, the ACR-TIRADS had the greatest AUC; for nodules >10 mm, the Kwak-TIRADS had the greatest AUC. The diagnostic efficacy of the four guidelines varied with nodule size and was higher for larger nodules (Table 4).
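The size-stratified comparison amounts to recomputing the AUC within each diameter group. The Python sketch below makes the following assumptions:

- the group boundaries follow the text (≤10 mm, >10 mm to ≤20 mm, >20 mm);
- the AUC is the empirical Mann-Whitney area;
- the helper names are hypothetical.

A group that contains only benign or only malignant nodules has no defined AUC, so the sketch returns NaN for such a group.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def size_group(diameter_mm: float) -> str:
    """Assign a nodule to one of the three diameter groups used in the text."""
    if diameter_mm <= 10:
        return "<=10 mm"
    if diameter_mm <= 20:
        return ">10 to <=20 mm"
    return ">20 mm"


def auc(scores: Sequence[int], malignant: Sequence[bool]) -> float:
    """Empirical (Mann-Whitney) ROC area; a tie between classes counts one half."""
    pos = [s for s, y in zip(scores, malignant) if y]
    neg = [s for s, y in zip(scores, malignant) if not y]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_size(diameters: Sequence[float], scores: Sequence[int],
                malignant: Sequence[bool]) -> Dict[str, float]:
    """Split nodules by diameter group, then compute one AUC per group."""
    groups: Dict[str, Tuple[List[int], List[bool]]] = defaultdict(lambda: ([], []))
    for d, s, y in zip(diameters, scores, malignant):
        g_scores, g_labels = groups[size_group(d)]
        g_scores.append(s)
        g_labels.append(y)
    return {g: auc(s, y) for g, (s, y) in groups.items()}
```

Because the groups are smaller than the whole cohort, each per-group AUC carries a wider confidence interval than the overall AUC.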

DISCUSSION
Ultrasonography is currently the preferred method for evaluating thyroid nodules (2). Ultrasound features associated with malignancy include hypoechoic or very hypoechoic echogenicity, a taller-than-wide shape, microcalcifications, and irregular margins (12). However, individual ultrasound features are unreliable predictors of the malignancy risk of thyroid nodules (13). Therefore, researchers have developed ultrasound models that combine several features to improve the diagnostic performance of ultrasonography for thyroid nodules.
In recent years, many distinct TIRADS guidelines have used ultrasound features to classify thyroid nodules as malignant or benign, or to recommend US-FNA (3,9,11,14-21). Such diagnostic standards not only clarify the malignancy risk of thyroid nodules but also help guide treatment. Many versions of TIRADS were modeled on the Breast Imaging Reporting and Data System (BI-RADS), which is widely used in breast cancer diagnosis. For instance, the Kwak-TIRADS is a simplified classification based on the number of suspicious features present, out of five, and it has proved to be clinically useful and accessible (22). However, the Kwak-TIRADS does not weight these independent risk factors. For example, microcalcification carries a higher malignancy risk than solid composition or a hypoechoic appearance, yet it counts the same as either of them. In addition, extrathyroidal extension, an important risk factor, is not included. Despite concerted efforts, no TIRADS classification has been widely accepted, especially in the United States. In 2015 and 2017, two versions of the ACR-TIRADS white paper were published (11,23). The difference between them is that, in the 2017 version, nodule size is a criterion for US-FNA but not for malignancy risk stratification. The ACR-TIRADS is suitable for all nodules, as it integrates all ultrasonographic characteristics. Each characteristic is scored from 0 to 3 points according to its malignant potential, and the higher the total score, the higher the malignancy risk. Therefore, the ACR-TIRADS is an objective and comprehensive method for evaluating the characteristics of each thyroid nodule and for guiding therapy. Its disadvantage is that it is more complicated than the other guidelines. Moreover, malignant nodules with mixed echo patterns receive lower scores in the ACR-TIRADS, which can lead to misdiagnosis. The risk of malignancy for ACR-TIRADS 5 is ≥20%.
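The contrast between unweighted feature counting (Kwak-TIRADS) and weighted point scoring (ACR-TIRADS) can be made concrete in code. The Python sketch below applies the point values of the 2017 ACR TI-RADS white paper to the lexicon used in this study. For Kwak, it assigns categories 3, 4a, 4b, 4c, and 5 to nodules with 0, 1, 2, 3-4, and 5 suspicious features, respectively. Two simplifications are assumptions:

- a 1-point ACR total is mapped to TR1;
- Kwak category 2 (benign patterns) is not modelled.

The feature strings, class name, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# ACR TI-RADS points keyed by this study's lexicon terms.
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1, "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated_irregular": 2, "extrathyroidal": 3}
FOCI = {"comet_tail": 0, "macro": 1, "rim": 2, "punctate": 3}  # additive; empty set means none


@dataclass(frozen=True)
class Nodule:
    composition: str
    echogenicity: str
    shape: str
    margin: str
    foci: FrozenSet[str] = frozenset()


def acr_points(n: Nodule) -> int:
    """Weighted sum: each feature contributes 0-3 points according to its malignant potential."""
    if n.composition == "spongiform":
        return 0  # ACR note: no further points are added for spongiform nodules
    return (COMPOSITION[n.composition] + ECHOGENICITY[n.echogenicity] + SHAPE[n.shape]
            + MARGIN[n.margin] + sum(FOCI[f] for f in n.foci))


def acr_level(points: int) -> str:
    if points <= 1:
        return "TR1"  # mapping a 1-point total to TR1 is an assumption
    if points == 2:
        return "TR2"
    if points == 3:
        return "TR3"
    return "TR4" if points <= 6 else "TR5"


def kwak_category(n: Nodule) -> str:
    """Unweighted count of the five Kwak suspicious features."""
    count = sum([
        n.composition == "solid",
        n.echogenicity in ("hypoechoic", "very_hypoechoic"),
        n.margin == "lobulated_irregular",  # extrathyroidal extension is not a Kwak feature
        "punctate" in n.foci,               # microcalcification
        n.shape == "taller_than_wide",
    ])
    return ["3", "4a", "4b", "4c", "4c", "5"][count]
```

Consider two hypothetical nodules: a solid isoechoic nodule with punctate echogenic foci, and a solid hypoechoic nodule without foci. The sketch places both in Kwak 4b, whereas ACR scores them 6 and 4 points, respectively (both TR4). This difference reflects the heavier weight the ACR system gives to punctate foci.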
The EU-TIRADS is based on a review of the literature and on the American Association of Clinical Endocrinologists, ATA, and Korean guidelines (3). The EU-TIRADS is similar to the ACR-TIRADS but simpler. Under the EU-TIRADS, the following characteristics indicate a high risk of malignancy: irregular shape, irregular margins, microcalcifications, and marked hypoechogenicity. Mildly hypoechoic nodules without any of these features are classified as category 4. For category-4 nodules measuring >1.5 cm, the EU-TIRADS recommends FNA, which is an excellent recommendation for thyroid adenomas and adenocarcinomas. However, the EU-TIRADS does not include solidity as an independent risk factor and considers only hypoechogenicity. The ATA guidelines were first published in 2009 and revised in 2015 (13). The 2015 revision removed increased nodular vascularity as an ultrasound feature and reduced the risk associated with hypoechogenicity. The ATA guidelines clearly identify three characteristics that are highly indicative of malignancy (median, >90%): microcalcifications, irregular margins, and a taller-than-wide shape. These guidelines directly advanced the concept of risk stratification. However, the ATA classification was developed for differentiated thyroid cancer in adults. In this study, we nevertheless applied it to the two undifferentiated thyroid cancers, which were classified as highly suspicious nodules. A drawback of the ATA guidelines, shared with the EU-TIRADS, is that suspicious ultrasound features of differing significance are grouped into the same tier. In addition, solidity is not categorized as an independent risk factor.
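The EU-TIRADS rules described above can be written as a short decision procedure. The Python sketch below assumes:

- boolean inputs for the high-risk features named in the text;
- a three-level echogenicity code;
- the FNA size thresholds of the published EU-TIRADS guideline: >20 mm for category 3, >15 mm for category 4, and >10 mm for category 5.

Only the category-4 threshold is stated in this article. The names are hypothetical, and category 1 (no nodule) is not modelled.

```python
from typing import Dict

# Diameter (mm) above which FNA is suggested, per EU-TIRADS category.
FNA_THRESHOLD_MM: Dict[int, float] = {3: 20.0, 4: 15.0, 5: 10.0}


def eu_tirads_category(benign_pattern: bool, echogenicity: str, irregular_shape: bool = False,
                       irregular_margins: bool = False, microcalcifications: bool = False) -> int:
    """Return EU-TIRADS category 2-5.

    benign_pattern: pure (anechoic) cyst or entirely spongiform nodule.
    echogenicity: "iso_hyper", "mild_hypo", or "marked_hypo".
    """
    if benign_pattern:
        return 2
    high_risk = (irregular_shape or irregular_margins or microcalcifications
                 or echogenicity == "marked_hypo")
    if high_risk:
        return 5  # any single high-risk feature is sufficient
    if echogenicity == "mild_hypo":
        return 4  # mildly hypoechoic, no high-risk feature
    return 3      # iso- or hyperechoic, no high-risk feature


def fna_suggested(category: int, diameter_mm: float) -> bool:
    """FNA is suggested only above the category-specific size threshold."""
    threshold = FNA_THRESHOLD_MM.get(category)
    return threshold is not None and diameter_mm > threshold
```

For example, a 16-mm mildly hypoechoic nodule without high-risk features falls into category 4 and meets the FNA size threshold, whereas a 15-mm nodule of the same kind does not. The sketch also shows the limitation discussed above: solidity never enters the decision.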
In this study, the AUCs of all four systems exceeded 0.86, indicating that all of them had good diagnostic performance. Nevertheless, some benign nodules were misclassified as malignant. Twenty-eight subacute thyroiditis nodules and 24 Hashimoto thyroiditis nodules were misidentified as malignant because of their solid composition, hypoechoic or very hypoechoic appearance, and taller-than-wide shape. However, some of these lesions may be correctly diagnosed on the basis of the clinical history, thyroid-function tests, and newer techniques such as ultrasound elastography. Notably, all 28 subacute thyroiditis nodules that were misidentified as malignant were examined before February 2015. After that date, similar nodules were not misdiagnosed because US-FNAC was available. A total of 96 benign solid hypoechoic nodules were misdiagnosed as malignant because they had microcalcifications or both macro- and microcalcifications. Eight benign nodules were misdiagnosed because of their solidity, irregular margins, and mixed echogenicity.
Similarly, some malignant nodules were misclassified as benign. Some small nodules, such as thyroid microcarcinomas measuring 6-10 mm in diameter, did not exhibit the characteristic malignant features and were mistaken for benign lesions. In addition, some malignant nodules were classified as benign because of cystic degeneration and the lack of other malignant characteristics. In this study, ultrasonography and US-FNAC helped in the selection of surgical procedures. US-FNAC could differentiate most benign and malignant nodules. Benign nodules were commonly treated with lobectomy or hemithyroidectomy. The surgical procedure for malignant nodules depended on the tumor stage and lymph node metastasis status. Ultrasonography can reveal nodule size, extraglandular invasion, and cervical lymph node metastasis. In our study, large nodules and those associated with obvious extraglandular invasion or lymphatic metastasis were treated with lobectomy or total thyroidectomy plus central compartment or lateral neck dissection. Patients with invasion of the respiratory or digestive tract underwent surgery plus radioactive iodine treatment and radiotherapy. Patients with central lymph node metastasis were treated with total thyroidectomy and central compartment neck dissection. Prophylactic unilateral or bilateral central compartment neck dissection was performed for patients with advanced-stage papillary thyroid carcinoma (cT3, cT4, or cN1b). The following patients underwent thyroidectomy with or without prophylactic central compartment dissection: those with smaller lesions (T1, T2), non-invasive lesions, or cN0 papillary thyroid carcinoma, and most patients with follicular carcinoma.
The accuracy of each classification in distinguishing benign from malignant nodules differed with nodule size. In our research, the thyroid nodules were divided into three groups based on diameter: ≤10 mm, >10 mm but ≤20 mm, and >20 mm. The diagnostic efficacy was higher for the larger nodules.
For nodules measuring ≤10 mm, the EU-TIRADS had the highest SEN and NPV, while the ACR-TIRADS had the highest SPE, PPV, and AUC. In the other two groups, the Kwak-TIRADS had the highest SEN, NPV, and AUC, while the ACR-TIRADS had the highest SPE and PPV. All four risk-stratification systems performed well in the differential diagnosis of benign and malignant thyroid nodules.
Overall, the Kwak-TIRADS had the highest SEN, NPV, and ACC, while the ACR-TIRADS had the highest SPE and PPV. The AUC of the Kwak-TIRADS differed significantly from those of the other three guidelines, whereas no significant differences were found among the other three. The Kwak-TIRADS may therefore be the optimal classification system for differentiating between benign and malignant thyroid nodules.
This study has several limitations. First, selection bias was inevitable because of the retrospective design and because patients were selected from the surgical department rather than the general population. Second, there may be inter-observer variability in the assessment of thyroid nodule characteristics, and the agreement between assessments performed by different physicians should be verified. Finally, the proportion of malignant nodules was high, and papillary thyroid carcinomas accounted for most of the malignant nodules, with few cases of other malignant types. Prospective studies with larger sample sizes may overcome these drawbacks.
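The inter-observer agreement that this limitation calls for is commonly summarized with Cohen's kappa. Below is a minimal Python sketch of the unweighted statistic for two readers' category labels on the same nodules. The function name and example labels are hypothetical. For ordinal categories such as TIRADS levels, a weighted kappa may be more appropriate.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each reader's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # convention: both readers used one identical category throughout
    return (observed - expected) / (1 - expected)
```

Suppose two readers assign `["4a", "4b", "5", "3", "4a", "5"]` and `["4a", "4b", "4c", "3", "4b", "5"]`. They agree on 4 of 6 nodules, the chance agreement is 7/36, and kappa is 17/29 (about 0.59).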
In conclusion, all four risk-stratification systems provided effective stratification of malignancy risk for the diagnosis of thyroid nodules. The Kwak-TIRADS may be more suitable for the Chinese population and, being simple, is worthy of clinical application.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the Declaration of Helsinki, with written informed consent from all subjects. The protocol was approved by the Ethics committee of Gong Li Hospital.

AUTHOR CONTRIBUTIONS
YS and XF conceived and designed the experiments. JH, SW, MC, YW, and LG performed the experiments. YS and ML reviewed, analyzed, and classified the imaging data. XC provided basic information on all cases, and JD provided pathological results. YS wrote the paper.

ACKNOWLEDGMENTS
This study was funded by the Surface Project of Shanghai Pudong New Area Health and Family Planning Commission (grant no. PW2017A-22) and the Talent Project of Shanghai Pudong New Area Gongli Hospital (grant no. GLRq2017-02).