Explore the Diagnostic Efficiency of Chinese Thyroid Imaging Reporting and Data Systems by Comparing With the Other Four Systems (ACR TI-RADS, Kwak-TIRADS, KSThR-TIRADS, and EU-TIRADS): A Single-Center Study

Purpose To explore the characteristics of C-TIRADS by comparing it with ACR-TIRADS, Kwak-TIRADS, KSThR-TIRADS and EU-TIRADS. Methods A total of 1096 nodules were collected from 884 patients who underwent thyroidectomy in our center between May 2018 and December 2020. The nodules were divided into two groups: “>10mm” and “≤10mm”. The ultrasound characteristics of each nodule were observed and recorded by two doctors, and the nodules were then classified according to ACR-TIRADS, Kwak-TIRADS, KSThR-TIRADS, EU-TIRADS, and C-TIRADS. Results A total of 682 benign nodules (62.23%) and 414 malignant nodules (37.77%) were identified. The ICC values of the guidelines were 0.937 (ACR-TIRADS), 0.858 (EU-TIRADS), 0.811 (Kwak-TIRADS), 0.835 (KTA/KSThR-TIRADS) and 0.854 (C-TIRADS). The malignancy rates of Kwak-TIRADS 4B, C-TIRADS 4B, and C-TIRADS 4C differed significantly between the two size groups (all P<0.05), whereas the other grades showed no significant difference between the two size groups (all P>0.05). The unnecessary biopsy rate was lowest for C-TIRADS (49.02%, P<0.001). Furthermore, Kwak-TIRADS had the highest sensitivity and NPV (89.9% and 91.0%, all P<0.05), while C-TIRADS had the highest specificity and PPV (82.3% and 69.2%, all P<0.05). C-TIRADS and Kwak-TIRADS had the highest accuracy (76.0% and 72.5%, P=0.071). The AUCs of the five guidelines were as follows: C-TIRADS (0.816, P<0.05), Kwak-TIRADS (0.789, P<0.05), KTA/KSThR-TIRADS and ACR-TIRADS (0.773 and 0.763, P=0.305), and EU-TIRADS (0.734, P<0.05). The AUCs of the five guidelines were not statistically different between “nodules >10mm” and “nodules ≤10mm” (all P>0.05). Conclusions All five guidelines showed excellent interobserver agreement. C-TIRADS was slightly more efficient than Kwak-TIRADS, KTA/KSThR-TIRADS and ACR-TIRADS, and had greater advantages over EU-TIRADS. The diagnostic ability of the five guidelines for “nodules ≤10mm” was not inferior to that for “nodules >10mm”. 
C-TIRADS is simple and easy to implement and can provide effective thyroid tumor risk stratification for thyroid nodule diagnosis, especially in China.


INTRODUCTION
Thyroid nodules are the most common thyroid gland disease. Thyroid nodules and malignant nodules can be detected in over 50% and 7-15% of the general population, respectively (1). Ultrasound is the most commonly used and effective imaging method for thyroid nodule diagnosis. It clearly shows the morphological characteristics of thyroid nodules, including nodules ≤10mm. The number of thyroid nodules and malignant thyroid nodules detected has been increasing yearly due to the increased use of thyroid ultrasonography. Presently, various guidelines are used to differentiate benign and malignant thyroid nodules in clinical practice and research. Previous research showed that each guideline has advantages and limitations (2)(3)(4)(5)(6)(7)(8).
The following guidelines are widely used in China: Kwak-TIRADS (Thyroid Imaging Reporting and Data System), developed by Kwak (11), and EU-TIRADS, created by the European Thyroid Association (ETA) in 2017 (12). These guidelines estimate the probability of malignancy from the ultrasound characteristics of thyroid nodules to guide further treatment. They identify similar suspicious features, such as solid composition, hypoechogenicity, marked hypoechogenicity, irregular margin, and microcalcification. However, the specific ultrasound characteristics and the counting and grading methods differ, as do the fine needle aspiration (FNA) recommendations (12).
The use of TIRADS guidelines in China is not yet uniform, which can confuse clinicians and patients (13). Furthermore, ultrasound doctors face some challenges during diagnosis. For instance, ACR-TIRADS assigns scores to about 18 ultrasound features, which is cumbersome and can reduce diagnostic efficiency in a country with a large population. The malignancy rates of the highest grades of ACR-TIRADS and EU-TIRADS are >20% and 26%-87%, respectively; these ranges are too wide, which confuses clinicians about how to treat thyroid nodules. With Kwak-TIRADS and EU-TIRADS, a solid hypoechoic nodule with just one additional malignant feature is classified as 4C and 5, respectively. These TIRADS guidelines also guide FNA. However, FNA is not widely available in China, so it is not realistic to perform FNA before deciding on every treatment plan.
The Superficial Organ and Vascular Ultrasound Group in the Chinese Medical Association issued a new guideline, C-TIRADS (Chinese Thyroid Imaging Reporting and Data Systems), in August 2020 to solve the above problems (13). C-TIRADS is a new counting classification method used for thyroid nodule diagnosis and guiding thyroid FNA. It takes into account both the international standards and China's national conditions. Presently, few studies have reported on C-TIRADS. This study aimed to explore the characteristics of C-TIRADS by comparing it with ACR-TIRADS, Kwak-TIRADS, KSThR-TIRADS and EU-TIRADS.

MATERIALS AND METHODS
This is a retrospective study. Informed consent was not required for this retrospective observational study.

Patient Selection
From May 2018 to December 2020, two radiologists with over 5 and 7 years of experience in thyroid ultrasound diagnosis collected ultrasound images of 2683 consecutive patients with 3524 thyroid nodules in our hospital. Pregnant and breastfeeding women and patients with a history of thyroid surgery were excluded, and the remaining patients were followed up. Finally, 884 patients (1096 nodules) who underwent thyroidectomy (complete, almost complete, or unilateral thyroidectomy) and had complete clinical data (such as gender and age), ultrasound feature data, and surgical pathological results of thyroid nodules were included in the study (Figure 1).

Thyroid Ultrasound Examination
All ultrasound examinations were performed with Philips iU22, Philips EPIQ 7, or Toshiba Aplio 500 devices equipped with either a 5-12 MHz or a 10 MHz linear-array transducer. Two US experts with 5 and 7 years of experience in thyroid US examination scanned the thyroid region of each patient, stored complete and clear ultrasound images of the thyroid nodules as JPEG files, and recorded the location of each nodule (right lobe, left lobe, or isthmus). The three diameters of each nodule (craniocaudal, transverse, and anteroposterior) were measured three times, and the average values were recorded. The maximum diameter of each nodule was also recorded, and suspicious cervical lymph node metastasis was noted.

Nodule Analysis
The nodules without histological results were excluded after follow-up. Two US experts with 10 and 15 years of experience in thyroid US examination, who were blinded to the final histology of the nodules, retrospectively reviewed the images and independently analyzed all nodules. When a patient had more than one nodule, only the nodules with clear pathological diagnoses were included. Nodule composition (cystic, almost completely cystic, spongiform, mixed cystic and solid, solid or almost completely solid), echogenicity (anechoic, hyperechoic, isoechoic, hypoechoic, markedly hypoechoic), shape (wider-than-tall, taller-than-wide), margin (well circumscribed, microlobulated or irregular, ill-defined, extrathyroidal extension), and hyperechoic foci (microcalcifications, peripheral calcification, macrocalcifications, comet-tail sign) were recorded. The nodules were then classified based on the Kwak-TIRADS (9), KSThR-TIRADS (10), ACR-TIRADS (11), EU-TIRADS (12), and C-TIRADS (2) guidelines. The results were compared, and whenever there was a disagreement, the two doctors discussed it and settled on a final result.

Statistics
IBM SPSS Statistics (version 22) and R-Project (version 4.0.5) were used for statistical analyses. Quantitative data were presented as mean ± standard deviation (SD), while qualitative data were presented as frequencies.
The Shapiro-Wilk test was used to assess whether the data were normally distributed. Differences between groups were analyzed using the Mann-Whitney U test for nonparametric data and the unpaired t-test for parametric data. The χ² test or Fisher's exact test was used to compare categorical variables. The intraclass correlation coefficient (ICC) was used to evaluate inter-observer agreement. The unnecessary biopsy rate was calculated, for each of the five guidelines, as the proportion of benign nodules among the thyroid nodules for which biopsy was indicated. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were determined using the pathological findings as the reference standard.
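The definitions above can be illustrated with a minimal Python sketch. It is not the SPSS/R workflow used in this study; the class name `TwoByTwo`, the function names, and all counts are hypothetical and chosen only to show how each index is derived from a 2×2 table against pathology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a guideline's call (at a chosen cut-off) versus pathology."""
    tp: int  # malignant on pathology, classified positive
    fp: int  # benign on pathology, classified positive
    fn: int  # malignant on pathology, classified negative
    tn: int  # benign on pathology, classified negative


def diagnostic_metrics(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PPV, NPV and accuracy as fractions."""
    total = t.tp + t.fp + t.fn + t.tn
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "accuracy": (t.tp + t.tn) / total,
    }


def unnecessary_biopsy_rate(benign_indicated: int, total_indicated: int) -> float:
    """Proportion of benign nodules among nodules with an FNA indication."""
    return benign_indicated / total_indicated


# Hypothetical counts for illustration only (not data from this study).
example = TwoByTwo(tp=80, fp=30, fn=20, tn=70)
metrics = diagnostic_metrics(example)
ubr = unnecessary_biopsy_rate(benign_indicated=50, total_indicated=100)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"unnecessary biopsy rate: {ubr:.3f}")
```

Note that PPV, NPV, and accuracy depend on the proportion of malignant nodules in the sample, so values obtained in a surgical cohort may not transfer to a screening population.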
Kendall's tau-b test was used to assess the relationship between each category and the pathology findings. The receiver operating characteristic (ROC) curves of the five guidelines were used to calculate the best cut-off values. The DeLong test was used to compare the ROC curves via the pROC software package (R-Project, version 4.0.5). P < 0.05 was considered statistically significant.
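As an illustration of the ROC step, the sketch below computes the AUC of an ordinal grade and picks a cut-off grade. It rests on two assumptions not stated in the text: that the grades are coded as increasing integers, and that the "best" cut-off is the one maximizing the Youden index (sensitivity + specificity - 1). The function names and the data are hypothetical, and the DeLong comparison performed with pROC is not reproduced here.

```python
def auc_from_grades(grades, labels):
    """AUC as the probability that a randomly chosen malignant nodule has a
    higher grade than a randomly chosen benign one (ties count as 0.5)."""
    pos = [g for g, y in zip(grades, labels) if y == 1]
    neg = [g for g, y in zip(grades, labels) if y == 0]
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


def best_cutoff_youden(grades, labels):
    """Return (cut-off grade, Youden index), calling 'grade >= cut-off' positive.
    Assumption: the optimal cut-off is defined by the maximum Youden index."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(grades)):
        tp = sum(1 for g, y in zip(grades, labels) if g >= cut and y == 1)
        tn = sum(1 for g, y in zip(grades, labels) if g < cut and y == 0)
        youden = tp / n_pos + tn / n_neg - 1
        if best is None or youden > best[1]:
            best = (cut, youden)
    return best


# Hypothetical ordinal grades (1 = lowest risk, 5 = highest risk) and
# pathology labels (1 = malignant, 0 = benign); not data from this study.
grades = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

auc = auc_from_grades(grades, labels)
cutoff, youden = best_cutoff_youden(grades, labels)
print(f"AUC = {auc:.3f}, best cut-off grade = {cutoff}, Youden index = {youden:.3f}")
```

The pairwise formulation is quadratic in the number of nodules, which is acceptable for a cohort of about a thousand nodules; a rank-based computation would give the same AUC more efficiently.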

RESULTS

Characteristics of Patients and Nodules
A total of 884 patients (681 females and 203 males; median age, 43.91 years; range, 10 to 78 years) were included in the study. A total of 507 benign cases (397 females and 110 males; median age, 49.26 years; range, 25 to 78 years) and 377 malignant cases (284 females and 93 males; median age, 41.83 years; range, 10 to 73 years) were detected. Patients with thyroid cancer were significantly younger than those with benign nodules (P<0.001). The gender difference was not significant (P=0.299). A total of 1096 thyroid nodules (average maximum diameter, 18.86 mm; range, 5 to 64 mm) were identified. There were 682 benign nodules (average maximum diameter, 19.13 mm; range, 5 to 64 mm) and 414 malignant nodules (average maximum diameter, 17.76 mm; range, 5 to 60 mm). The malignant nodules were smaller than the benign nodules (P=0.043). The pathological results of the nodules are shown in Table 1.

The Relationship Between the Classification of the Five Guidelines and Nodule Pathology
The incidence of malignancy for the different grades of the five guidelines is shown in Table 2. The calculated malignancy rates of the following grades were higher than the recommended ranges: ACR-TIRADS TR3 (11.48%), ACR-TIRADS TR4 (29.24%), EU-TIRADS 3 (4.25%), EU-TIRADS 4 (22.86%), and C-TIRADS 4A (16.77%). Within each guideline, the nodules were divided into two groups according to whether they were larger than 10mm. The number of nodules and the incidence of malignant tumors in each group were calculated (Table 2). The malignancy rates of Kwak-TIRADS 4B, C-TIRADS 4B, and C-TIRADS 4C differed significantly between the two size groups (all P<0.05). The malignancy rate of Kwak-TIRADS 4B was higher in "nodules ≤ 10mm" than in "nodules >10mm", whereas the malignancy rates of C-TIRADS 4B and C-TIRADS 4C were lower in "nodules ≤ 10mm" than in "nodules >10mm". There was no statistical difference in the malignancy rate of the other grades between "nodules ≤ 10mm" and "nodules >10mm" (all P>0.05).

Unnecessary Biopsy Rate
Supplementary Material 2 shows statistics on the FNA recommended by each guideline. If FNA had been performed on every nodule that met an FNA indication in this study, the unnecessary biopsy rate would have been lowest for C-TIRADS (49.02%, P<0.001).

Diagnostic Performance of the Five Guidelines
The diagnostic efficacy and the ROC curves of the five guidelines are shown in Table 3. ROC analysis showed that the best diagnostic cut-off values were TR5 for ACR-TIRADS, 4C for Kwak-TIRADS, 5 for KTA/KSThR-TIRADS, 5 for EU-TIRADS, and 4C for C-TIRADS. The diagnostic efficacy of the five guidelines for nodules of different sizes is shown in Table 4. The AUCs of the five guidelines were not statistically different between "nodules ≤ 10mm" and "nodules >10mm" (all P>0.05).

DISCUSSION
Most guidelines currently provide guidance for FNA, but FNA is not widely available in China, and it is not realistic to mandate FNA for every thyroid nodule before deciding on a treatment plan. Therefore, the C-TIRADS guideline points out that in medical institutions that have not yet introduced FNA, the C-TIRADS results may inform surgeons' treatment decisions (13). Furthermore, Kwak-TIRADS and C-TIRADS were very similar, and both had the simplest counting method. However, C-TIRADS differed from the other risk stratification systems in that it removed the "hypoechoic" and "mainly solid" characteristics from the malignant signs and treated the "comet-tail sign" as a benign sign scored as -1 point. When the Kwak-TIRADS and C-TIRADS grades of the same nodule were compared, a total of 462 nodules were downgraded because of echogenicity and composition. Regarding high-grade nodules, C-TIRADS identified 393 (C-TIRADS 4C + C-TIRADS 5), fewer than EU-TIRADS 5 (669 cases), Kwak-TIRADS 4C + Kwak-TIRADS 5 (631 cases), KTA/KSThR-TIRADS 5 (565 cases) and ACR-TIRADS TR5 (507 cases). The unnecessary biopsy rate of C-TIRADS was the lowest (49.02%, all P<0.001). Therefore, C-TIRADS can lower the grade of nodules without compromising the biopsy criteria.
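The counting approach described above can be sketched in Python as follows. Only the -1 point for the comet-tail sign and the removal of "hypoechoic" and "mainly solid" come from the text. The list of five suspicious features and the score-to-category mapping are assumptions based on the published C-TIRADS guideline (13) and should be verified against it. The identifier names are hypothetical.

```python
# Assumed suspicious features (one point each); verify against the guideline (13).
SUSPICIOUS_FEATURES = frozenset({
    "vertical_orientation",
    "solid_composition",
    "markedly_hypoechoic",
    "microcalcifications",
    "ill_defined_or_irregular_margin_or_extrathyroidal_extension",
})

# Assumed mapping from total score to C-TIRADS category.
SCORE_TO_CATEGORY = {-1: "2", 0: "3", 1: "4A", 2: "4B", 3: "4C", 4: "4C", 5: "5"}


def c_tirads_category(features, comet_tail=False):
    """Count suspicious features, subtract one point for a comet-tail sign,
    and map the resulting score to a category under the assumptions above."""
    features = set(features)
    unknown = features - SUSPICIOUS_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    score = len(features) - (1 if comet_tail else 0)
    return SCORE_TO_CATEGORY[score]


# A solid, markedly hypoechoic nodule with microcalcifications scores 3 points.
print(c_tirads_category({"solid_composition", "markedly_hypoechoic", "microcalcifications"}))
# A nodule with no suspicious feature and a comet-tail sign scores -1 point.
print(c_tirads_category(set(), comet_tail=True))
```

Because "hypoechoic" is not in the assumed feature list, a solid hypoechoic nodule without other suspicious signs scores only one point, which illustrates how nodules can receive a lower grade than under Kwak-TIRADS.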
The benign and malignant nodules differed in size, and the maximum diameter of benign nodules was larger than that of malignant nodules, consistent with Gao's study (14). The difference could be due to selection bias, since most patients with benign nodules undergo surgery because of compressive symptoms or aesthetic concerns. The malignancy rate of nodules in each guideline increased with the grade, indicating that malignancy risk correlated with the guideline grade. The malignancy rates of most grades in the five guidelines were within the ranges recommended by each guideline, which shows that the sample in this study was representative. ACR-TIRADS TR3 (11.48%), ACR-TIRADS TR4 (29.24%), EU-TIRADS 3 (4.25%), EU-TIRADS 4 (22.86%), and C-TIRADS 4A (16.77%) had higher malignancy rates than the recommended ranges, but some were comparable to the malignancy rates reported in previous studies (14)(15)(16). The difference could be due to bias introduced by multiple malignant nodules, sub-centimeter nodules, or differences between observers.
The prerequisite for a guideline to be widely used is good consistency among doctors. In our study, the two doctors obtained consistent results with each of the five guidelines, indicating that the five guidelines can be used in a standardized manner. Regarding US features, the inter-observer agreement was slightly worse for hyperechoic foci (ICC, 0.726), echogenicity (ICC, 0.758) and margin (ICC, 0.799) than for the other features. The study by Park et al. (17) concluded that the consistency of echogenicity was poor. In a multi-center study by Persichetti et al. (18) and a single-center study by Giorgio et al. (19), hyperechoic foci and margin were also US features with poor interobserver agreement. A uniform lexicon of thyroid US features, simplified classification methods, and specialized training in describing thyroid US findings may improve inter-observer agreement.
ROC analysis was used to evaluate the diagnostic performance of the guidelines. First, the diagnostic cut-offs of four guidelines (ACR-TIRADS TR5, Kwak-TIRADS 4C, KTA/KSThR-TIRADS 5, and EU-TIRADS 5) were identified, which were similar to those reported by Gao and Koc (14,20). However, Simone indicated that the diagnostic cut-off value of ACR-TIRADS is TR4, while Du showed that the diagnostic cut-off value of Kwak-TIRADS is 4B (21,22). They also showed that the malignancy rate of these two nodule grades is very high, possibly due to deviations in the data source. The diagnostic cut-off value of C-TIRADS was 4C. Apart from C-TIRADS, ACR-TIRADS showed the highest specificity among the guidelines, similar to previous findings (14,23,24). The above articles all compared multiple guidelines, including ACR-TIRADS and Kwak-TIRADS. In this study, Kwak-TIRADS had the highest sensitivity, consistent with the findings of Hu et al. (25). C-TIRADS had the highest specificity (82.3%), PPV (69.2%) and accuracy (76.0%). The sensitivity of C-TIRADS (75.7%) was relatively low, in line with Zhu et al., but it was still better than that of ACR-TIRADS (73.9%) (26). In addition, C-TIRADS had the highest AUC (0.816, all P<0.05). Therefore, C-TIRADS had the highest overall diagnostic performance, with no obvious weakness in any diagnostic index.
This study included 19.1% sub-centimeter nodules. Most guidelines recommend active surveillance instead of surgical treatment for low-risk sub-centimeter nodules. However, in China, some patients with sub-centimeter nodules choose surgery, for reasons such as suspicious cervical lymph nodes, other thyroid symptoms, inability to attend active follow-up, or a preference for more radical treatment. Furthermore, most low-grade sub-centimeter nodules were resected together with malignant nodules when the thyroid lobe was removed. However, these treatment options are controversial. Presently, few studies have reported on the diagnostic properties of the guidelines for sub-centimeter thyroid nodules.
Except for Kwak-TIRADS 4B, C-TIRADS 4B, and C-TIRADS 4C, the malignancy rates of the grades did not differ significantly between "nodules ≤ 10mm" and "nodules >10mm" (all P>0.05). The AUCs of the five guidelines were not statistically different between the two sizes (all P>0.05). Many studies have shown that the malignancy rate of high-grade small nodules is lower than that of large nodules. Studies have also shown that the guidelines have better diagnostic efficiency for "nodules >10mm" than for "nodules ≤10mm" (12,14). However, some studies have shown that the incidence of malignant tumors increases with the number of suspicious features, regardless of the size of the nodules (27,28). Some studies have shown that papillary thyroid microcarcinomas (PTMCs) account for 59.7% of malignant nodules and increase during follow-up (13)(29)(30)(31). At present, although the diagnostic ability of the guidelines for sub-centimeter nodules is controversial, it is clear that in our study the diagnostic ability of the five guidelines for "nodules ≤ 10mm" was not inferior to that for "nodules >10mm".
This research also had some limitations. First, all patients underwent thyroidectomy, which increased the proportion of malignant nodules, decreased the number of low-grade nodules, and thus increased the number of high-grade nodules. This can cause selection bias, affecting the diagnostic efficacy of the guidelines and reducing the consistency of diagnoses. Second, the observers retrospectively analyzed all nodules based on static images only. Static images can affect the evaluation of ultrasound features, especially the margin of nodules, whereas real-time dynamic images allow ultrasound features to be evaluated more accurately. Finally, this was a single-center retrospective study, which ensured the consistency of the nodule diagnosis results, but the heterogeneity of the patient population was smaller than that of a multi-center study.

CONCLUSIONS
All five guidelines showed excellent inter-observer agreement. C-TIRADS was slightly more efficient than Kwak-TIRADS, KTA/KSThR-TIRADS and ACR-TIRADS, and had greater advantages over EU-TIRADS. The diagnostic ability of the five guidelines for "nodules ≤ 10mm" was not inferior to that for "nodules >10mm". C-TIRADS is simple and easy to implement and can provide effective thyroid tumor risk stratification for thyroid nodule diagnosis, especially in China.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
AZ and PX collected and classified the thyroid nodules. SG, SC and YL followed up the thyroid nodules. QQ and XH compiled and analyzed the data. QQ and AZ wrote the paper. All authors contributed to the article and approved the submitted version.