A Control Study on the Value of the Ultrasound Grayscale Ratio for the Differential Diagnosis of Thyroid Micropapillary Carcinoma and Micronodular Goiter in Two Medical Centers

Objective To investigate the value of the ultrasound grayscale ratio (UGSR) for the differential diagnosis of papillary thyroid microcarcinoma (PTMC) and micronodular goiter (MNG) in two medical centers. Methods Ultrasound images of 881 PTMCs from 785 patients and 744 MNGs from 687 patients in center A were retrospectively analyzed and compared with those of 243 PTMCs from 203 patients and 251 MNGs from 198 patients in center B. All cases were confirmed by surgery and histology. The grayscale values of thyroid lesions and surrounding normal tissues were measured, and the UGSR was calculated. The optimal UGSR threshold for distinguishing PTMCs from MNGs in each medical center was determined by receiver operating characteristic (ROC) curve analysis, and the area under the curve (AUC), optimal UGSR threshold, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were compared between the two medical centers. Results The UGSR values of PTMCs and MNGs in medical center A were 0.5537 (0.4699, 0.6515) and 0.8708 (0.7616, 1.0123), respectively (Z = -27.691, P < 0.001), whereas those in medical center B were 0.5517 (0.4698, 0.6377) and 0.8539 (0.7366, 0.9929), respectively (Z = -16.057, P < 0.001). The UGSR of PTMCs and MNGs did not differ significantly between the two medical centers (Z = -0.609, P = 0.543 and Z = -1.394, P = 0.163, respectively). The AUC, optimal UGSR threshold, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the two medical centers were 0.898 vs. 0.918, 0.7214 vs. 0.6911, 0.881 vs. 0.868, 0.817 vs. 0.833, 0.851 vs. 0.834, 0.853 vs. 0.867, and 0.852 vs. 0.850, respectively. Conclusions UGSR can quantify the echo intensity of PTMCs and MNGs and is therefore valuable for the differential diagnosis of the two diseases. The diagnostic efficacy was consistent between the two medical centers. This method should be widely promoted and applied.


INTRODUCTION
In the ultrasound examination of thyroid nodules, echogenicity is usually judged by the physician's naked eye, using the thyroid gland and strap muscle as references, and is divided into three to five levels. This grading is used in the Thyroid Imaging Reporting and Data System (TIRADS). The Korean-TIRADS classifies lesions into three types according to echo intensity: hypoechoic, isoechoic, and hyperechoic. The American College of Radiology (ACR) TIRADS divides lesions into four levels: very hypoechoic, hypoechoic-isoechoic, hyperechoic, and anechoic. The European-TIRADS includes five levels: markedly hypoechoic, mildly hypoechoic, isoechoic, hyperechoic, and anechoic (1)(2)(3). Although hypoechoic in the Korean-TIRADS, very hypoechoic in the ACR-TIRADS, and markedly hypoechoic in the European-TIRADS cover different ranges, nodules at these levels are all considered suspicious for malignancy. The diagnostic performance can also differ. Even when the same TIRADS is used, subjective assessment can lead to diagnostic differences (4), particularly because the thyroid gland differs in thickness from the strap muscle and the echo intensity can vary between different sections of the same strap muscle. Assessment of the echo intensity of nodules and strap muscles is greatly influenced by subjective factors (5). In addition, as the echo intensity of a thyroid nodule changes from weak to strong, the grayscale value changes from low to high and the image from black to white; the grayscale value is theoretically a continuous variable, and different values contribute differently to the diagnostic efficacy (5). Although a rough grading into three to five levels has been accepted by many scholars (1)(2)(3)(4)(5), the grayscale value of the thyroid or strap muscle is not necessarily the optimal threshold for distinguishing benign from malignant nodules. Quantification of the echo intensity would facilitate the differential diagnosis of benign and malignant thyroid nodules.
The ultrasound echo intensity can be affected by factors such as gain, dynamics, operator, and type of machine, among others. The diagnostic value of echo intensity obtained by direct measurement is therefore limited; however, under the premise of a standardized operation, the echo intensities of the nodule and the surrounding thyroid tissue increase or decrease together when these factors change; that is, the relationships designated as "low," "equal," and "high" are preserved. Therefore, the echo intensity of the nodule can be indirectly quantified by measuring the ultrasound grayscale ratio (UGSR) of the nodule to the surrounding thyroid tissue, which is more objective than naked-eye grading into three to five levels. Including our previous studies, only four reports on the quantification of thyroid nodules using UGSR have been published to date (5)(6)(7)(8). These reports all suggest that the UGSR of malignant nodules is lower than that of solid benign nodules but higher than that of cystic benign nodules. Although the diagnostic efficacy of UGSR is significantly higher than that of the traditional three- to five-level method, the four published studies have certain limitations, such as the inclusion of a single medical center and the use of the same ultrasound scanner in two studies. Confirming the diagnostic value of UGSR using different scanners in different medical centers would strengthen the evidence for its clinical application.
In this study, UGSR values and their diagnostic performance were compared between two medical centers to determine the utility and consistency of UGSR for the differential diagnosis of PTMC and MNG and to provide a potential reference for improving TIRADS.

Patient Selection
The study was performed in accordance with the Helsinki Declaration ethics guidelines and approved by the ethics committees of the two institutions. The study included 3,712 consecutive cases of thyroid nodules treated in the Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine (center A for short) between June 2017 and June 2020 and 1,200 consecutive cases from The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital) (center B for short) between June 2019 and June 2020. Thyroid nodules with a diameter >1.0 cm or <0.4 cm, cystic-dominated nodules (where the cystic component was >50% of the nodule volume) (9, 10), Hashimoto's thyroiditis, and calcification-dominated nodules (unmeasurable due to obvious calcification) (5) were excluded. Finally, 1,472 cases with 1,625 thyroid nodules from center A and 452 cases with 494 thyroid nodules from center B that met the inclusion criteria were analyzed. There were 365 men [mean age, 52 (42-59) years] and 1,508 women [mean age, 51 (42-57) years] (Figure 1).

Ultrasound Examination
Five ultrasound scanners were used in center A: Esaote MyLab 70 XVG (Genova, Italy), Esaote MyLab Classic C (Genova, Italy), Esaote MyLab 90 (Genova, Italy), Mindray (Shenzhen, China), and Hitachi (Tokyo, Japan). The scanners used 5-10 MHz broadband linear array probes with a central frequency of 7.5 MHz. Two ultrasound scanners were used in center B: TOSHIBA Aplio 400 (Tochigi, Japan) and GE Logiq E9 (Wauwatosa, USA). Patient positioning and the scanning protocol were the same in the two medical centers. Patients were placed in the supine position with the neck hyperextended; transverse, longitudinal, and oblique sections were scanned, and the following nodule data were recorded: number, shape, size, calcification, internal echo, halo around the boundary, internal and peripheral blood flow, and bilateral cervical lymph nodes (5).

Image Analysis
Two radiologists from the two centers, each with more than 10 years of experience and blinded to the pathological results, independently reviewed the selected cases in the picture archiving and communication systems to measure the thyroid nodules and to determine the position of the region of interest (ROI) and the size of the surrounding normal thyroid tissue. The assessment was performed using the gray histogram software of the RADinfo reading system (Zhejiang RAD Information Technology Co., Ltd., China). Ultrasound transverse/longitudinal section images were acquired to measure the grayscale values of the normal thyroid tissue and nodules. When the measured nodule had a uniform echo intensity, the ROI was made as large as possible (Figure 2). When the echo intensity of the measured nodule was uneven but dominated by one echo intensity, the ROI was placed in the area of that echo intensity and made as large as possible (Figure 3). For nodules with uneven echo intensity in which no dominant echo intensity could be identified, the largest possible ROI was used for measurement (Figure 4). In the measurement of all three types of nodules, calcifications, cystic degeneration, and the surrounding hypoechoic halo were avoided. When the normal tissue around the nodule in the cross-section image was not sufficient for an ROI, the same section on the opposite side was selected for measurement (Figure 5). The grayscale value of the normal thyroid tissue around the nodule was measured by selecting an ROI of the same size as that used for the thyroid nodule and at the same gain level as the nodule. All measurements were performed twice, and the UGSR was calculated for each; the average of the two measurements was taken as the UGSR of the nodule.
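The UGSR calculation described above can be sketched in a few lines. This is a minimal illustration, not the RADinfo software's implementation, whose internals are not described here: it assumes each ROI is available as a 2D list of 8-bit grayscale values and that the "grayscale value" of an ROI is its mean pixel value. The function names and the pixel data are hypothetical.

```python
def roi_mean_gray(pixels):
    """Mean grayscale of one ROI given as a 2D list of 0-255 values (assumed format)."""
    values = [p for row in pixels for p in row]
    if not values:
        raise ValueError("ROI is empty")
    return sum(values) / len(values)


def ugsr(nodule_roi, thyroid_roi):
    """Ratio of nodule grayscale to surrounding normal thyroid grayscale."""
    reference = roi_mean_gray(thyroid_roi)
    if reference == 0:
        raise ValueError("reference ROI has zero mean grayscale")
    return roi_mean_gray(nodule_roi) / reference


def nodule_ugsr(measurements):
    """Average the UGSR over repeated (nodule_roi, thyroid_roi) measurements."""
    ratios = [ugsr(nodule, thyroid) for nodule, thyroid in measurements]
    return sum(ratios) / len(ratios)


# Two hypothetical measurements of the same nodule (made-up 2x2 ROIs).
first = ([[40, 44], [42, 42]], [[80, 84], [82, 82]])    # ratio 42 / 82
second = ([[44, 44], [44, 44]], [[80, 80], [80, 80]])   # ratio 44 / 80
value = nodule_ugsr([first, second])
```

Because the nodule and reference ROIs are taken from the same image at the same gain, the ratio is intended to be less sensitive to acquisition settings than either grayscale value alone.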

Statistical Analysis
All statistical analyses were performed using SPSS 22.0 software (SPSS Inc., Chicago, IL, USA). The Mann-Whitney U test was used for comparisons between the two groups. The receiver operating characteristic (ROC) curve of UGSR for PTMCs and MNGs was plotted with sensitivity as the ordinate and 1 - specificity as the abscissa. The area under the curve (AUC) was calculated. The optimal UGSR threshold was determined by comparing the sensitivity, specificity, Youden index, positive predictive value, negative predictive value, and accuracy.
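The threshold selection can be illustrated with a short sketch. The study used SPSS; the code below is only a hypothetical pure-Python equivalent of the general technique. It assumes a nodule is called PTMC when its UGSR is at or below the cutoff (the exact inequality is not stated in the text), picks the cutoff with the largest Youden index (sensitivity + specificity - 1), and computes the AUC as the proportion of PTMC/MNG pairs in which the PTMC has the lower UGSR. All UGSR values are made up.

```python
def roc_points(ptmc_scores, mng_scores):
    """Return (cutoff, sensitivity, specificity) for every observed UGSR value."""
    cutoffs = sorted(set(ptmc_scores) | set(mng_scores))
    points = []
    for c in cutoffs:
        # Assumed decision rule: UGSR <= cutoff is called PTMC (positive).
        sens = sum(s <= c for s in ptmc_scores) / len(ptmc_scores)
        spec = sum(s > c for s in mng_scores) / len(mng_scores)
        points.append((c, sens, spec))
    return points


def best_cutoff(points):
    """Cutoff with the largest Youden index J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


def auc(ptmc_scores, mng_scores):
    """Fraction of (PTMC, MNG) pairs where the PTMC has the lower UGSR; ties count 0.5."""
    wins = sum((p < n) + 0.5 * (p == n) for p in ptmc_scores for n in mng_scores)
    return wins / (len(ptmc_scores) * len(mng_scores))


# Hypothetical UGSR values, not data from the study.
ptmc = [0.45, 0.50, 0.55, 0.60, 0.65, 0.75]
mng = [0.70, 0.80, 0.90, 1.00]

best = best_cutoff(roc_points(ptmc, mng))   # (cutoff, sensitivity, specificity)
area = auc(ptmc, mng)
```

With these made-up values the selected cutoff is 0.65, where five of six PTMCs and all four MNGs are classified correctly.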

Comparison of the UGSRs Between PTMCs and MNGs in Two Centers
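The UGSRs measured from ultrasound images of 881 PTMCs from 785 cases (Figure 5) and 744 MNGs from 687 cases (Figures 2 and 3) in center A were 0.5537 (0.4699, 0.6515) and 0.8708 (0.7616, 1.0123), respectively (Z = -27.691, P < 0.001), and those of the 243 PTMCs and 251 MNGs in center B were 0.5517 (0.4698, 0.6377) and 0.8539 (0.7366, 0.9929), respectively (Z = -16.057, P < 0.001) (Table 3). The UGSRs of PTMCs and of MNGs did not differ significantly between the two centers (PTMCs: Z = -0.609, P = 0.543; MNGs: Z = -1.394, P = 0.163) (Table 4).

Comparison of AUC Values, Optimal UGSR Threshold, and Diagnostic Efficiency of the Two Medical Centers
The AUC (Figure 7), optimal UGSR threshold, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of center A and center B were 0.898 vs. 0.918, 0.7214 vs. 0.6911, 0.881 vs. 0.868, 0.817 vs. 0.833, 0.851 vs. 0.834, 0.853 vs. 0.867, and 0.852 vs. 0.850, respectively (Table 5).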
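The relationship between the reported diagnostic metrics can be checked with a small worked example. The confusion-matrix counts below are not reported in the study; they are back-calculated here, as an assumption, from the reported sensitivity and specificity and the numbers of PTMCs and MNGs in each center (881 and 744 in center A, 243 and 251 in center B). The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic metrics, with PTMC as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts, inferred from the reported rates (not taken from the study).
center_a = diagnostic_metrics(tp=776, fn=105, tn=608, fp=136)   # 881 PTMCs, 744 MNGs
center_b = diagnostic_metrics(tp=211, fn=32, tn=209, fp=42)     # 243 PTMCs, 251 MNGs

rounded_a = {name: round(v, 3) for name, v in center_a.items()}
rounded_b = {name: round(v, 3) for name, v in center_b.items()}
```

Under these assumed counts, the rounded values reproduce the reported sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of both centers, which suggests the reported figures are internally consistent.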

DISCUSSION
Tumor morphology, microcalcifications, anteroposterior/transverse diameter ratio, and echo intensity are important ultrasonographic features for the identification of PTMCs and MNGs (1-3). Among these features, echo intensity can be affected by the subjective assessment of the ultrasound physician (4,5). Quantification of echo intensity would improve the differential diagnosis of PTMCs and MNGs. In this study, we demonstrated that the UGSRs of PTMCs in the two medical centers were lower than those of MNGs. The AUCs in the two medical centers were 0.898 and 0.918, and the optimal UGSR thresholds were 0.7214 and 0.6911, respectively. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy showed similar values, ranging from 0.81 to 0.89. This indicates that the UGSRs were highly consistent between the two medical centers, and the diagnostic efficiency was considerably higher than that of the three- to five-level method in the literature (9)(10)(11)(12)(13)(14)(15). In addition, the present results showed no statistically significant differences in the size of PTMCs and MNGs between the two medical centers, whereas the PTMCs in the two centers were smaller than the MNGs. This may be because the small MNGs were mostly cystic colloid nodules that did not meet the inclusion criteria and were therefore excluded.
There are currently only four studies addressing UGSR quantification of thyroid nodules. In 2015, Giorgio et al. (7) measured the grayscale values of nodules, peripheral thyroid, and strap muscles, and calculated the nodule/peripheral thyroid, nodule/muscle, and peripheral thyroid/muscle UGSRs. The authors reported that the nodule/peripheral thyroid UGSR of malignant nodules is significantly lower than that of benign nodules, with an interobserver agreement of κ = 0.74. When the ratio is <0.46, the sensitivity and specificity for predicting malignant nodules are 56.7% and 72%, respectively, and the nodule/muscle UGSR is not as valuable as the nodule/peripheral thyroid UGSR for predicting malignant nodules. Although Giorgio et al. pioneered the use of UGSR to quantify the echo intensity of nodules, their study has several limitations. First, the data were derived from a single medical center and obtained using the same ultrasound scanner. Whether the conclusions can be applied to other medical centers or ultrasound scanners needs further verification. Second, thyroid nodules were not classified according to size. The echo intensity of nodules can vary according to size (9,15), so pooling nodules of different sizes decreases the practical value of the UGSR. Third, the sample size of malignant nodules was small, and nodules were not classified according to pathological subtype. Fourth, all nodules showed signs of malignancy on ultrasound, such as unclear borders, microcalcifications, and hypoechoic areas, and they were diagnosed by FNAC; therefore, most nodules showing benign signs were not included in the UGSR study. In 2018, we used UGSR for the differential diagnosis of PTMCs and MNGs in a single medical center. The results showed that the AUC, optimal UGSR threshold, sensitivity, and specificity were 0.895, 0.72, 87.0%, and 80.4%, respectively (5).
The same parameters were used in this study, and the results were highly consistent between the two medical centers. In the same year, we performed a comparative study of anechoic MNGs and very hypoechoic PTMCs. The results showed that the UGSR of anechoic MNGs was lower, the optimal threshold was 0.26, and the sensitivity and specificity for predicting anechoic MNGs were 94.3% and 99.0%, respectively (6). In 2019, Chen et al. (8) divided papillary carcinomas and nodular goiters into three groups according to size: 0.3-1.0, 1.0-1.5, and 1.5-2.0 cm. The results showed that the AUC, optimal UGSR threshold, sensitivity, and specificity varied with tumor size: an increase in tumor size was associated with a decrease in the AUC and sensitivity and an increase in the optimal UGSR threshold and specificity. The AUC and optimal UGSR threshold of the 0.3-1.0 cm group (0.919 and 0.692, respectively) were highly consistent with those of the present study. Although the specificity (0.724) was lower and the sensitivity (0.975) was higher than in the present study, the diagnostic performance was similar, which largely supports our results. A limitation of our previous studies (5,6) and of the study by Chen et al. (8) is that the data were obtained in a single medical center, and the data reported by Chen et al. were obtained using two different scanners from the same ultrasound machine manufacturer. Compared with the studies of Giorgio et al. (7) and Chen et al. (8), the present study has the following three advantages. First, the study included two medical centers, as well as different ultrasound scanners, operators, and cases between the two centers. Most of the MNGs in this study were confirmed by surgery because they coexisted with malignant nodules or larger benign nodules, without screening; thus, the UGSR was more representative of benign nodules.
The present results indicated that UGSR was valuable for the differential diagnosis of PTMCs and MNGs and that its performance was stable. The present study had several limitations. First, for normal thyroid tissue with uneven echo intensity or nodules with uneven echo intensity related to technical factors, the ROI could be selected and measured differently. In this study, the data of the two centers were measured twice by a senior imaging physician in each center, and the mean values were calculated, which could largely reduce the differences related to ROI selection. In addition, an important purpose of our study is to raise operators' awareness of standardized scanning. Second, no single ultrasound sign completely distinguishes benign from malignant thyroid nodules (16,17); the two types of nodules are differentiated using a combination of signs. However, the aim of this study was to provide information on UGSR for clinicians and imaging physicians. The combination of UGSR with other ultrasound signs for identifying benign and malignant thyroid nodules will be the direction of our future research. Third, the diameters of PTMCs and MNGs in our study were 0.4-1.0 cm, and the maximum diagnostic efficacy was achieved at UGSR thresholds of 0.7214 and 0.6911 in centers A and B, respectively. The effect of diameters <0.4 or >1.0 cm on UGSR remains to be determined. Finally, this study was a retrospective analysis of two centers, and additional prospective studies in more medical centers should be performed to verify the value and stability of UGSR.
In conclusion, UGSR was of great value for the differential diagnosis of PTMCs and MNGs by quantifying the echo intensity of the two lesions. The diagnostic performance was consistent between the two medical centers, and the method is simple to apply, providing an important reference for improving TIRADS.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committees, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine and The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital). Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
DX and ZH conceived and designed the study. NF, YL, ML, JY, QZ, and ZL acquired the data. PW and YL analyzed and/or interpreted the data. ZH and YL drafted the manuscript. ZH, YL, and DX revised the manuscript critically for important