Thyroid Imaging Reporting and Data System for Detecting Diffuse Thyroid Disease on Ultrasonography: A Single-Center Study

Objective: This study aimed to compare the ultrasonography (US) features of diffuse thyroid disease (DTD) and normal thyroid parenchyma (NTP), and to propose a structured imaging reporting system for detecting DTD. Methods: This retrospective study assessed the findings for 270 consecutive patients who underwent thyroid US before thyroid surgery. The following US data were analyzed: DTD-specific features, parenchymal echotexture and echogenicity, anteroposterior diameter, glandular margin, and parenchymal vascularity. Univariate and multivariate analyses with generalized estimating equations were performed to investigate the relationship between US features and DTD. The fitted probability of DTD was estimated by using a regression equation. Results: Among the 270 patients, histopathology showed NTP (n = 193), Hashimoto thyroiditis (n = 24), non-Hashimoto lymphocytic thyroiditis (n = 51), Graves' disease (n = 1), and diffuse hyperplasia (n = 1). The following US features were significantly associated with DTD: decreased or increased parenchymal echogenicity, coarse parenchymal echotexture, increased anteroposterior diameter, lobulated glandular margin, and increased parenchymal vascularity. Of these, coarse parenchymal echotexture was the most significant independent predictor of DTD. The number of abnormal US features was positively correlated with the fitted probability and risk of DTD. The diagnostic indices were highest when category III was chosen as the cut-off criterion, which gave the largest Az value (0.867, 95% confidence interval: 0.820–0.905), yielding a sensitivity of 68.8%, specificity of 92.2%, positive predictive value of 77.9%, negative predictive value of 88.1%, and accuracy of 85.6% (p < 0.001). Conclusions: Our sonographic reporting and data system may be useful for detecting DTD.


INTRODUCTION
Thyroid ultrasonography (US) is commonly used to detect and characterize thyroid disease owing to its cost-effectiveness and the absence of radiation hazards (1)(2)(3)(4)(5)(6)(7)(8)(9). The diagnostic role of US in the management of nodular thyroid disease has been well-established (1)(2)(3); in contrast, US has a limited role in the diagnosis of diffuse thyroid disease (DTD) because clinical and laboratory findings play a more significant role in its diagnosis. However, clinical evaluations may miss asymptomatic or subclinical DTD, while routine laboratory surveillance may be excessive in other cases (2,10,11). Previous studies have revealed several DTD-specific US features including micronodulation (Hashimoto thyroiditis or chronic lymphocytic thyroiditis), multifocally ill-defined hypoechoic lesions with/without tenderness (subacute thyroiditis), and thyroid inferno (Graves' disease) (6)(7)(8)(9)(10)(11)(12)(13). Additionally, US features suggestive of DTD have been documented, including increased or decreased parenchymal echogenicity, coarse echotexture, increased or decreased anteroposterior diameter (APD) of the thyroid gland, the presence of marginal abnormality, and increased or decreased parenchymal vascularity (4-10, 14, 15). Nevertheless, the routine application of US has limited utility because it depends on the operator's experience and subjective impression (3,15). This limitation matters because an appropriate US diagnosis is critical when communicating the sonographic information to the physician who then determines the patient's management. To overcome potential discrepancies and miscommunications encountered in daily clinical practice, a standardized reporting system for US examinations is necessary.
Recently, a thyroid imaging reporting and data system (TIRADS) with sonographic risk stratification was developed for evaluating thyroid nodules (16)(17)(18)(19)(20). Structured reporting systems based on imaging modalities have also been explored for other organs such as the liver, ovary, prostate, and lung. However, to the best of our knowledge, a structured US reporting system for evaluating DTD does not yet exist. Therefore, we performed this study to compare the US features of DTD and normal thyroid parenchyma (NTP) in order to categorize US features and to propose a practical structured imaging reporting system for detecting DTD by using US. We refer to our system as the DTD-TIRADS.

Patients
This study was approved by the institutional review board of Busan Paik Hospital (IRB 18-0102). Given the retrospective nature of the investigation and the use of anonymized patient data, the requirement for written informed consent was waived. Between January and December 2015, 275 patients (228 women and 47 men with a mean age of 46.2 ± 10.7 years; range, 20-73 years) underwent preoperative thyroid US followed by thyroid surgery. After 5 patients were excluded because of poor US image quality, 270 patients (224 women and 46 men with a mean age of 46.2 ± 10.7 years; range, 20-73 years) were ultimately investigated.

Preoperative Thyroid US
All patients underwent preoperative thyroid US, which was performed by 2 radiologists with 4 and 13 years of experience in performing thyroid US examinations. High-resolution ultrasound scanners (iU22, Philips Medical Systems, Bothell, WA, USA; and Aplio 400, Toshiba Medical Systems, Tokyo, Japan), with 5-12 MHz and 8-15 MHz linear probes, respectively, were used. One of the 2 US instruments was arbitrarily used for each patient.

Image Analysis and TIRADS for Diagnosing DTD
In March 2018, a single radiologist retrospectively investigated all US features of the thyroid parenchyma by using a picture archiving and communication system while blinded to clinicoserological information (such as patient age and sex, surgical outcomes, and thyroid test results) and medication history, and assigned a US category to each patient. The following US features were investigated: echogenicity (normal, decreased, or increased; the strap muscle and adjacent fat tissue were utilized as the reference for determining parenchymal echogenicity); echotexture (fine [normal] or coarse); APD of the thyroid gland (normal [1-2 cm], increased [>2 cm], or decreased [<1 cm]; the APDs of both thyroid lobes were measured and averaged); glandular margin (smooth [normal] or lobulated); and vascularity (normal, decreased, or increased) (4,5). Moreover, known specific US features including micronodulation (representing Hashimoto thyroiditis or chronic lymphocytic thyroiditis), multifocally ill-defined hypoechoic lesions with/without tenderness (representing subacute thyroiditis), and thyroid inferno (representing Graves' disease) were considered when determining the US category (6)(7)(8)(9). Based on previous studies (4,5,14,15), we devised a specific DTD-TIRADS algorithm (Figure 1). The US category was determined using the algorithm, which considered the number of abnormal US features as well as the presence of specific US features.
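As a rough illustration of the feature coding described above, the following Python sketch classifies the averaged APD with the stated thresholds and counts the abnormal features for one patient. The function and field names are hypothetical, and treating every non-normal value as abnormal is an assumption made for this sketch; the category assignment of the DTD-TIRADS algorithm itself (Figure 1) is not reproduced here.

```python
# Reference (normal) value for each of the 5 US features, as listed in the text.
NORMAL_VALUES = {
    "echogenicity": "normal",
    "echotexture": "fine",
    "apd": "normal",
    "margin": "smooth",
    "vascularity": "normal",
}


def classify_apd(right_lobe_cm, left_lobe_cm):
    """Average the APDs of both lobes and apply the 1-2 cm normal range."""
    mean_apd = (right_lobe_cm + left_lobe_cm) / 2.0
    if mean_apd > 2.0:
        return "increased"
    if mean_apd < 1.0:
        return "decreased"
    return "normal"


def count_abnormal(features):
    """Count features whose recorded value differs from the normal value.

    Assumption: any non-normal value counts as one abnormal feature.
    """
    return sum(1 for name, normal in NORMAL_VALUES.items()
               if features[name] != normal)


# Hypothetical patient record, for illustration only.
example = {
    "echogenicity": "decreased",
    "echotexture": "coarse",
    "apd": classify_apd(1.4, 1.6),
    "margin": "smooth",
    "vascularity": "normal",
}
n_abnormal = count_abnormal(example)
```

In this hypothetical record the averaged APD is 1.5 cm (normal), so only echogenicity and echotexture contribute to the count.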

Histopathology
A single board-certified pathologist blinded to the serological and US results retrospectively investigated the histopathological findings for the thyroid gland. Hashimoto thyroiditis was defined as progressive loss of thyroid follicular cells with replacement by lymphocytes and formation of germinal centers associated with fibrosis. Non-Hashimoto lymphocytic thyroiditis was defined as diffuse infiltration of the thyroid gland with lymphocytes and other inflammatory cells without the typical histopathological features of Hashimoto thyroiditis (such as oxyphilic metaplasia, follicular atrophy, and follicular disruption). Diffuse hyperplasia was defined as diffuse hypertrophy and hyperplasia of follicular cells with retention of the lobular architecture and no definite nodule formation. Among the cases showing diffuse hyperplasia, Graves' disease was determined based on serological results. The thyroid gland was considered to show NTP when there was no visual evidence of coexisting DTD.

Statistical Analysis
Data were tested for a normal distribution using the Shapiro-Wilk test. We used independent t-tests for continuous variables and Pearson's χ² test or (for small cell values) Fisher's exact test for categorical variables when comparing the differences in US features and categories between DTD and NTP. The only continuous variable was patient age, which is expressed as mean ± standard deviation.
Associations between US features and DTD were also evaluated by using logistic regression analysis. After adjustment for all variables, multivariate logistic regression analysis with generalized estimating equations was performed to identify the US features that were significant independent predictors of DTD. The results of this analysis are presented as odds ratio (OR) estimates with corresponding 95% confidence intervals (CIs). From this analysis, we obtained a regression equation for fitting the probability of DTD: for each factor found to be significant on multivariate logistic regression analysis, the score was multiplied by the corresponding beta coefficient. To evaluate the distribution of fitted probabilities associated with the number of abnormal US features, we estimated the logit (the intercept plus the sum of the beta values multiplied by the given level of each feature variable), which was subsequently used for estimating the fitted probabilities. The Cochran-Armitage trend test was used to evaluate the linear association between the number of abnormal US features and the probability of DTD. Associations between US categories and DTD were also evaluated by using logistic regression analysis. Receiver operating characteristic (ROC) curve analysis was applied to evaluate the diagnostic accuracy of DTD-TIRADS for detecting DTD. A cut-off value for the US category was determined by maximizing the sum of the sensitivity and specificity.
All statistical analyses were performed with SAS statistical software version 9.3 (SAS Institute, Cary, NC, USA). Two-sided p values < 0.05 were considered indicative of a significant difference.
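The logit-to-probability step described above can be sketched in Python as follows. This is a minimal illustration and not the authors' SAS code: the beta values are hypothetical placeholders (the fitted coefficients are not given in the text), and the intercept is back-calculated, as an assumption, from the fitted probability of 0.064 that the paper reports for patients with no abnormal US feature.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def fitted_probability(intercept, betas, levels):
    """Inverse-logit of: intercept + sum(beta_i * level_i)."""
    z = intercept + sum(b * x for b, x in zip(betas, levels))
    return 1.0 / (1.0 + math.exp(-z))


# Intercept implied by the reported fitted probability with no abnormal feature.
intercept = logit(0.064)

# Hypothetical beta coefficients for the 5 US features (NOT from the paper).
betas = [1.0, 2.0, 0.5, 0.8, 0.7]

# Level 0 = feature normal, level 1 = feature abnormal.
p_none = fitted_probability(intercept, betas, [0, 0, 0, 0, 0])
p_one = fitted_probability(intercept, betas, [0, 1, 0, 0, 0])
```

With all levels set to 0 the function returns the baseline probability determined by the intercept alone; each abnormal feature with a positive beta raises the logit and therefore the fitted probability.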
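The diagnostic indices and the cut-off criterion (maximizing the sum of sensitivity and specificity) can be illustrated for a single candidate cut-off with the sketch below. The 2 × 2 counts are not reported in the text; they are back-calculated here, as an assumption, from the 77 DTD and 193 NTP cases and the percentages given in the abstract for the category III cut-off. The function name is hypothetical.

```python
def diagnostic_indices(tp, fn, fp, tn):
    """Standard indices from a 2 x 2 table (test result vs. histopathology)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        # Quantity maximized when choosing the cut-off category.
        "sens_plus_spec": sensitivity + specificity,
    }


# Back-calculated counts (an assumption, not reported in the paper):
# 77 DTD cases = 53 true positives + 24 false negatives,
# 193 NTP cases = 178 true negatives + 15 false positives.
indices = diagnostic_indices(tp=53, fn=24, fp=15, tn=178)
percentages = {name: round(100 * value, 1)
               for name, value in indices.items()
               if name != "sens_plus_spec"}
```

In a full analysis, the same function would be evaluated once per candidate cut-off category, and the category with the largest `sens_plus_spec` would be selected.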

RESULTS
The histopathological findings of the thyroid parenchyma in the 270 patients were as follows: NTP (193, 71.5%), Hashimoto thyroiditis (24, 8.9%), non-Hashimoto lymphocytic thyroiditis (51, 18.9%), Graves' disease (1, 0.4%), and diffuse hyperplasia (1, 0.4%). Both DTD and NTP were significantly more common in women than in men (p = 0.031), but there was no significant difference in age between the 2 groups (p = 0.863). On univariate analysis, the following US features showed a significant association with DTD: decreased or increased parenchymal echogenicity, coarse parenchymal echotexture, increased APD, lobulated glandular margin, and decreased or increased parenchymal vascularity (Table 1). On multivariate analysis, the following US features showed a significant and independent association with DTD: decreased or increased parenchymal echogenicity, coarse parenchymal echotexture, increased APD, lobulated glandular margin, and increased parenchymal vascularity (Table 1). Of the 5 abnormal US features, coarse parenchymal echotexture was the most significant independent predictor of DTD (Figure 2). Multivariate analysis also showed that the risk of DTD increased concomitantly with the number of abnormal US features. The fitted probabilities were 0.064 with no abnormal US feature, 0.216-0.393 with 1 abnormal US feature, 0.675-0.724 with 2 abnormal US features, 0.778-0.908 with 3 abnormal US features, 0.961-0.967 with 4 abnormal US features, and 0.991 with all 5 abnormal US features (Figure 3). The Cochran-Armitage trend test showed that the probability of DTD increased as the number of abnormal US features rose (p < 0.001).
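The trend test reported above can be sketched with the Python standard library alone. This is a minimal version of the Cochran-Armitage statistic (normal approximation, equally spaced scores, no continuity correction), not the SAS procedure used in the study. The per-group counts below are invented purely for illustration; they are not the study data, which are not given in the text.

```python
import math


def cochran_armitage(cases, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions.

    cases[i]  : number of positives in group i
    totals[i] : number of subjects in group i
    scores[i] : ordinal score of group i (default 0, 1, 2, ...)
    Returns (z statistic, two-sided p value).
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p_bar = sum(cases) / n  # pooled proportion of positives
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, cases, totals))
    sum_ns = sum(m * s for s, m in zip(scores, totals))
    sum_ns2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (sum_ns2 - sum_ns ** 2 / n)
    z = t / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Invented counts: groups ordered by number of abnormal features (0, 1, 2, 3).
totals = [40, 30, 20, 10]
cases = [4, 9, 12, 9]
z, p_value = cochran_armitage(cases, totals)
```

A positive z with a small p value indicates that the proportion of positives rises with the group score, which is the pattern the paper reports for the number of abnormal US features.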

DISCUSSION
We performed this pilot study to evaluate our proposed DTD-TIRADS, which, to our knowledge, is the first structured reporting system intended for the early diagnosis of DTD and for facilitating its management. Early detection of asymptomatic or subclinical DTD is important in clinical practice, especially since an association between DTD and thyroid malignancy has previously been suggested (2,12,13,21-23). In the present study, we used a 4-point categorization system for DTD-TIRADS. Our classification system was based on both the fitted probability and the risk of DTD according to the number of abnormal features on US examination. Hence, DTD-TIRADS can serve as a practical and convenient reporting system in daily clinical practice.
Previous studies have attempted to validate the diagnostic performance of US for detecting asymptomatic DTD despite the controversy over the role of imaging in the diagnosis of DTD (4-10, 14, 15). The reported performances of US for DTD diagnosis have been variable, with sensitivities, specificities, positive predictive values, negative predictive values, and accuracies reported as 80.5-87.7%, 85.7-92.1%, 70.4-75%, and 81.5-97.2%, respectively. We investigated 5 US features described in previous studies (4,5,14,15), and several US features including decreased or increased parenchymal echogenicity, coarse parenchymal echotexture, increased APD, lobulated glandular margin, and increased parenchymal vascularity were found to be independent predictors of DTD. Additionally, the previous studies suggested that certain US features such as micronodulation, multifocally ill-defined hypoechoic lesions with/without tenderness, and thyroid inferno were DTD-specific (6-11, 24, 25). However, our results showed that only micronodulation was significantly associated with DTD; this may be owing to the fact that a small number of patients with thyroid inferno and none with multifocally ill-defined hypoechoic lesions were included in our study. Further studies are required to clarify this aspect.
Coarse parenchymal echotexture was the most significant independent predictor of DTD among the US features examined in our study. This result supports the design of the present algorithm, in which coarse parenchymal echotexture is an essential US feature; all 31 patients classified as category IV exhibited a coarse parenchymal echotexture in this study. We also found that as the number of abnormal US features increased, the fitted probability and risk of DTD increased. The diagnostic performance of DTD-TIRADS was optimal when category III was applied as the cut-off. The results of our study suggest 2 main conclusions. First, patients with no abnormal US features, who have a fitted probability of 0.064 for DTD, are highly likely to have NTP. Second, patients classified as DTD-TIRADS category II, III, or IV (i.e., at least 1 abnormal US feature), who have a fitted probability of DTD ≥0.216, may be candidates for thyroid function tests or other serological examinations to diagnose DTD. As such, our DTD-TIRADS classification system may be helpful for selecting patients for further evaluation.
However, several limitations of our study should be considered. First, there was unavoidable selection bias because all patients had undergone thyroid surgery. Because our institution is a referral medical center, the proportion of DTD among the enrolled patients may be higher than in the general population. Second, our results were obtained by a single radiologist who reviewed the images retrospectively. Therefore, there was no interobserver verification of the DTD imaging features. Third, all the patients underwent thyroid surgery; although this was necessary for correlating the US features with histopathological results as a reference standard, sampling bias might have occurred. Fourth, our results were obtained from a single institution with a small study population. Most of the structured imaging reporting systems for other organs were established by following consensus committee deliberations, and our classification system lacks wide clinical application in this respect. Fifth, the fitted probability of DTD for each number of abnormal US features had a relatively wide range. Lastly, we did not evaluate the cost-effectiveness or follow-up management in terms of the degree of suspicion of DTD; further studies are required to address these issues.
In conclusion, we propose the DTD-TIRADS, which is based on risk stratification according to the number of abnormal US features and the presence of DTD-specific US features. In clinical practice, this DTD-TIRADS may be helpful for detecting asymptomatic or incidental DTD. Additionally, it can facilitate decision-making and can improve communication among radiologists, physicians, and patients. We expect that further studies with larger sample sizes and multiple participating institutions will validate the DTD-TIRADS proposed herein.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
This study was approved by Busan Paik Hospital institutional review board (IRB 18-0102). Given the retrospective nature of the investigation and the use of anonymized patient data, the requirement for written informed consent was waived. All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.