Acute Angle of Multilobulated Contours Improves the Risk Classification of Thymomas

Background: Computed tomography plays an important role in the identification and characterization of thymomas. It has been mainly used during preoperative evaluation for clinical staging. However, the reliable prediction of histological risk types of thymomas based on CT imaging features requires further study. In this study, we developed and validated a nomogram based on CT imaging and included new indices for individualized preoperative prediction of the risk classification of thymomas. Methods: We conducted a retrospective, multicenter study that included 229 patients from two Chinese medical centers. All the patients underwent cross-sectional CT imaging within 2 weeks before surgery. The results of pathological assessments were retrieved from existing reports of the excised lesions. The tumor perimeter that contacted the lung (TPCL) was evaluated and a new quantitative indicator, the acute angle (AA) formed by adjacent lobulations, was measured. Two predictive models of risk classification were created using the least absolute shrinkage and selection operator (LASSO) method in a training cohort for features selection. The model with a smaller Akaike information criterion was then used to create an individualized imaging nomogram, which we evaluated regarding its prediction ability and clinical utility. Results: A new CT imaging-based model incorporating AA was developed and validated, which had improved predictive performance during risk classification of thymomas when compared with a model using traditional imaging predictors. The new imaging nomogram with AA demonstrated its clinical utility by decision curve analysis. Conclusions: Acute angle can improve the performance of a CT-based predictive model during the preoperative risk classification of thymomas and should be considered a new imaging marker for the evaluation and treatment of patients with thymomas. On the contrary, TPCL is not useful as a predictor for the risk classification of thymomas in this study.


Methods:
We conducted a retrospective, multicenter study that included 229 patients from two Chinese medical centers. All the patients underwent cross-sectional CT imaging within 2 weeks before surgery. The results of pathological assessments were retrieved from existing reports of the excised lesions. The tumor perimeter that contacted the lung (TPCL) was evaluated and a new quantitative indicator, the acute angle (AA) formed by adjacent lobulations, was measured. Two predictive models of risk classification were created using the least absolute shrinkage and selection operator (LASSO) method in a training cohort for features selection. The model with a smaller Akaike information criterion was then used to create an individualized imaging nomogram, which we evaluated regarding its prediction ability and clinical utility.
Results: A new CT imaging-based model incorporating AA was developed and validated, which had improved predictive performance during risk classification of thymomas when compared with a model using traditional imaging predictors. The new imaging nomogram with AA demonstrated its clinical utility by decision curve analysis.
Conclusions: Acute angle can improve the performance of a CT-based predictive model during the preoperative risk classification of thymomas and should be considered a new imaging marker for the evaluation and treatment of patients with thymomas. On the contrary, TPCL is not useful as a predictor for the risk classification of thymomas in this study.

INTRODUCTION
Thymomas are rare primary thymic epithelial neoplasms, and they account for <1% of all adult malignancies (1). These tumors are often located in the anterior mediastinum and have the potential for local invasion (1,2). Traditionally, thymomas are usually divided into invasive (Masaoka stage III/IV) and noninvasive (Masaoka stage I/II) lesions according to the Masaoka-Koga clinical staging system (3,4). Thymomas can also be histologically classified as A, AB, B1, B2, or B3 according to the WHO classification system (revised version of 2015), based on the morphology of epithelial cells and the ratio of lymphocytes to epithelial cells (5). WHO types B2 and B3 are typically considered to be more invasive and are associated with lower survival rates than types A, AB, and B1. Therefore, thymomas can be divided into a low-risk group (types A, AB, and B1) and a high-risk group (types B2 and B3) (6). CT plays an important role in the identification and characterization of thymomas. This imaging technique has been mainly used during preoperative evaluation for clinical staging (4). However, the reliable prediction of histological risk types of thymomas based on CT imaging features still needs further exploration (7,8).
Previous studies have focused on the relationship between CT imaging findings and the WHO histological classification (9)(10)(11)(12)(13)(14)(15). For instance, one study reported that some features (contours, heterogeneous enhancement, infiltration of surrounding fat and lung, and node enlargement) are significantly associated with the WHO classification categories (13). Another study reported that the histological features of aggressive thymomas were significantly correlated with decreased doubling time (DT) and increased growth when DT was evaluated retrospectively and dynamically (14). However, these studies were based exclusively on the evaluation of inter-group CT imaging feature differences and did not include the development of models for classification prediction. Furthermore, other studies have explored the relationship between the tumor perimeter contacting the lung (TPCL) with postoperative pleural recurrence on preoperative CT findings (16,17). One study developed a more objective and quantitative method, with which the authors measured the angle formed by adjacent lobulations to predict lung invasion by thymomas (18). In this previous study, the authors found that adjacent lung invasion can be precisely predicted by the multilobulated aspect of the thymoma when it includes at least one acute angle (AA). However, the relationship between these quantitative imaging indicators and the WHO histological classification has been rarely reported.
Until now, the classification of thymomas from preoperative CT imaging has mostly employed traditional morphological indicators, without reproducible individualized prediction models or the inclusion of objective and quantitative indicators (19)(20)(21)(22). Therefore, in this study, we sought to investigate whether TPCL and an AA formed by adjacent lobulations could constitute quantitative and reliable predictors of thymoma classification. We hypothesized that these new quantitative imaging indicators could be used as independent factors in the development of a predictive model for the risk categories included in the WHO histological classification system of thymomas. Moreover, we also hypothesized that this new model would outperform a model that only includes traditional morphological indicators. The goal was to develop and validate an imaging nomogram to be used in individualized prediction of the risk classifications of thymomas preoperatively using non-invasive data and with minimal demand on patients.

Clinical Samples
We conducted a retrospective multicenter study that included 229 patients. Inclusion criteria were (i) thymoma diagnosed by postoperative pathological examination, (ii) contrastenhanced CT examination performed, and (iii) CT imaging performed within 2 weeks before surgery. Exclusion criteria were (i) CT imaging performed after preoperative neoadjuvant chemotherapy; (ii) myasthenia gravis, hormone therapy, or other treatment options; (iii) CT artifacts that affected the assessment of the lesions; and (iv) recurrent anterior mediastinal mass after thymectomy. We included 169 patients treated between September 2011 and May 2019 in center 1 and 60 patients treated between February 2017 and March 2019 in center 2. We divided the patients into training, internal validation, and external validation cohorts. The training cohort included 120 patients (58 low-risk and 62 high-risk) treated consecutively between September 2011 and October 2016 in center 1. The internal validation cohort contained 49 patients (26 low-risk and 23 high-risk) treated consecutively between November 2016 and May 2019 in center 1. The external validation cohort included 60 patients (29 low-risk and 31 high-risk) from center 2 (Figure 1). We retrieved related clinical information on age, sex, symptoms, myasthenia gravis, clinical stages, and histological classification from the surgical records and pathological reports in the medical records database. Ethical approval was obtained from the institutional review boards of both center 1 (China-Japan Friendship Hospital in China) and center 2 (Jinling Hospital, Medical School of Nanjing University); the need for informed consent was waived because of the retrospective nature of this study.

Computed Tomography Imaging
All the patients underwent CT imaging within 2 weeks before surgery and obtained the pathological examination results. They also underwent preoperative cross section spiral CT scanning examinations. CT images were obtained with a variety of scanners, namely, 16-row multi-detector CT (MDCT) (Aquilion; Toshiba, Tokyo, Japan), 320-row MDCT (Aquilion TM ONE; Toshiba, Tokyo, Japan), and 256-row MDCT (Revolution; GE Healthcare, Chicago, IL, United States) in center 1 and dual-source CT (Somatom Definition; Siemens Healthineers, Erlangen, Germany), 128-row MDCT (Somatom Perspective, Siemens Healthineers, Erlangen, Germany), and second-generation dual-source CT (Somatom Flash; Siemens Healthineers, Erlangen, Germany) in center 2. All images were obtained when the patients were in a supine position with suspended inspiration. An intravenously administered contrast medium was used in all the patients. The images were reconstructed from both the mediastinal (window width, 400-450 HU; window level, 20-50 HU) and lung windows (window width, 1,000-1,500 HU; window level, −650 to −750 HU). Images with 5-mm slice thickness after reconstruction were used for evaluation, and original images with 0.6-1.25-mm slice thickness were available, when necessary, in all cases.

Computed Tomography Imaging Interpretation and Pathological Examination Findings
All the CT images were reviewed using a picture archiving and communications system (PACS; GE Heathcare, Chicago, IL, United States) and were retrospectively reviewed by two radiologists with 10 years of experience, who were all blinded to the clinical details of the patients and pathological findings at the time of image interpretation. Where differences occurred, a third chest tumor radiologist with 23 years of experience addressed the differences for the final decision. The image interpretation criteria used standard reporting terms defined by the International Thymic Malignancy Interest Group (ITMIG) for anterior mediastinal masses suspected to be thymoma (23). Evaluated CT features included the following data about the primary mass and its surrounding structures: lesion location (tumors in the anterior mediastinum were classified into centrally located, right-sided, and left-sided lesions; any tumors that were located around or on the line running through the sternum were considered to be central); size in the x, y, and z axes; contour (smooth, single-lobulated, or irregular multilobulated); internal density (homogenous or heterogeneous); calcifications (without calcification, single, or multiple calcifications); infiltration of surrounding fat; tumor abutment ≥ 50% or <50% of an adjacent mediastinal structure; and direct vascular endoluminal invasion. The following information regarding the surrounding structures was also included: adjacent lung abnormalities, pleural effusion (without, unilateral, or bilateral), mediastinal lymph node enlargement (>1 cm in short axis on an axial image), and phrenic nerve involvement (consistent with elevated hemidiaphragm). Any differences in findings were resolved on a consensual basis.
The thymomas were classified according to the 2015 revised WHO histology classification (5), which is mainly based on the morphology of epithelial cells and the ratio of lymphocytes to epithelial cells. When a tumor showed multiple histological components, it was classified based on the predominant component. The thymomas were divided into low-risk (types A, AB, and B1) and high-risk (types B2 and B3) subgroups, because types B2 and B3 are considered to be more malignant than type A, AB, or B1 (3,6). A modified Masaoka-Koga stage was obtained by reviewing surgical records and pathological reports (24).

Measurement of the New Quantitative Indices and Evaluation of Their Consistency
The maximum tumor diameter was measured at the largest section of the tumor on axial images (size_max) by the same two radiologists who reviewed the CT images. The TPCL was measured by manually drawing the surface of the tumor contour adjacent to the lung, at the two maximum adjacent sections of the tumor shown on axial images. The average TPCL was then calculated as the final value. When the tumor contacted the bilateral mediastinal pleura, only the side overhanging the pleural cavity was measured to measure the TPCL. In multilobulated thymomas interfacing with the lung, the smallest angle formed by adjacent lobulations was uniformly measured twice on lung windows. The average angle was then calculated and classified as an AA or an obtuse angle (OA) (Figure 2). All tumors with smooth contours were counted as OA (18).
Intraclass correlation coefficients were used to determine intra-and inter-observer agreement in the measurement of new indices. A pool of 50 patients was randomly selected from the cohorts, including 25 low-risk and 25 high-risk patients. For inter-observer agreement, two radiologists independently identified the cross-sectional images of the largest tumor area and measured the TPCL and the smallest angle in these patients at the same time. The intraclass correlation coefficients (ICCs) were calculated and analyzed between measurements. To evaluate intra-observer agreement, the TPCL and AA were all measured twice for each patient by each radiologist within 1 month, and the ICCs were separately calculated and analyzed.

Feature Selection and Building of the Models
For imaging features defined by ITMIG terms and the new quantitative indices, regularized multivariate logistic regression with the least absolute shrinkage and selection operator (LASSO) penalty method was applied to the training cohort to reduce overfitting or any type of bias in feature selection (25). The selected features were then weighted by their respective coefficients in the regression equation formula as follows: where y is 1 for patients with high-risk thymoma and 0 for lowrisk patients, β 0 is the constant term, n is the number of features used in the model, β i (i = 0, 1, 2, . . . , n) is the model parameter of coefficient, X j (j = 0, 1, 2, . . . , n) is the feature, and ε is the error term. The LASSO criteria for selecting parameters based on minimizing the value of the following cost equation: where N is the number of patients, y i is the outcome labels of patient i, n is the number of features, X ij is the jth feature of the ith patient, β i (i = 0, 1, 2, . . . , n) is the model parameter of coefficient, β 0 is the constant term, and λ is the regularization parameter. Least absolute shrinkage and selection operator method is a shrinkage and selection method for linear regression (26). It aims to minimize the sum of squares of residual errors (MSE) under the condition that the sum of absolute values of a regression coefficient that is less than a constant is deleted during variable selection. We selected the optimal value of λ by leave-oneout cross-validation. We considered λ optimal if it minimized MSE and maximized the area under the receiver operating characteristic curve (AUC) in the training cohort. To test the robustness of the final number of features included in the model, we repeated the feature selection procedure at one SE of the optimal λ value (lambda 0.1se).
The model, including the new quantitative index, was compared with a model without this new index using the corresponding Akaike information criterion (AIC) values for each model and their related statistical tests to estimate model complexity and data fitting performance. Furthermore, DeLong tests were performed to compare all pairs of receiver operating characteristic curve (ROC) (27).

Construction and Evaluation of the Nomogram
We constructed a nomogram based on CT imaging features with the new quantitative index and applied it to predict the risk classification of thymomas. We assessed the accuracy of the nomogram using ROC curves. We then calculated AUCs and compared them between the training cohort and the two validation cohorts by DeLong tests (27). We also determined sensitivity and specificity.
We assessed the calibration of the nomogram using calibration curves and unreliability (U) statistics. We also conducted a decision curve analysis to evaluate the clinical utility of the nomogram by quantifying the net benefit of its use at different threshold probabilities in the validation datasets.

Clinical Information and Imaging Features
The clinical characteristics of the patients are summarized in Supplementary Table 1. We found that imaging features, namely, size_max, contour, density, calcification, and the new index, AA, were significantly associated with risk classification in the training cohort (p < 0.05). Only AA was significantly associated with risk classification in the internal validation cohort (p < 0.05). The inter-group statistical results of features, such as density, calcification, and pleural effusion, approached statistical difference. Size_max, density, and AA were significantly associated with risk classification in the external validation training cohort (p < 0.05); contour and calcification trended toward significance.

Evaluation of the Consistency of the Measurement of the New Quantitative Indices
We found that the measurement of both TPCL and AA presented good inter-and intra-observer agreements. The ICCs were 0.9 for TPCL and 0.85 for AA between the two radiologists. For intraobserver agreement, the ICCs were 0.93 for TPCL and 0.88 for AA in one radiologist, and 0.92 for TPCL and 0.89 for AA in the other.

Feature Selection and Model Development
From all the imaging features, 23 (with AA) were reduced to three potential predictors based on data from the 120 patients of the training cohort (8:1 ratio; Figures 3A,B). These three features were retained with non-zero coefficients in the LASSO logistic regression model with a minimum λ of 0.1052 and then used in the regression equation to build the first model. The result demonstrated that TPCL was ineligible for the risk classification of thymoma (11:1 ratio; Figure 3). Without considering the AA factor, 22 features were reduced to two predictors with a minimum λ of 0.1155 based on data from the training cohort. These two features were used in the regression equation to build the second model (Supplementary Table 2). We found significant differences between the two models for both AIC and AUC (Supplementary Table 3; Figure 4).

Construction and Evaluation of an Individualized Nomogram
We used the first model with the smaller AIC to construct an individualized imaging nomogram incorporating three independent predictors: contour, density, and AA ( Figure 5). The classification accuracies were 77.5, 73.47, and 70% in the training, internal validation, and external validation cohorts, respectively. The sensitivities were 74.19, 60.87, and 70.97%, and the specificities were 81.03, 84.62, and 68.97% in the training, internal validation, and external validation cohorts, respectively (Supplementary Table 4; Figure 4).

Validation of Individualized Nomogram
We found that calibrations for the probability of risk classification of thymomas were good in the three cohorts (training, p = 0.062; internal validation, p = 0.267; external validation, p = 0.14). The C-index of the nomogram for the prediction of lymph node status was 0.811 (95% CI, 0.731 to 0.889), 0.766 (95% CI, 0.63 to 0.902), and 0.765 (95% CI, 0.644 to 0.886) in the training, internal validation, and external validation cohorts, respectively (Supplementary Table 2

Clinical Use
We present the decision curve analysis for the first and second models in Figure 6. The decision curve showed that if the threshold probability in the clinical decision is higher than 18%, using the radiomics nomogram to predict risk classification performs better than either the treat-allpatients or the treat-none schemes. Within this range, the net benefit was comparable between the two models with several overlaps.

DISCUSSION
In this study, we developed and validated a diagnostic CTbased predictive model for individualized preoperative risk classification in patients with thymomas. The new model incorporates contour, density, and the new quantitative index AA. The new model successfully stratified the histological grading of tumors according to their risk classifications. We established a nomogram that can facilitate the clinical evaluation and treatment of thymomas. Twenty-three features (with AA) were reduced to three potential predictors to build the first model in the training cohort; the area under the receiver operating characteristic curve (AUC) was plotted vs. log (λ). Dotted vertical lines were drawn at the optimal values using the minimum criteria and the 1 standard error of the minimum criteria (the 1-SE criteria). (B) A coefficient profile plot was produced against the log (λ) sequence. A vertical line was drawn at the value selected by 10-fold cross-validation, where optimal λ (0.1052) resulted in 3 non-zero coefficients. (C) Twenty-two features (without AA) were reduced to two potential predictors to build the second model in the training cohort; the area under the receiver operating characteristic (AUC) curve was plotted vs. log (λ). Dotted vertical lines were drawn at the optimal values using the minimum criteria and the 1 standard error of the minimum criteria (the 1-SE criteria). (D) A coefficient profile plot was produced against the log (λ) sequence. A vertical line was drawn at the value selected by 10-fold cross-validation, where optimal λ (0.1155) resulted in 2 non-zero coefficients.   The decision curve showed that if the threshold probability in the clinical decision is more than 18%, using the imaging nomogram to predict risk classification adds more benefit than either the treat-all-patients scheme or the treat-none scheme. Within this range, the net benefit was comparable based on two models with several overlaps, and the model with AA showed better.
In this study, all the patients underwent preoperative crosssectional spiral CT examinations using the same body position and scanning methods. However, we should highlight that the CT images were obtained with different types of scanners in the two centers. This is a common limitation in multicenter studies, and we have performed the following measures to improve the consistency of the results. First, all the patients underwent preoperative cross-section spiral CT examinations. They were scanned in the same body position and same methods, although the CT images were obtained with different types of scanners. All the images were reconstructed with the same parameters and methods, thus ensuring consistent evaluation and measurement of image quality and reducing errors in imaging processes among the different CT scanners (28,29). In addition, the research focused on the imaging features and quantified indicators based on CT from the perspective of clinical diagnosis. Therefore, any error mainly comes from the experience of the radiologists and measurements. We introduced two new key quantitative indices, TPCL and AA (16)(17)(18) and showed that they can be consistently measured. Indeed, the results of the consistency analysis by the ICC evaluation of interand intra-observer agreements showed that the consistencies of both TPCL and AA measurements were satisfactory. The use of standardized techniques in this study minimized errors related to image assessment, assuring the quality of these new quantitative indices (30).
During the construction of the first and second models in this study, the candidate imaging features were reduced to three or two potential predictors using the LASSO method to obtain a subset of features with optimal stability and accuracy (25,26). The LASSO regression algorithm is a regularized feature selection method with which all independent variables are processed simultaneously based on variance trade-off. Several previous studies have demonstrated the advantages of using LASSO to select parameters when constructing predictive models (25,26,31,32). However, we are not aware of any study that has used this method for histological grading of thymomas. Tumor size, contours, internal density, calcification, and the new index AA all showed significant differences between low-and high-risk thymomas in this study. However, tumor size, TPCL, and calcification were not selected as independent risk factors in the final regression model. These results match those of previous studies in the field, although we should note that CT imaging was shown to not differentiate between histological subtypes of thymomas in these previous studies. Of note, Johnson et al. reported an association between CT imaging and pathologic assessments of the tumor response after neoadjuvant treatment (21). Tumor size and calcification have been found to be related to type B3 thymoma in previous studies; however, we could not replicate these findings (33).
By comparing AICs and AUCs between the two models, we show that the new model that included AA has better predictive performance. The AIC is a standard method for estimating associations between model complexity and datafitting performance (34). In this study, we used the AIC to examine whether the inclusion of three predictors in the first model would counteract the performance benefits by bringing additional complexity costs. We found that this is not the case. In fact, the differences between the two models in the AIC pinpointed that the benefits of including an extra predictor outweighed the costs caused by the increased complexity. We also compared the predictive performance of the two models by testing for differences between their AUCs by DeLong tests. We found that the new model with AA performed better in the identification of risk classifications in thymomas in all the three cohorts, but we determined that TPCL is ineligible for risk classification of thymoma. The presentation of an AA in tumors indicates a high-risk pathological type with high invasion potential. Therefore, we speculate that AA, being an independent predictor of risk category, is closely related to this pathological category of thymomas (18).
Thymomas are heterogeneous tumors, and their histopathology grading is associated with multiple imaging features (8,(35)(36)(37). Choe et al. reported that the histological features of aggressive tumors were significantly correlated with decreased DT and increased growth (14). In addition, Green et al. found that the presence of an AA between lobulations in multilobulated thymomas can predict lung invasion with satisfactory accuracy (18). Until now, there has been no study reporting the reliable identification of histological types of thymomas based on CT imaging features (36). We aimed to bridge this gap by constructing and validating a nomogram incorporating new quantitative imaging variables. The nomogram presents a good performance in generating individualized probabilities of the risk classification of thymomas. The results demonstrated good generalizability and provided a tool that can be used by both physicians and patients to perform preoperative individualized prediction of the risk classification of thymomas, following the current trends toward personalized medicine.
The introduction of the model in clinical practice has the potential to benefit both the diagnosis and treatment of patients with thymomas. We comprehensively evaluated the discrimination and calibration of the risk classification predictions and demonstrate the reliability of the results of the model. However, it is also important to consider whether the nomogram-assisted decisions would improve clinical outcomes. We performed a decision curve analysis to examine the clinical consequences of the tool based on different threshold probabilities and net benefits. We implemented this analysis by calculating the proportion of true positives minus the proportion of false positives weighted by the relative harm of falsepositive and false-negative results. We found that for threshold probabilities higher than 18%, the use of the nomogram to predict risk classification outperforms both the treat-all-patients and the treat-none schemes. This analysis demonstrates the clinical utility of the tool.
This study has some limitations. First, the incidence rate of thymoma was very low. It is very precious and timeconsuming to obtain patients with complete clinical, imaging (qualified contrast-enhanced CT), and definite pathology results. Second, we relied on data collected retrospectively; 120 and 49 patients were used for model training and internal validation in this study, respectively, while only 60 patients were included for external validation. Compared with the training set and internal validation set, the proportion of the two risk types was relatively balanced and may support the results of this study. Third, the CT scanners used in this study and their parameters were different, and the diaphragm evaluation was based on the coronal reconstruction of images. However, we suggest that these parameters are unlikely to have strongly affected the evaluation of parameters that can be assessed and measured using CT images. Using high-dimensional data from a quantitative analysis of tumor volume and radiomics could improve the predictive models.

CONCLUSIONS
In conclusion, we developed and validated a new CT-based model incorporating AA that can improve predictive performance during the risk classification of thymomas, when compared with traditional imaging predictors. By decision curve analysis, we demonstrated the clinical utility of the tool. AA should be considered as a new imaging marker for the evaluation and treatment of patients with thymomas.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Ethical approval was obtained from the institutional review boards and the informed consents were waived from the patients for the retrospective nature of this study.

AUTHOR CONTRIBUTIONS
XH and SL acquired, analyzed, explained the data, and drafted the manuscript. HY, YC, BL, and WG analyzed and explained the imaging data. YW and XL acquired the clinical information and revised the manuscript. LZ and GM designed the study and revised the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This research was funded by Beijing science and technology planning project (Z211100003521009).