Advanced Warning of Aortic Dissection on Non-Contrast CT: The Combination of Deep Learning and Morphological Characteristics

Background: The identification of aortic dissection (AD) at baseline plays a crucial role in clinical practice. Non-contrast CT scans are widely available, convenient, and easy to perform. However, the detection of AD on non-contrast CT scans by radiologists currently lacks sensitivity and is suboptimal. Methods: A total of 452 patients who underwent aortic CT angiography (CTA) were enrolled retrospectively from two medical centers in China to form the internal cohort (341 patients, 139 patients with AD, 202 patients with non-AD) and the external testing cohort (111 patients, 46 patients with AD, 65 patients with non-AD). The internal cohort was divided into the training cohort (n = 238), validation cohort (n = 35), and internal testing cohort (n = 68). Morphological characteristics were extracted from the aortic segmentation. A deep-integrated model based on the Gaussian Naive Bayes algorithm was built to differentiate AD from non-AD, using the combination of the three-dimensional (3D) deep-learning model score and morphological characteristics. The areas under the receiver operating characteristic curve (AUCs), accuracy, sensitivity, and specificity were used to evaluate the model performance. The proposed model was also compared with the subjective assessment of radiologists. Results: After the combination of all the morphological characteristics, our proposed deep-integrated model significantly outperformed the 3D deep-learning model (AUC: 0.948 vs. 0.803 in the internal testing cohort and 0.969 vs. 0.814 in the external testing cohort, both p < 0.05). The accuracy, sensitivity, and specificity of our model reached 0.897, 0.862, and 0.923 in the internal testing cohort and 0.730, 0.978, and 0.554 in the external testing cohort, respectively. The accuracy for AD detection showed no significant difference between our model and the radiologists (p > 0.05). 
Conclusion: The proposed model presented good performance for AD detection on non-contrast CT scans; thus, early diagnosis and prompt treatment would be available.


INTRODUCTION
Aortic dissection (AD) is a life-threatening disease for which early diagnosis and treatment are critical. The mortality rate increases by 1-2% per hour after symptom onset (1). Typically, patients may present with symptoms such as sudden onset of severe chest pain or back pain. To date, CT angiography (CTA) is the best imaging modality for identifying displaced intimal flaps in contrast-enhanced scans, with a sensitivity and specificity approaching 100% (2,3).
However, CTA is to some degree restricted by the allergenicity and nephrotoxicity of contrast agents and by the lack of 24-h availability in some emergency departments, particularly in rural or underserved areas that lack technical and staff support (4). Moreover, many patients with atypical or asymptomatic AD in the early stages are missed at diagnosis and deteriorate rapidly (5). In comparison, non-contrast CT scans are widely available, convenient and easy to perform, and have relatively lower radiation doses (6-8).
The imaging characteristics of AD on non-contrast CT scans include displaced calcified intimal flaps, intraluminal linear high density, intramural hematoma, and aneurysmal dilatation. However, the detection of AD on non-contrast CT scans by radiologists is suboptimal and currently lacks sensitivity (9).
Compared with traditional methods, deep learning (DL) algorithms have advantages in the extraction and recognition of subtle differences in digital imaging information. He et al. proposed residual networks (ResNets) (10), which won first place in the ImageNet Large Scale Visual Recognition Challenge (11) and outperformed human accuracy in image classification. Hata et al. (12) designed a DL algorithm for the detection of AD on non-contrast CT; however, their method was limited to two-dimensional (2D) models with image data from a single center.
In this study, we hypothesized that a machine learning model that integrated the prediction of the DL model and morphological characteristics could effectively detect AD on non-contrast CT images. The aim of this study was to build a DL-based model for the early detection of AD using non-contrast CT scans and to demonstrate that the addition of morphological characteristics can strengthen the model performance. We further validated the model and compared its detection performance with that of three radiologists at two independent centers.

MATERIALS AND METHODS
This study was reviewed and approved by the local clinical Institutional Ethics Committees of the two centers involved, and written informed consent was waived because of the retrospective nature of this study.

Patient Population
Between July 2014 and April 2020, 5,885 consecutive patients underwent CTA scans at the Peking Union Medical College Hospital (PUMCH), Beijing, China. The presence of AD was confirmed by the CTA interpretation results and 191 patients were diagnosed with AD. After the inclusion and exclusion criteria were applied (detailed in Supplementary S1), 139 patients with AD were enrolled and 202 patients diagnosed without AD from the same period were approximately propensity matched from the remaining 5,694 patients with non-AD, considering two variables (age and sex).
Thus, 341 patients were enrolled from the PUMCH and were randomly divided into the training cohort (70%, 238 patients with 96 patients with AD and 142 patients with non-AD), validation cohort (10%, 35 patients with 14 patients with AD and 21 patients with non-AD), and internal testing cohort (20%, 68 patients with 29 patients with AD and 39 patients with non-AD).
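As a rough illustration of the 70%/10%/20% random division described above, the following sketch shuffles patient identifiers within each class and cuts them at the stated proportions. The function name `stratified_split`, the seed, the hypothetical patient IDs, and the rounding rule are all assumptions; the text does not state whether the split was stratified by AD status or how rounding was handled, so the resulting cohort sizes need not match the reported ones exactly.

```python
import random


def stratified_split(ids_by_class, fractions=(0.7, 0.1, 0.2), seed=0):
    """Randomly split IDs into train/validation/test sets, class by class.

    ids_by_class: dict mapping a class label to a list of patient IDs.
    fractions: train/validation/test proportions (assumed to sum to 1).
    """
    rng = random.Random(seed)
    splits = {"train": [], "validation": [], "test": []}
    for label, ids in ids_by_class.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        n_train = round(fractions[0] * len(shuffled))
        n_val = round(fractions[1] * len(shuffled))
        # The test set takes whatever remains after train and validation.
        splits["train"] += [(pid, label) for pid in shuffled[:n_train]]
        splits["validation"] += [(pid, label) for pid in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(pid, label) for pid in shuffled[n_train + n_val:]]
    return splits


# Hypothetical IDs standing in for the 139 AD and 202 non-AD patients.
cohort = {
    "AD": [f"AD{i:03d}" for i in range(139)],
    "non-AD": [f"N{i:03d}" for i in range(202)],
}
splits = stratified_split(cohort)
print({name: len(members) for name, members in splits.items()})
```

Splitting within each class keeps the AD to non-AD ratio similar across the three cohorts, which is consistent with the reported proportions but is not confirmed by the text.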
From another independent medical center, the Shenzhen Second People's Hospital (SSPH), Shenzhen, China, 2,273 consecutive patients underwent CTA scans between July 2017 and June 2020. Among them, 70 patients were diagnosed with AD. After the same inclusion and exclusion criteria were applied, 46 patients with AD were enrolled and 65 patients without AD were propensity matched. These 111 patients formed the external testing cohort (Figure 1).
The DL model was trained on the training cohort and the validation cohort was used to decide the stopping iteration. The Gaussian Naive Bayes (Gaussian NB) algorithm-based models were trained on the combined training and validation cohorts. After the training procedure, both models were evaluated on the internal testing cohort and external testing cohort.

Computed Tomography Image Data Acquisition
All the CT scans were performed using post-64-detector row CT scanners from Siemens (Somatom Definition Flash or Somatom Force, Forchheim, Germany) and Philips (iCT Elite FHD or IQon Spectral CT, The Netherlands). Every scan began with non-contrast scanning from the thoracic inlet to the pubic symphysis to cover the entire aorta. Afterward, contrast-enhanced CT scans were performed over the same area during the systemic arterial phase. The slice thickness was 1-5 mm for non-contrast CT images and 1 mm for contrast-enhanced CTA images. The other scanning parameters were as follows: rotation time 0.5 s, pitch 1.2, matrix 512 × 512, standard resolution algorithms, tube voltage 80-100 kV (Somatom Definition Flash, Somatom Force, Forchheim, Germany) or 120 kVp (iCT Elite FHD, IQon Spectral CT, The Netherlands), and automatically adjusted tube current.

Radiologists' Interpretation of CT Images
The diagnostic interpretations were performed by three radiologists including a junior radiologist with 7 years of experience in cardiovascular imaging [radiologist 1 (YY)] and two senior radiologists with 14 and 16 years of experience [radiologist 2 (ZD) and radiologist 3 (YW), respectively]. These three radiologists interpreted the anonymous non-contrast CT images independently and indicated their dichotomous diagnosis (AD and non-AD).
The characteristics of AD on the non-enhanced CT images included aortic calcification deviation (>5.0 mm), signs of intimal flap, and high-density areas in the aorta; the indirect parameters included uneven density in the aorta, limited or extensive aortic dilatation, irregular aortic morphology, and pericardial or pleural effusions (9,13,14).

Overview of the Model Construction
An overview of the model construction is given in Figure 2. Before AD detection model building, aorta segmentation was performed to find the three-dimensional (3D) Volume of Interest (VOI) region of the aorta. Then, it was used to crop the aorta volume and extract the morphological characteristics including the aortic maximum diameters and general morphological features. In this study, a 2-stage AD detection model was built. First, as shown in Figure 2B, the 3D DL model based on ResNet34 was built and the prediction probability of the DL model was used as the DL score. Finally, as shown in Figure 2C, our proposed deep-integrated model was based on the Gaussian NB algorithm and trained on the combination of the DL score and all the morphological characteristics to predict the AD status.

Aorta Segmentation and the Extraction of Morphological Characteristics
The aorta mask was extracted by the 2.5D UNet-based DL model, which was trained and validated on the in-house dataset and is given in Supplementary S2. Morphological characteristics were extracted from the aorta mask including the aortic maximum diameters [the maximum diameter of the ascending aorta (AC) and the maximum diameter of the descending aorta (DC)] and 14 general morphological features extracted by PyRadiomics (version 3.0). The aortic maximum diameters were binarized by the threshold of 4 and 5 cm to form four aortic maximum diameter features, i.e., AC > 4 cm (1 for AC > 4 cm and 0 for AC ≤ 4 cm), AC > 5 cm (1 for AC > 5 cm and 0 for AC ≤ 5 cm), DC > 4 cm (1 for DC > 4 cm and 0 for DC ≤ 4 cm), and DC > 5 cm (1 for DC > 5 cm and 0 for DC ≤ 5 cm). The general morphological features were normalized by z-score normalization. The morphological characteristics are given in Supplementary S3.
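The two preprocessing steps for the morphological characteristics can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the use of the population SD (rather than the sample SD) for the z-score is an assumption, as is any choice of which cohort supplies the mean and SD.

```python
from statistics import mean, pstdev


def diameter_flags(ac_cm, dc_cm):
    """Binarize the maximum ascending (AC) and descending (DC) aortic
    diameters, given in cm, at the 4 cm and 5 cm thresholds."""
    return {
        "AC>4cm": int(ac_cm > 4.0),
        "AC>5cm": int(ac_cm > 5.0),
        "DC>4cm": int(dc_cm > 4.0),
        "DC>5cm": int(dc_cm > 5.0),
    }


def zscore(values):
    """Z-score normalize one feature: subtract the mean, divide by the SD.

    Assumption: population SD is used; a constant feature maps to zeros.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical patient with a 4.6 cm ascending and 3.2 cm descending aorta.
print(diameter_flags(4.6, 3.2))
# Hypothetical values of one general morphological feature.
print(zscore([1.0, 2.0, 3.0]))
```

In practice, the mean and SD would normally be estimated on the training data and reused unchanged on the testing cohorts; the text does not say which convention was followed.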

Three-Dimensional Deep-Learning Model for AD Classification
After aorta segmentation, the aorta mask was used to crop the aorta volume. Only aorta pixel values were kept in the aorta volume and then the aorta volume was resized to 64 × 64 × 64. The values were truncated to the mediastinum window (50, 350) and the volumes were employed as the input of the 3D DL model.
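The masking, intensity truncation, and resizing steps can be sketched on flattened voxel lists as below. Several points are assumptions rather than statements from the text: the window (50, 350) is read here as level 50 HU and width 350 HU (bounds of -125 to 225 HU), although it could equally denote lower and upper bounds; voxels outside the aorta mask are set to the lower bound; and resizing uses nearest-neighbour sampling, since the interpolation method is not specified. All function names are hypothetical.

```python
def window_bounds(level, width):
    """Convert a (level, width) window into (lower, upper) intensity bounds."""
    half = width / 2.0
    return level - half, level + half


def mask_and_window(hu_values, aorta_mask, level=50.0, width=350.0):
    """Keep only aorta voxels and truncate their values to the window.

    hu_values and aorta_mask are flattened, equally long sequences.
    Assumption: voxels outside the mask are filled with the lower bound.
    """
    low, high = window_bounds(level, width)
    out = []
    for hu, inside in zip(hu_values, aorta_mask):
        if not inside:
            out.append(low)
        else:
            out.append(min(max(hu, low), high))
    return out


def nearest_indices(src_len, dst_len=64):
    """Source index sampled for each output position along one axis
    (nearest-neighbour resizing to dst_len voxels)."""
    return [min(src_len - 1, int((i + 0.5) * src_len / dst_len))
            for i in range(dst_len)]


# Four hypothetical voxels; the last one lies outside the aorta mask.
print(mask_and_window([-1000, 40, 400, 100], [1, 1, 1, 0]))
# Indices used to shrink an axis of 128 voxels to 64.
print(nearest_indices(128)[:4])
```

Applying `nearest_indices` independently to each of the three axes yields the 64 × 64 × 64 input volume; a trilinear interpolation would be an equally plausible choice.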
We used 3D ResNet (15) as a basic structure of the detection model. 3D ResNet combined an encoder with a fully connected layer for classification (classifier). The same modification of the encoder as MedicalNet (15) was adopted to perform transfer learning using the pre-trained weight from 23 public medical datasets. The optimization was performed by binary cross-entropy loss with the Stochastic Gradient Descent (SGD) optimizer with learning rates of 0.001 and 0.01 for the encoder and classifier, respectively. The weight decay of the SGD optimizer was 0.001 and the momentum was 0.9.
After the 3D DL model was built, the prediction probability of the existence of AD was used as the DL score (Figure 2B). The higher the DL score, the more strongly the 3D DL model indicates the existence of AD.

Proposed Model Combined With the DL Score and Morphological Characteristics
Based on the previously calculated morphological characteristics (the aortic maximum diameters and general morphological features) and the DL score, a model based on the Gaussian NB algorithm was built to predict the AD status (deep-integrated model). The deep-integrated model was built on the basis of the 3D DL model; thus, it integrated the 3D information. The optimal subset of morphological characteristics was selected by the Spearman's rank correlation test, and the characteristics with a p-value <0.05/18 (Bonferroni correction, 18 tested features) were retained.
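The feature selection rule can be illustrated with a small sketch. The code computes Spearman's rank correlation coefficient (Pearson correlation of tie-averaged ranks) and applies the Bonferroni-corrected threshold of 0.05/18 to a set of p-values. The function names and the example p-values are hypothetical; the computation of the p-value itself is left out because it requires a t-distribution (a statistics library such as SciPy, via `scipy.stats.spearmanr`, could supply it).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni_select(p_values, alpha=0.05):
    """Keep features whose p-value is below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values for 18 tested features (threshold 0.05/18 ≈ 0.0028).
p_values = {f"feature_{i}": 0.01 for i in range(17)}
p_values["sphericity"] = 0.001
print(bonferroni_select(p_values))
```

Note that a p-value of 0.01 would pass an uncorrected 0.05 threshold but fails the corrected one, which is the purpose of the Bonferroni adjustment.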
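The deep-integrated model was trained using the Gaussian NB algorithm. For classification, the Gaussian NB algorithm assumes that the likelihood of each feature is Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{i,y}^{2}}} \exp\left(-\frac{(x_i - \mu_{i,y})^{2}}{2\sigma_{i,y}^{2}}\right)$$

where $x_i$ is the value of the $i$-th feature, $y$ is the class (AD or non-AD), and $\mu_{i,y}$ and $\sigma_{i,y}$ are the mean and SD of that feature within class $y$. The parameters µ and σ are estimated using maximum likelihood.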
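To make the Gaussian NB classifier concrete, the following is a minimal from-scratch sketch, not the authors' implementation. It estimates a prior, mean, and variance per class and feature by maximum likelihood and combines the per-feature Gaussian log-likelihoods under the naive independence assumption. The function names, the small variance floor of 1e-9, and the toy data (a DL score and one binary diameter flag) are assumptions made for illustration.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class, per-feature mean and variance."""
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum-likelihood variance (divide by n); the small floor is an
        # assumption that avoids division by zero for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        params[label] = {"prior": n / len(y), "mean": means, "var": variances}
    return params


def predict_proba(params, x):
    """Posterior class probabilities for one feature vector x."""
    log_scores = {}
    for label, p in params.items():
        s = math.log(p["prior"])
        for v, m, var in zip(x, p["mean"], p["var"]):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[label] = s
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}


# Toy data: [DL score, AC > 4 cm flag]; label 1 = AD, 0 = non-AD.
X = [[0.9, 1], [0.8, 1], [0.7, 0], [0.2, 0], [0.1, 0], [0.3, 1]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
print(predict_proba(model, [0.85, 1]))
```

The fitted `mean` and `var` entries play the role of the µ and σ parameters reported for the trained model, with σ being the square root of the variance.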
The training cohort and the validation cohort were merged to train the deep-integrated model using a 10-fold cross-validation procedure. For each iteration of the cross-validation, the model was trained on 9 folds and validated on the remaining fold. Then, the validation folds were assembled to form the cross-validation result. After the optimal hyperparameters were selected by the cross-validation result, the final integrated model was retrained on the merged cohort using the optimal hyperparameters and the performance on the internal and external testing cohorts was evaluated by quantifying the accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC) (Figure 2C). The performance of the integrated model on different subtypes was compared. The internal testing cohort and the external testing cohort were divided by the Stanford type diagnosed by the CTA scans, and the accuracies of the model and the radiologists on these subsets were evaluated and compared.
The robustness of the deep-integrated model at different slice thicknesses was further evaluated. Among the 111 patients in the external testing cohort (SSPH), 63 patients had only non-contrast scans with a slice thickness > 8 mm. However, the slice thickness of the training cohort and internal testing cohort from PUMCH was <5 mm. It is therefore important to assess the impact of slice thickness on performance. The external testing cohort was divided according to whether the scans were thicker than 8 mm and the performance on the two subsets was compared.
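The cross-validation procedure can be sketched generically as follows. Each sample is held out exactly once, a model is fitted on the other nine folds, and the held-out predictions are assembled into a single cross-validation result. The helper names, the seed, and the trivial majority-class model used in the demonstration are assumptions; whether the folds were stratified by AD status is not stated in the text.

```python
import random


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_val_predict(X, y, fit, predict, n_folds=10, seed=0):
    """Return one out-of-fold prediction per sample.

    fit(X_train, y_train) -> model; predict(model, x) -> prediction.
    """
    out_of_fold = [None] * len(X)
    for held_out in kfold_indices(len(X), n_folds, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in held_out:
            out_of_fold[i] = predict(model, X[i])
    return out_of_fold


# Demonstration with a placeholder model that predicts the majority class.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


X_demo = list(range(20))
y_demo = [0] * 15 + [1] * 5
cv_predictions = cross_val_predict(X_demo, y_demo, fit_majority, predict_majority)
print(cv_predictions)
```

Any classifier exposing the same `fit` and `predict` signature, such as a Gaussian NB model, could be substituted for the placeholder.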

Statistical Analysis
All the statistical results were calculated in Python and R (version 3.6.0; https://www.r-project.org/) environments. Patient demographics among the three cohorts were compared using ANOVA or the Pearson's chi-squared test, as appropriate. For the AD detection model and radiologist assessment, we used the Pearson's chi-squared test with the Yates' continuity correction to compare the sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) of the classification model and the manual interpretation of non-enhanced CT scans. The Fleiss' kappa coefficient was used to measure the consistency among the three radiologists. The AUC with its 95% CI was calculated to evaluate model performance in the two data centers. The two AUCs were compared by the DeLong method (16). A p-value of <0.05 was considered to indicate a significant difference.
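For reference, the Pearson's chi-squared test with the Yates' continuity correction on a 2 × 2 table can be written in a few lines. This is a sketch under stated assumptions: the function name is hypothetical, the example counts for the radiologist are invented for illustration, and the p-value uses the identity that a chi-squared variable with 1 degree of freedom has survival function erfc(sqrt(x/2)).

```python
import math


def chi2_yates(a, b, c, d):
    """Chi-squared test with Yates' correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must have a nonzero total")
    # Yates' correction shrinks |ad - bc| by n/2, never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    statistic = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical sensitivity comparison on 29 AD cases:
# model detects 25 and misses 4; a reader detects 20 and misses 9.
statistic, p_value = chi2_yates(25, 4, 20, 9)
print(round(statistic, 3), round(p_value, 3))
```

The same function applies to specificity, PPV, or NPV by filling the table with the corresponding counts.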

RESULTS
Patient Population
In total, 452 patients were enrolled and divided into the training cohort (n = 238), validation cohort (n = 35), internal testing cohort (n = 68), and external testing cohort (n = 111). Table 1 shows the detailed patient demographics and CT image parameters of the training, validation, internal testing, and external testing cohorts. The training cohort and the validation cohort were combined because they were used together to find the optimal hyperparameters and to train the model. There were no significant differences in terms of age among the cohorts (p = 0.845), but sex was significantly different (p = 0.002). It should be noted that the slice thickness was significantly different (p < 0.001) and the CT scans in the SSPH cohort were thicker than those in the PUMCH cohort.

Performance of the Models
The diagnostic performance of each model is shown in Table 2 and the results of the receiver operating characteristic (ROC) curve analysis are shown in Figure 3.
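As a reminder of what the AUC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen AD case receives a higher score than a randomly chosen non-AD case, with ties counted as one half. The function name and the toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    labels: 1 for AD (positive), 0 for non-AD (negative).
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: one AD case (0.4) scores below one non-AD case (0.6).
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

Because the AUC depends only on the ranking of the scores, it is unaffected by the decision threshold that determines accuracy, sensitivity, and specificity.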
After the feature selection procedure, 16 features were used to build the deep-integrated model including the DL score, 4 maximum aortic diameter features, and 11 general morphological features. Figure 4 shows the µ and σ parameters of the trained integrated model (17) as well as the feature names. The µ parameter is the mean of each feature per class and the σ parameter is the SD of each feature per class. In general, patients with AD tend to have higher DL scores and higher AC and DC. However, most of the general morphological features were lower in AD cases, except sphericity. It should be noted that the radiomic scores were normalized by z-score normalization. The selected features and the corresponding µ and σ coefficients are given in Table 3. In the internal testing cohort, the deep-integrated model reached accuracies of 0.923 and 0.813 on the Stanford type A subset and the Stanford type B subset, respectively, while in the external testing cohort, they were 1.000 and 0.960. The performance on the Stanford type A subset was better than that on the Stanford type B subset, but the difference was not significant (Table 4).
Table footnote: The p-value was calculated by the DeLong test on the internal testing cohort and the external testing cohort. *p < 0.05. **p < 0.01. ***p < 0.001.

Comprehensive Analysis of the Performance of the Model
In the external testing cohort, the specificity was lower than that in the internal testing cohort. The AUC and sensitivity of the deep-integrated model were consistent across the subsets divided by the slice thickness. However, the specificity on the thicker subset was lower than that on the thinner subset, although the difference did not reach significance (p = 0.06) (Table 5).
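The reported accuracy, sensitivity, and specificity can be cross-checked against the cohort sizes with simple arithmetic. The sketch below back-calculates the confusion-matrix counts implied by the rounded rates given in the abstract (internal: 0.862 and 0.923 with 29 AD and 39 non-AD patients; external: 0.978 and 0.554 with 46 AD and 65 non-AD patients). The counts are inferred here, not taken from the tables, and the function name is hypothetical.

```python
def implied_counts(sensitivity, specificity, n_pos, n_neg):
    """Recover the confusion-matrix counts implied by rounded rates."""
    tp = round(sensitivity * n_pos)  # detected AD cases
    tn = round(specificity * n_neg)  # correctly excluded non-AD cases
    return {
        "TP": tp,
        "FN": n_pos - tp,
        "TN": tn,
        "FP": n_neg - tn,
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }


internal = implied_counts(0.862, 0.923, n_pos=29, n_neg=39)
external = implied_counts(0.978, 0.554, n_pos=46, n_neg=65)
print(internal)
print(external)
```

Under these inferred counts the accuracies come out at 61/68 and 81/111, which agree with the reported 0.897 and 0.730, and the external cohort shows a single missed AD case alongside 29 false positives.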

Comparison With the Radiologists' Interpretation
The Fleiss's kappa coefficient among the three radiologists was 0.80 and 0.51 in the internal testing cohort and the external testing cohort, indicating substantial consistency and moderate consistency, respectively.
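For readers unfamiliar with the statistic, Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance. The following sketch implements the standard formula; the function name and the toy ratings (three raters, two categories, four cases) are hypothetical.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters per subject.

    ratings: one row per subject, holding the number of raters who chose
    each category, e.g. [n_AD, n_non_AD] for three radiologists.
    """
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # Share of all ratings that fall into each category.
    p_category = [sum(row[j] for row in ratings) / (n_subjects * n_raters)
                  for j in range(n_categories)]
    # Proportion of agreeing rater pairs for each subject.
    p_subject = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                 for row in ratings]
    p_observed = sum(p_subject) / n_subjects
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Three raters, four cases: two unanimous and two split 2-to-1.
print(fleiss_kappa([[3, 0], [2, 1], [1, 2], [0, 3]]))
```

Values between 0.41 and 0.60 are conventionally read as moderate agreement and values between 0.61 and 0.80 as substantial agreement, which matches the interpretation given for the two cohorts.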
The accuracy of the deep-integrated model was superior or equal to that of the three radiologists in both the internal and external testing cohorts, but the differences were not significant. The sensitivity of the deep-integrated model was higher than that of all three radiologists. The difference was significant between the deep-integrated model and radiologist 2 (p = 0.04) in the internal testing cohort and between the deep-integrated model and all three radiologists in the external testing cohort (p < 0.001). However, the specificity of the deep-integrated model was lower than that of all three radiologists. The difference was not significant in the internal testing cohort, but it was significant in the external testing cohort (p < 0.001) (Table 6, Figure 5). Figure 6 shows the AD cases with CT images.

DISCUSSION
This study developed and trained a machine learning model that integrated the DL model and morphological characteristics of non-contrast CT images and then validated its performance at two medical centers. The results showed that the deep-integrated model was comparable to or slightly outperformed the human expert interpretation of radiologists with intermediate to high amounts of experience. This deep-integrated model could potentially support the early detection of AD based on non-contrast CT images and help to optimize the clinical workflow.
Computed tomography angiography is the best imaging modality to diagnose AD (2,3,18,19); however, CTA is commonly restricted in some emergency departments, especially in rural or underserved areas that lack technical and staff support (4). Making the best use of non-contrast CT scans to assist the early warning of AD in clinical practice is of great significance and has the potential to greatly improve patient outcomes. However, the poor sensitivity and high false-negative (FN) rate reported in previous studies (9) are major concerns. The underlying causes might be related to the threshold of detecting subtle differences in grayscale images by the naked eye (8). This is supported by the observation that AD intimal rupture with relatively normal outline morphology is difficult for radiologists to identify. In addition, it is also difficult for human experts to distinguish between ruptured aortic aneurysms and unruptured aortic aneurysms on non-contrast CT images.
The DL technology has been increasingly applied to medical data analysis and CT-assisted diagnosis and has demonstrated great abilities in several tasks. Studies have reported that the DL technology contributes greatly to expanding the amount of information accessible in CT images beyond the limits of human recognition. Recently, Hata et al. (12) designed a 2D DL algorithm for the detection of AD on non-contrast CT and reached an AUC of 0.940 on the internal testing set. The results of the 2D model showed comparable diagnostic performance to radiologists, which was consistent with the observation made in this study. However, their 2D algorithm did not utilize 3D spatial information and the thresholds used to generate study-based AD detection results from the 2D results could vary among different datasets. This issue might limit its application in clinical systems. The performance of the previously reported 2D model was inferior to that of our integrated 3D model (AUC: 0.948 on the internal testing cohort and 0.969 on the external testing cohort). In addition, the imaging data of the previous study were collected from a single center, which might lead to potential issues with reliability and reproducibility of the results.
FIGURE 4 | The µ and σ parameters of the deep-integrated model. The σ parameter was used as an error bar. In general, patients with AD tend to have higher DL scores and higher maximum diameters of the ascending aorta (AC) and descending aorta (DC). However, most of the general morphological features were lower in AD cases, except sphericity.
In this study, we proposed a deep-integrated model, a Gaussian NB algorithm-based model, for the early detection of AD using non-contrast CT scans and integrated both the DL model and morphological characteristics. This model has been validated by datasets from two independent clinical centers. The aorta volume was retrieved by an aorta segmentation model and only the aorta pixel values were kept in the aorta volume. This approach reduces unrelated context noise, enables AD detection to focus on the aorta, and reduces the input size of the DL model such that the input can maintain a higher resolution. The higher resolution input increases the sensitivity of AD detection. Morphological characteristics were extracted from the aorta mask. After the DL model was built, a Gaussian NB algorithm-based model (deep-integrated model) was built on the combination of the DL score and morphological characteristics, demonstrating that the addition of morphological characteristics can strengthen the model performance. The DL model was used to capture the texture information, while the morphological features were used to capture the shape-based information. Thus, the DL score and morphological features provided complementary information.
The average sensitivity and specificity values of human expert interpretation in predicting AD on non-enhanced CT in this study were consistent with the results from previous studies, and the sensitivity on the internal testing cohort was increased to 70-80% compared to a previous study result of 59-61% (9), except for radiologist 2. More predictive markers might partially contribute to the sensitivity improvement. In addition, all the participating radiologists in this study are from large academic medical centers and have experience specific to cardiovascular imaging, which may have contributed to their superior performance. It is, therefore, conceivable that general radiologists working in the community would have lower performance in the detection of AD on non-contrast CT scans. Compared to the radiologists, our model integrated the score of the DL model, which can detect the subtle textures that correlate with AD status. Because the proposed AD detection model showed a significant advantage in detection sensitivity, it may be helpful for overcoming the weaknesses of human expert interpretation.
Notably, the specificity of the deep-integrated model reached 92.3% in the internal testing cohort, which was slightly lower than that of the radiologists and higher than the corresponding result of 85.5% from a prior study (12). However, in the external testing cohort, the specificity decreased to 55.4%. After the exclusion of the thick scans, the specificity of the deep-integrated model increased to 68.7%. Thus, the decreasing specificity might be related to the differences in the image data and scanning parameters between the different medical centers. On one hand, higher-resolution CT can provide more diagnostic information (20, 21). Thus, thicker images may exclude some information for the detection of AD. On the other hand, there was a difference between the training cohort and external testing cohort in terms of slice thickness, which may have caused some degradation of the model performance.
The anatomic classification of AD mainly reflects the extent of the dissections and the location of the intimal tear and evaluates the degree and prognosis of lesions to guide the selection of clinical personalized treatment and operation. As indicated in this study, the detection performance of the deep-integrated model for Stanford type A AD was superior to that for Stanford type B AD. This outcome could mainly be explained by the wider dissection range involved in Stanford type A AD, which likely provides more AD-related information and characteristics than Stanford type B AD.
Although the CTA scan for diagnosing aortic dissection cannot be replaced by non-contrast CT in the near future and this study might not be the optimal or only approach for every patient, it is expected that the deep-integrated model could potentially be applied in the clinical setting to support clinical decision-making and improve the early detection of suspected AD in some cases. We suppose that it might help in situations where CTA is not convenient or timely. The significance of this technique is related to the early detection of asymptomatic patients. Furthermore, a specific group of atypical or asymptomatic patients with AD would particularly benefit from this assessment model, as it is highly likely that the diagnosis of their condition would otherwise have been missed. The model might be more helpful to community radiologists who lack specific experience or training in cardiovascular imaging and to less experienced radiologists. Another possible solution is to generate synthetic CTA from non-contrast CT images (22,23). However, this approach is hampered by the lack of a comprehensive dataset, which may lead to bias in the generated CTA and might be potentially harmful for the robustness of the model.
The main limitations of this study are as follows. First, the sample size of this study was small and relatively low accuracy results were obtained in the external testing cohort; therefore, a detailed analysis based on the subtypes of AD was not possible. The increase in specificity after the exclusion of the thick scans indicated that the slice thickness may partly explain the decline in specificity. However, the specificity was still not satisfactory, indicating that other reasons (image quality, manufacturer, etc.) contributed to the decline in specificity, but these factors were not analyzed. In addition, this was a retrospective study and detailed information about the initial symptoms and the purpose of the CT examination of the enrolled patients was incomplete. However, the prevalence of asymptomatic and unsuspected patients with AD might be significant for the clinical use of this model. Furthermore, potential challenges (e.g., inconsistency in image quality, contrast, and imaging protocols from different centers) might need to be addressed when translating this method into a clinical tool. To validate the clinical potential of the model, multicenter prospective trials with a range of CT examination types will be needed to further investigate the reliability and reproducibility of our results.
Table footnote: The p-value was calculated by the Pearson's chi-squared test with the Yates' continuity correction. *p < 0.05. **p < 0.01. ***p < 0.001.

CONCLUSION
The deep-integrated model, an integrated machine learning model, was comparable to or slightly outperformed the human expert interpretation of radiologists with intermediate to high amounts of experience in detecting AD on non-contrast CT images. This model might contribute to the improvement in early disease detection and downstream clinical decision optimization for patients at risk for AD.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
YY, YG, YW, and ZJ designed the study. YY, YG, and XL acquired and collected the data in clinical studies. YY, YG, YW, LM, CW, and DJ analyzed and interpreted the data. LM performed the statistical analysis. YY and LM drafted the manuscript. YL, JP, JL, and SL revised the manuscript critically for important intellectual content. YW and ZJ are the guarantors of the integrity of the entire study. All the results were checked by CW and X-LL. All authors approved the final manuscript for submission.