Radiomics-Based Classification of Left Ventricular Non-compaction, Hypertrophic Cardiomyopathy, and Dilated Cardiomyopathy in Cardiovascular Magnetic Resonance

Left Ventricular (LV) Non-compaction (LVNC), Hypertrophic Cardiomyopathy (HCM), and Dilated Cardiomyopathy (DCM) share morphological and functional traits that increase the diagnosis complexity. Additional clinical information, besides imaging data such as cardiovascular magnetic resonance (CMR), is usually required to reach a definitive diagnosis, including electrocardiography (ECG), family history, and genetics. Alternatively, indices of hypertrabeculation have been introduced, but they require tedious and time-consuming delineations of the trabeculae on the CMR images. In this paper, we propose a radiomics approach to automatically encode differences in the underlying shape, gray-scale and textural information in the myocardium and its trabeculae, which may enhance the capacity to differentiate between these overlapping conditions. A total of 118 subjects, including 35 patients with LVNC, 25 with HCM, 37 with DCM, as well as 21 healthy volunteers (NOR), underwent CMR imaging. A comprehensive radiomics characterization was applied to LV short-axis images to quantify shape, first-order, co-occurrence matrix, run-length matrix, and local binary patterns. Conventional CMR indices (LV volumes, mass, wall thickness, LV ejection fraction—LVEF—), as well as hypertrabeculation indices by Petersen and Jacquier, were also analyzed. State-of-the-art Machine Learning (ML) models (one-vs.-rest Support Vector Machine—SVM—, Logistic Regression—LR—, and Random Forest Classifier—RF—) were used for one-vs.-rest classification tasks. The use of radiomics models for the automated diagnosis of LVNC, HCM, and DCM resulted in excellent one-vs.-rest ROC-AUC values of 0.95 while generating these results without the need for the delineation of the trabeculae. First-order and texture features resulted to be among the most discriminative features in the obtained radiomics signatures, indicating their added value for quantifying relevant tissue patterns in cardiomyopathy differential diagnosis.


INTRODUCTION
Cardiomyopathies (CMs) are defined as primary myocardial disorders in the absence of other conditions that may affect the structural or functional properties of the heart's muscle (1). CMs are divided into distinct morphologic phenotypes (1), including hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) as two of the most prevalent CMs. HCM is characterized by an increase in left ventricular (LV) wall thickness (2), DCM by LV (or biventricular) systolic dysfunction and dilatation (3), while both disorders are unexplained by loading conditions. Left ventricular non-compaction (LVNC) is a recently defined and poorly understood condition, characterized by prominent LV trabeculae, a thin compacted myocardial layer, and deep inter-trabeculae recesses (4).
Cardiac magnetic resonance (CMR) is current the gold standard imaging modality for the clinical assessment of CMs, as well as to identify and differentiate the different phenotypes. CMR is widely used in the diagnosis of HCM (5), DCM (6), and LVNC (7)(8)(9). However, some LVNC features can overlap with those of other CMs and LVNC patients might present with morphological findings of HCM and/or DCM (10). Furthermore, hypertrabeculation may also occur in the healthy population, which makes it challenging to differentiate physiologic from pathological hyper-trabeculation forms (11) by CMR. The difficulties to differentially and timely diagnose LVNC in clinical practice has motivated the development of new imaging indices, in particular, the Petersen (7) and Jacquier (8) coefficients, which estimate the level of hypertrabeculation in the LV myocardium. However, these coefficients, while they improve LVNC diagnosis (9), are challenging and tedious to estimate in practice, as they require expert and accurate identification and delineation of the trabeculae on the CMR images. This is a timeconsuming task that is furthermore subject to inter-observer variability given the inherent complexity of the trabeculae.
Radiomics is an emerging image analysis technique for deeper phenotyping of cardiovascular health and disease in CMR (12). It enables the examination of a large pool of advanced imaging features that describe a wide range of complex, as well as subtle traits of the cardiac tissues at different scales and locations. Compared to existing cardiac indices such as those listed above, radiomics features encode multivariate information by capturing and combining heterogeneous morphological (e.g., sphericity, compactness) and appearance (e.g., entropy, coarseness) properties of the tissues. Hence, in the last years, several works have shown its potential for identifying new imaging signatures that can be leveraged for enhanced cardiac disease understanding (13,14) and quantification (15)(16)(17). In addition to providing comprehensive indicators of cardiac health and disease, CMR radiomics features are easier to calculate as they only require the segmentation of the myocardial boundaries, and even this segmentation process can be automatized (18).
This work is the first to develop and evaluate a radiomics model for automatically differentiating LVNC, DCM, and HCM phenotypes in CMR. Based on a clinical dataset comprising different CM subgroups as well as healthy subjects from routine clinical practice, a machine learning pipeline is implemented to  Tables 2, 3 for more detailed information).

CMR Clinical Indices
All patients underwent a standard CMR protocol. In brief, all scans were performed with a 1.5 Tesla scanner (Avanto, Siemens Healthcare, Erlangen, Germany), with typical cine parameters as follows: TR/TE (repetition time/echo time) = 3.2/1.5 ms, voxel size 1.4 × 1.4 × 8 mm, and a slice gap of 2.0 mm. The temporal resolution was interpolated to 25 phases per cardiac cycle (28-37 ms). The protocol includes a complete cine shortaxis ventricular stack with the base to apex coverage acquired using balanced steady-state free procession (bSSFP) with one breath-hold per image slice. Short axis cine images were obtained and analyzed. Semi-automatic contouring of LV endocardial and epicardial end-diastolic (ED) and end-systolic (ES) borders was  performed with Circle 42 (CVi 42) software (Calgary, Canada). A total of 9 existing CMR indices were quantified, including LV ejection fraction (LVEF), end-diastolic and end-systolic LV volumes (EDLVV, ESLVV), LV mass, inter-ventricular septum (IVS), posterior wall (PW) thickness, asymmetry (IVS/PW), and Petersen and Jacquier coefficients (7,8). Right ventricle contours were not provided for the study; thus features were not considered. However, these diseases are predominantly related to the LV cavity, therefore missing information from the right ventricle was not considered relevant. Fractal dimensions (9) were not included in the experiments due to its limited applicability in daily clinical routine and its lack of prognostic correlation (19). Additionally, Petersen and Jacquier coefficients (7-9) were considered more validated for this experiments. All CMR analyses were performed by an expert cardiologist with several years of experience in the field. . We perform a radiomics features characterization by considering both the ED and ES phases to be able to identify disease-specific changes over a heartbeat cycle of the radiomics features. Shape radiomics features quantify and measure the morphological traits in segmented contours, independently of the gray-level intensity distribution. These subset of features are easily interpretable, as they are closely related to surfaces and volumes calculated with existing CMR indices in clinical routine. Some of these features include simple metrics such as volume, elongation, or surface area, and more advanced metrics like sphericity. First-order radiomics features are obtained from signal intensity characteristics based on the histogram. Specifically, first-order statistics describe the distribution of pixel/voxel intensities within the image region defined by the mask through commonly used and basic metrics, regardless of the spatial relationship (21). First-order radiomics features include easily interpretable features such as median or mean, and more mathematically advanced metrics such entropy, energy, or kurtosis.

Radiomics Extraction
Finally, texture radiomics features capture subtle changes in the gray-scale pixel distribution, identifying tendencies, and neighboring gray-scale changes through advanced matrix calculations. Texture features can be grouped in: Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), Neighboring Gray Tone Difference Matrix (NGTDM), and Gray Level Dependence Matrix (GLDM) (21).
Shape features are expected to capture morphological cardiac characteristics normally associated with each disease. On the other hand, both first-order and texture features are expected to play an important role in identifying grayscale changes in the LV or LVMYO tissue, a relevant trait of trabeculations in LVNC subjects.

Machine Learning Scheme
From the previous section, a total of 420 radiomics features were extracted and were potential candidates for inclusion in the targeted radiomics model for disease classification. However, not all of these features will have predictive power, and hence feature selection will be first applied to select the most optimal features for the classification task. We separated this feature selection process into 2 steps. First, we identify those that are highly correlated, for each feature, and remove them from the radiomics set as they carry a similar predictive signal. For this purpose, we estimated Pearson correlation between all features, and those above 0.9 were considered redundant. It is well-known that radiomics are highly redundant thus, with this first step we only aimed to remove the most correlated ones, and be further reduced with a more sophisticated feature selector. The procedure resulted in a reduction from 420 radiomics features to only 120. This reduced subset is introduced in the Pipeline function from Python Sci-kit learn package [(22), version 0.24.2] that has three different steps, including the additional feature selection method mentioned above: 1. Normalization: Data normalization is required before introducing the data into the machine learning models Frontiers in Cardiovascular Medicine | www.frontiersin.org because different scales in the variables measured represent that features have different contributions to the model fitting, and this may introduce bias. We applied the MinMaxScaling function from the Sci-kit Python library (22). The StandardScaler was also considered, although no significant difference was found with a min-max scaling. 2. Feature selection: The number of features remaining was still large and had to be reduced before reaching the model  building section. For this purpose, the SelectKBest function from Python's Sci-kit learn library (22) was performed. The algorithm works by selecting the best features based on univariate statistical tests. It selects the features according to two different parameters: highest score function and number of features (k). These parameters had to be defined beforehand. The score function parameter selected was f_classif, which computes the ANOVA f -value for the sample and provides the associated p-value for each feature based on the correlation with the class label. The k parameter is defined as the number of features selected by the feature selector. Prior the analysis, we do not know the optimal number of features to be selected, therefore the k-value before the analysis had to be defined within the range of parameters from 5 to 120 (i.e., number of possible features it may select), to be later introduced in the   (22). According to a recent review (23), both SVM and RF models were the most used techniques for conventional ML for imagebased diagnosis. Additionally, prior knowledge from a recent publication (24) proved also the reliability of SVM and RF when dealing with radiomics. Finally, we decided to include Logistic Regression as one of the most-used techniques in statistical analysis for the purpose of increasing the comparison benchmark.
For evaluation, the experiment is validated in a nested CV scheme (25). We performed a 10-fold outer loop and a threefold inner loop (see Figure 1 for a more graphical description). This represents that for each fold in the outer loop, 90% of the data is kept for training and validation, while the remaining 10% will be held for testing. The same procedure was performed in the inner loop for each fold. The remaining train and validation data were split into 66% for training and 33% for validation. All the splits in our scheme were performed with Stratified Kfold sci-kit learn function (22) to keep classes balanced. The normalization and feature selection steps were performed in each fold of the nested CV scheme to avoid data leakage. This means that no knowledge of the held test set was introduced into the training stage, which could corrupt the learning process and its posterior generalization. Models' performance are dependent on the hyper-parameters selected (26). For this purpose, a Grid Search CV (22) was applied in the inner loop, thus ensuring the optimal hyper-parameters were selected. Grid Search CV (22) is an optimizer algorithm that calculates the model's performance for each combination of hyper-parameters and keeps the one that achieved the highest prediction metric, to be later tested on the testing held data (see Supplementary Table 1 in Supplementary Material for the full list of hyper-parameters used).
Paired t-test on both distributions of testing AUC performances was performed to analyze the statistical significance for each general machine learning model across CMR indices and radiomics, as well as for each differential diagnosis (i.e., identifying a single disease class from the whole cohort) and prove they were comparable. Additionally, Receiver Operating Characteristic (ROC) curves were calculated to provide a better representation of the true and false-positive rates for each differential diagnosis. Due to the architecture of our Nested CV scheme, we obtain 10 different tests AUC, one per fold (10-fold outer loop, see Figure 1, left side). This means that each of the 10 models resulted from the Grid Search CV in the inner loop might be different (i.e., the combination of hyper-parameters and the number of features selected may vary depending on different characteristics of the training and validation set). Thus, to present the most relevant features on average in a single list, we analyzed the features selected by each of the 10 models and selected for representation those features that remained constant across the 10-folds and were finally sorted by feature importance score. With this, we create a highly approximated list of the most relevant radiomics features for each classification. Since it is possible that different iterations selected a different number of features (i.e., for example, fold 1 could select 30 features and fold 2, 40 features), we analyzed the validation AUC values for each number of features (k = 10, 20, 30, 40) across all the combination of hyper-parameters to see the effect of increasing the number of features on AUC (see Figure 3). Finally, to provide a more clinical perspective, we analyzed the implications of feature type (shape, first-order, or texture), region (LVMYO, LV cavity), and phase (ED, ES) for each differential diagnosis, and linked them with the existing clinical knowledge.

Classification Performance
In the first experiment, we evaluated and compared the performance of the different machine learning models (namely RF, OVR-SVM, and LR). As it can be seen in Table 4, the highest AUC values were obtained by the RF technique, for all models. Hence, the RF technique is used as the baseline models in the remainder of the experiments. Table 5 presents the in-depth results of the RF and each differential diagnosis.
Subsequently, we performed a comparison between the AUC scores obtained by the existing CMR indices (i.e., standard model) and radiomics models for classification. As it can be seen from Table 4, the radiomics model had a comparable performance to the standard model and there was no statistically significant differences between the two models (p > 0.05). However, the radiomics model was obtained without the expert delineation of the trabeculae. In more detail, the ROC curves for the classification models are presented in Figure 2.

Radiomics Signatures
In this section, we provide more details on the contributions of the different radiomics features to the classification models. We analyzed incrementally the number of features selected for all the hyper-parameter combinations and we showed in Figure 3 that the AUC values did not increase significantly after integrating 30-40 radiomics features in the model. To further illustrate the predictive power of the radiomics features, Table 6 presents across subsections the 10 best performing radiomics for each differential diagnosis, sorted by their weighted feature importance (in percentage). Moreover, Table 7 shows the 10 best radiomics features involved for the general RF model.
By looking at Figure 4, we can observe that shape features play an important role when classifying DCM against the rest of the diseases. Alternatively, texture features have a higher impact in the classification of HCM and healthy subjects, and even a higher impact in the differential diagnoses of LVNC.
Left side of Figure 5 shows the distribution of the selected radiomics features across the cardiac structures (i.e., LVMYO vs. LV cavity). For the identification of DCM, most of the radiomics features (65%) pertained to the LV cavity, while the remaining 35% belonged to the LVMYO. Conversely, features extracted from the LV cavity and the LVMYO participated equally to the identification of HCM and healthy subjects. Finally, the largest difference in terms of region importance can be found for LVNC, where almost 96% of the features belonged to LVMYO, with only a 4% to LV cavity (see Figure 5, left image). This last finding is in line with the existing clinical knowledge. Normally, papillary muscles are considered inside the LV cavity and not as a myocardial mass, according to the guidelines and clinical consensus among cardiologists. But in patients with LVNC conditions, trabeculae and papillary muscles are quantified [within Jacquier coefficient (8)] outside the LV cavity and included in the myocardial mass (LVMYO). For this reason, contrary to what the name itself suggests, the assessment FIGURE 4 | Differential diagnosis overlapping radar plot, comparing the distribution of the selected radiomics across types of radiomics. between LVNC and the rest of cardiomyopathies is determined by changes or differences in the LVMYO.
Regarding the contribution of cardiac cycle phases, ES radiomics were more important than ED features for the classification of DCM, HCM, and healthy subjects. However, ES and ED phases play a similar role when assessing LVNC (see Figure 5, right image).
Finally, we compared the time needed to obtain the diagnoses using the standard as well as the proposed radiomics models. With the existing CMR indexes, our clinical experts spent approximately 9-12 min to delineate the trabeculae and then derive an LVNC diagnosis, on average. In contrast, for the proposed radiomics-based approach, the time to assess one subject was reduced to 10-20 s depending on the image characteristics [i.e., volume, slice images, or bin width (21)].

Summary of Findings
Machine learning-based radiomics models showed excellent performance for differentiating between hypertrophic cardiomyopathy, dilated cardiomyopathy, left ventricular non-compaction, as well as healthy subjects. According to the results presented in Table 4 and Figure 2, radiomics and existing CMR indices resulted in similar performances.
The 10-most significant radiomics features for the general RF model comprise a combination of radiomics types, regions, and heart cycle phases (see Table 7). Looking in detail, myocardium sphericity seems to play an important role in the overall classification, occupying the first and third spots in terms of predictive power for the ES and ED phases, respectively. This can be explained by the remodeling of the heart that affects the global and regional structure of the left ventricle.
Moreover, texture features were found to add substantial information, positioning in the second top position of the ranking, which underlines the widely accepted diagnostic importance of myocardial tissue characteristics such as myocardial fibrosis (see Table 7). Specifically, Long Run High Gray Level Emphasis is a texture feature that explains longer contrasted gray level strings/regions, which can be related to the existence and prominence of myocardial trabeculations, typically associated with LVNC.
Regarding the radiomics features for each differential diagnosis, differences were found among the various diseases. Concretely, we can observe how the contribution of the radiomics features to the prediction are distributed, by type (Figure 4), region and phase ( Figure 5). While shape features seem to play the most important role in DCM classification, texture features are more important when classifying HCM and LVNC subjects. This finding is in line with clinical knowledge: DCM is defined primarily by a dilation of the left ventricle, while myocardial fibrosis has a pivotal role in HCM diagnosis and LVNC is defined by the presence of myocardial trabeculations.

Limitations and Future Work
The findings presented in this paper must be considered in light of the study limitations, and future work may take different directions. Firstly, while the radiomics performance for automated diagnosis is promising, these results were obtained based on a single-center small-size clinical cohort. To confirm these promising results, future studies should be extended toward multi-center studies. Furthermore, this work relied on semi-automated, manually controlled, delineations of the LV endo and epi-cardial contours on the short-axis images, before the extraction of the existing indices and radiomics features. However, automatic segmentation of the ventricular boundaries has been extensively investigated using deep learning (18) and these models could be extended to segment pathological cases in particular for LVNC. Finally, this paper focused on the diagnosis of LVNC and related CMs. In clinical practice, subsequently, prediction of LVNC related events before they occur would enable early and personalized prevention. Our plan is to extend the proposed radiomics models to enable patient-specific risk prediction and prognosis estimation after LVNC diagnosis.

CONCLUSIONS
CMR radiomics constitutes a promising approach to differentially diagnose overlapping and complex conditions such as HCM, DCM, and LVNC. The classification performance of radiomics models are in-line with the one obtained by using existing CMR indexes but the diagnoses can be reached fully automatically without the need for expert delineation of the trabeculae as in previous works.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
CI, GC, AG, JR-P, PG, and KL: design of work. CI, VC, CM-I, and PG: experimental setup and coding. CI, GC, and AG: interpretation of results. CI and GC: manuscript draft. VC, CM-I, GC, AG, JR-P, and KL: manuscript review. All authors contributed to the article and approved the submitted version.