Radiomics Signatures of Cardiovascular Risk Factors in Cardiac MRI: Results From the UK Biobank

Cardiovascular magnetic resonance (CMR) radiomics is a novel technique for advanced cardiac image phenotyping by analyzing multiple quantifiers of shape and tissue texture. In this paper, we assess, in the largest sample published to date, the performance of CMR radiomics models for identifying changes in cardiac structure and tissue texture due to cardiovascular risk factors. We evaluated five risk factor groups from the first 5,065 UK Biobank participants: hypertension (n = 1,394), diabetes (n = 243), high cholesterol (n = 779), current smoker (n = 320), and previous smoker (n = 1,394). Each group was randomly matched with an equal number of healthy comparators (without known cardiovascular disease or risk factors). Radiomics analysis was applied to short axis images of the left and right ventricles at end-diastole and end-systole, yielding a total of 684 features per study. Sequential forward feature selection in combination with machine learning (ML) algorithms (support vector machine, random forest, and logistic regression) was used to build radiomics signatures for each specific risk group. We evaluated the degree of separation achieved by the identified radiomics signatures using receiver operating characteristic (ROC) curves, area under the curve (AUC), and statistical testing. Logistic regression with L1-regularization was the optimal ML model. Compared to conventional imaging indices, radiomics signatures improved the discrimination of risk factor vs. healthy subgroups as assessed by AUC [diabetes: 0.80 vs. 0.70, hypertension: 0.72 vs. 0.69, high cholesterol: 0.71 vs. 0.65, current smoker: 0.68 vs. 0.65, previous smoker: 0.63 vs. 0.60]. Furthermore, we considered clinical interpretation of risk-specific radiomics signatures. For hypertensive individuals and previous smokers, the surface area to volume ratio was smaller in the risk factor vs. healthy subjects, perhaps reflecting a pattern of global concentric hypertrophy in these conditions. In the diabetes subgroup, the most discriminatory radiomics feature was the median intensity of the myocardium at end-systole, which suggests a global alteration at the myocardial tissue level. This study confirms the feasibility and potential of CMR radiomics for deeper image phenotyping of cardiovascular health and disease. We demonstrate that such analysis may have utility beyond conventional CMR metrics for improved detection and understanding of the early effects of cardiovascular risk factors on cardiac structure and tissue.

INTRODUCTION
Cardiovascular magnetic resonance (CMR) is the reference standard for assessment of cardiac structure and function and is used widely in both research and clinical settings. Routine assessment is reliant on visual inspection of CMR images for identifying global and local abnormalities; this is both labor-intensive and reader dependent (1)(2)(3)(4). Existing quantifiers, such as ejection fraction and chamber volumes, are overly simplistic and often do not capture subtle and complex changes that affect the myocardium at early disease stages (5). Current approaches are thus suboptimal for early disease detection and outcome prediction. Therefore, there is a need for novel, more advanced quantitative approaches to CMR image analysis to improve clinical diagnosis and risk prediction.
CMR radiomics is a novel image quantification technique whereby pixel-level data is analyzed to derive multiple quantifiers of tissue shape and texture (6). Technological advancements and the availability of high computational power have allowed deployment of machine learning (ML) methods with radiomics features to discriminate disease or predict outcomes (7). A distinct advantage of radiomics modeling over unsupervised algorithms is the potential for explainability through identification of the most defining radiomic features in the model. It is thought that radiomics features correspond to alterations at both the morphological and tissue levels and thus, the most defining features of a particular condition (or its radiomics signature) may provide insights into its pathophysiology (8). Within oncology, where radiomics is most well-developed, the incremental value of radiomics models for diagnosis and prognosis has been widely reported (8)(9)(10)(11)(12)(13)(14). Results have also been reported from CMR radiomics models for discrimination of important conditions such as myocarditis, hypertrophic cardiomyopathy, and ischemic heart disease (15)(16)(17)(18).
While existing works have mostly focused on image phenotyping of established cardiovascular diseases, CMR radiomics may also provide incremental information to conventional approaches for improved quantification of cardiac alterations related to cardiovascular risk factors at the subclinical stage. We thus present the largest and most comprehensive assessment of the performance of CMR radiomics for image phenotyping of important cardiovascular risk factors including diabetes, hypertension, high cholesterol, and smoking status, by using a large annotated CMR dataset from the UK Biobank (UKB).

Population and Setting
UKB is a large-scale population health resource aimed at enhancing biomedical research and ultimately improving prevention, diagnosis, and treatment of a wide range of serious and life-threatening illnesses (19). Over 500,000 participants aged 40-69 years old were recruited from around the UK between 2006 and 2010. The UK Biobank holds an exceptional amount of data including detailed lifestyle information, medical history, serum biomarkers, physical measures, and multi-modal imaging including magnetic resonance imaging of the abdomen, brain, and heart (20). Thus, UKB provides the ideal platform for assessment of the performance characteristics of novel quantitative biomarkers, such as radiomics, in discriminating common cardiovascular risk factors.

CMR Imaging Protocol
CMR cine images were acquired using a standardized UKB protocol, which is detailed in a dedicated publication (21). In brief, all scans were performed with a 1.5 Tesla scanner (MAGNETOM Aera, Syngo Platform VD13A, Siemens Healthcare, Erlangen, Germany), with typical cine parameters as follows: TR/TE (repetition time/echo time) = 2.6/1.1 ms, flip angle 80°, Grappa factor 2, voxel size 1.8 × 1.8 × 8 mm, and a slice gap of 2.0 mm. The actual temporal resolution of 32 ms was interpolated to 50 phases per cardiac cycle (∼20 ms). The protocol includes a complete cine short-axis ventricular stack with base to apex coverage acquired using balanced steady state free precession (bSSFP) with one breath-hold per image slice.

CMR Image Segmentation
CMR scans of the first 5,065 UKB participants that completed the imaging study were manually analyzed across two core laboratories (London, Oxford) using a pre-defined standard operating procedure, which is detailed elsewhere (22). In brief, left and right ventricular (LV, RV) endocardial contours and LV epicardial contours were drawn in end-systole and end-diastole on the short axis stack images using the CVI42 post-processing software (Version 5.1.1, Circle Cardiovascular Imaging Inc., Calgary, Canada). These contours were used to define three regions of interest (ROIs) for radiomics analysis: RV blood pool, LV blood pool, and LV myocardium. All acquisitions were ECG gated and thus end-diastole was defined as the first phase in the sequence. End-systole was defined as the frame with smallest LV cavity area by visual assessment detected at the mid-cavity level. Papillary muscles were considered part of the blood pool. Slices with more than 50% circumferential LV myocardium were included in LV contours. RV volume was defined as areas below the pulmonary valve plane identified by visual assessment.

Selection of Study Sample
We considered the first 5,065 UKB participants to complete CMR imaging. We excluded 174 individuals due to incomplete segmentations (one or more cardiac structures missing in the segmentations). From the remaining 4,891 individuals, a healthy cohort (n = 1,394) was defined by considering participants without known cardiovascular disease or risk factors. Diabetes (n = 243), hypertension (n = 1,394), and high cholesterol (n = 779) were taken from self-reported conditions. Smoking status was taken as self-report of current (n = 320) or previous (n = 1,394) tobacco smoking. Participants positive for each risk factor were compared with an equal number of randomly selected reference healthy subjects to eliminate bias in the machine learning models due to class imbalance (Figure 1).
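The balanced comparison described above can be illustrated with a minimal sketch. This is not the code used in the study: the function name `match_healthy`, the integer participant identifiers, and the fixed random seed are assumptions made purely for illustration.

```python
import random


def match_healthy(at_risk_ids, healthy_ids, seed=0):
    """Draw as many healthy comparators as there are at-risk participants."""
    if len(at_risk_ids) > len(healthy_ids):
        raise ValueError("not enough healthy participants for 1:1 matching")
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(list(healthy_ids), k=len(at_risk_ids))


# Toy identifiers (hypothetical): 320 at-risk participants, as for the
# current smoker group, and a healthy pool of 1,394 participants.
at_risk = list(range(320))
healthy = list(range(10_000, 10_000 + 1394))
controls = match_healthy(at_risk, healthy)
```

Sampling without replacement keeps both classes the same size, so a classifier cannot gain accuracy simply by favoring the majority class.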

Radiomics Analysis
The overall radiomics workflow is depicted in Figure 2. Radiomics shape and signal intensity-based features were extracted from the three segmented ROIs (LV blood pool: LV, LV myocardium: MYO, RV blood pool: RV) in end-diastole (ED) and end-systole (ES). The analysis of the radiomics features in the myocardium may enable identification of tissue-level changes due to the cardiovascular risk factors. The inclusion of the LV and RV cavities is aimed at identifying changes in the shapes of each ventricle, or in the patterns of the trabeculation and papillary muscles. Automated extraction of radiomics features was performed using the open source python-based radiomics library Pyradiomics (version 1.3.0, October 2017) (23). Image preprocessing and feature extraction used the Pyradiomics default settings, including a gray value discretization with a bin width of 25 to extract the intensity-based and texture radiomics features. In total, 684 radiomics features were extracted per study (consisting of 114 radiomics features per cardiac structure: LV, RV and MYO at two timepoints of the cardiac cycle: ED and ES).
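To make the discretization step and the feature count concrete, the following sketch applies a fixed bin width of 25 to a handful of intensities and reproduces the 684-feature total. It is a standalone illustration that does not call Pyradiomics; the helper name `discretize_fixed_bin_width` and the toy intensities are assumptions, and the binning rule (floor of intensity over bin width, shifted so that the lowest occupied bin is 1) is our reading of the fixed bin width convention rather than a statement about the exact extraction settings of the study.

```python
import math


def discretize_fixed_bin_width(intensities, bin_width=25):
    """Map raw intensities to 1-based gray-level bins of equal width."""
    lowest_bin = math.floor(min(intensities) / bin_width)
    return [math.floor(x / bin_width) - lowest_bin + 1 for x in intensities]


# Hypothetical pixel intensities inside one region of interest.
roi_intensities = [12, 30, 49, 50, 74, 75, 130]
gray_levels = discretize_fixed_bin_width(roi_intensities)

# 114 features per structure x 3 structures (LV, MYO, RV) x 2 phases (ED, ES).
n_features = 114 * 3 * 2
```

With these toy values the gray levels are `[1, 2, 2, 3, 3, 4, 6]`, and `n_features` equals 684.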

Shape-Based Radiomics Features
Sixteen radiomics shape features were extracted per ROI at ED and ES (see Supplementary Table). Radiomics shape features describe geometrical properties of the defined ROI, such as volume, maximal diameter, minor/major axis, surface area to volume ratio, elongation, flatness, and sphericity. Radiomics shape features may provide incremental value to existing CMR indices as they include conventional shape indices (e.g., cavity volumes) as well as more advanced geometric quantifiers (e.g., sphericity, flatness). They also have the potential to define disease-specific patterns of cardiac alterations beyond those possible with existing CMR indices.
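As an illustration of the geometric quantifiers named above, the sketch below computes the surface area to volume ratio, sphericity, and spherical disproportion from a surface area and a volume. It uses the standard definition of sphericity, (36πV²)^(1/3) / A, with spherical disproportion as its inverse; the idealized sphere and cube are hypothetical test shapes, not cardiac data, and in practice the area and volume would come from the segmented ROI.

```python
import math


def surface_area_to_volume_ratio(area, volume):
    """Surface area divided by volume (units of 1/length)."""
    return area / volume


def sphericity(area, volume):
    """Equals 1.0 for a perfect sphere and is smaller for less compact shapes."""
    return (36 * math.pi * volume ** 2) ** (1 / 3) / area


def spherical_disproportion(area, volume):
    """Inverse of sphericity; equals 1.0 for a sphere and grows otherwise."""
    return 1.0 / sphericity(area, volume)


# Idealized sphere with a 30 mm radius.
radius = 30.0
sphere_area = 4 * math.pi * radius ** 2
sphere_volume = 4 / 3 * math.pi * radius ** 3

# Cube with the same volume as the sphere, for comparison.
side = sphere_volume ** (1 / 3)
cube_area = 6 * side ** 2

sphere_sphericity = sphericity(sphere_area, sphere_volume)
cube_sphericity = sphericity(cube_area, sphere_volume)
sphere_savr = surface_area_to_volume_ratio(sphere_area, sphere_volume)
```

For a fixed shape, the surface area to volume ratio falls as the object grows (for a sphere it is 3 divided by the radius), which is one reason this feature is sensitive to both size and compactness.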

Signal Intensity-Based Radiomics Features
Signal intensity-based radiomics features may have the potential to decode variations in cardiac tissue due to abnormalities induced by disease processes. They are commonly grouped into two categories, namely first-order and texture features. First-order features are histogram-based statistics describing the global distribution of signal intensities within the defined ROI without consideration of their spatial relationships. These include simple measures such as the mean intensity or standard deviation, as well as more advanced measures such as skewness, uniformity or entropy (see full list in Supplementary Table).
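A few of the first-order statistics mentioned above can be written down directly from the histogram of discretized intensities. The sketch below is a simplified illustration using only the Python standard library; the function name `first_order_features` and the toy gray levels are assumptions, and library implementations may differ in details such as normalization.

```python
import math
from collections import Counter
from statistics import mean, median, pstdev


def first_order_features(gray_levels):
    """Histogram-based statistics that ignore where each pixel is located."""
    n = len(gray_levels)
    mu = mean(gray_levels)
    sigma = pstdev(gray_levels)  # population standard deviation
    probs = [count / n for count in Counter(gray_levels).values()]
    third_moment = sum((x - mu) ** 3 for x in gray_levels) / n
    return {
        "mean": mu,
        "median": median(gray_levels),
        "std": sigma,
        "skewness": third_moment / sigma ** 3 if sigma > 0 else 0.0,
        "uniformity": sum(p * p for p in probs),
        "entropy": -sum(p * math.log2(p) for p in probs),
    }


# Hypothetical discretized intensities: four gray levels, equally frequent.
features = first_order_features([1, 1, 2, 2, 3, 3, 4, 4])
```

Because every gray level is equally frequent in this toy input, uniformity takes its minimum for four levels (0.25) and entropy its maximum (2 bits), while the symmetric histogram gives zero skewness.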

Texture-Based Radiomics Features
In contrast, texture radiomic features allow the quantification of spatial inter-pixel relationships using more advanced matrix analysis methods (24,25). Through this, signal intensity patterns within the ROI may be numerically quantified using pre-agreed mathematical definitions. Many texture patterns may be considered to quantify characteristics such as the complexity, heterogeneity, coarseness, or repeatability of the building blocks of the tissue. The idea is that these texture features may reflect myocardial tissue characteristics which in turn reflect underlying disease processes. In this study, 19 first-order features and 79 texture features were extracted from each ROI per cardiac phase.
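As one example of a matrix-based texture feature, the sketch below computes gray level non-uniformity from run lengths: the squared number of runs at each gray level, summed over levels and divided by the total number of runs. It is deliberately simplified and is not the extraction code of the study; it assumes a tiny 2D image given as rows of discretized gray levels and counts runs in the horizontal direction only, whereas library implementations typically aggregate several directions.

```python
from collections import Counter
from itertools import groupby


def gray_level_non_uniformity(image_rows):
    """Sum over gray levels of (number of runs)^2, divided by total runs."""
    runs_per_level = Counter()
    for row in image_rows:
        # groupby yields one group per run of identical consecutive values
        for level, _run in groupby(row):
            runs_per_level[level] += 1
    total_runs = sum(runs_per_level.values())
    return sum(c * c for c in runs_per_level.values()) / total_runs


# Two hypothetical images with four horizontal runs each.
evenly_spread = [[1, 1, 2, 2], [3, 3, 4, 4]]  # one run per gray level
concentrated = [[1, 1, 2, 2], [1, 1, 3, 3]]   # two of four runs at level 1

gln_even = gray_level_non_uniformity(evenly_spread)
gln_concentrated = gray_level_non_uniformity(concentrated)
```

For a fixed number of runs, the value is lowest when the runs are spread evenly across gray levels (1.0 here) and rises when they concentrate on a few levels (1.5 here).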

Identification of Optimal Radiomic Signatures
The goal of the study is to leverage feature selection and machine learning techniques to identify radiomics signatures that best describe the structural and tissue differences between risk factor (at-risk) and healthy (no-risk) groups in CMR imaging. To this end, we use the risk factors as "proxy" output variables and build multiple machine learning models by varying the combinations of input radiomic features through systematic feature selection. We obtain multiple models (and thus multiple candidate radiomic signatures), and through statistical testing we select the best model and therefore the radiomic signature that best separates the at-risk and no-risk groups. Because these selected radiomics signatures differentiate at-risk from healthy individuals, they can be considered and analyzed as potential descriptors of the cardiac alterations due to the risk factors in question. Importantly, we use machine learning as a more advanced means to combine multiple radiomic features into risk-specific signatures, while taking into account non-linear complementarities between the parameters.
For feature selection, we used the sequential forward feature selection (SFFS) method as it has demonstrated good performance in previous CMR radiomics studies (15,26). The termination criterion was set to 2% in all experiments following literature standards, i.e., the process was stopped if an added feature did not increase model performance beyond the termination criterion. To obtain more robust estimates and improve generalizability, five-fold cross-validation was used in the feature selection process, rotating training and validation folds (80 and 20% of the dataset, respectively). We combined SFFS with classical ML algorithms [support vector machines (SVM), random forests (RF), logistic regression (LR)] to identify the combination of radiomics features that best defines each studied cardiovascular risk subgroup. For each ML method, hyperparameter optimization was performed to enhance the discrimination between no-risk and at-risk subgroups (Supplementary Material). Implementation of the SFFS and the ML techniques was based on the mlxtend (version 0.17.0) (27) and scikit-learn (version 0.20.3) (28) python-based libraries, respectively.
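The selection loop with its termination criterion can be sketched in a few lines. This is a generic greedy forward selection written with the standard library only, not the mlxtend implementation used in the study. Several elements are assumptions for illustration: the 2% criterion is interpreted as an absolute gain of 0.02 in the score, the starting score is the chance-level AUC of 0.5, and the toy `score_fn` with its feature names and gains is hypothetical (in practice it would return a cross-validated AUC for a model trained on the candidate subset).

```python
def forward_select(candidates, score_fn, min_gain=0.02, baseline=0.5):
    """Greedily add the best feature until the gain drops below min_gain."""
    selected, best_score = [], baseline
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        new_score, best_feature = max(scored)
        if new_score - best_score < min_gain:
            break  # termination criterion: improvement too small
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = new_score
    return selected, best_score


# Hypothetical additive gains standing in for cross-validated AUC gains.
toy_gains = {
    "shape_surface_volume_ratio": 0.12,
    "firstorder_median": 0.05,
    "texture_gln": 0.03,
    "uninformative_feature": 0.005,
}


def score_fn(subset):
    return 0.5 + sum(toy_gains[f] for f in subset)


signature, signature_score = forward_select(toy_gains, score_fn)
```

With these toy gains the loop keeps the first three features and stops before adding the fourth, whose contribution falls below the threshold.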
The selected radiomics features resulting from the SFFS algorithm and ML techniques were combined to create the radiomics signature that best encodes the changes in CMR induced by the different cardiovascular risk factors. To quantify the added value of the proposed radiomics approach, we built similar ML models/risk signatures using conventional CMR indices as input variables. All radiomics features and cardiac indices were normalized (to a mean of zero and standard deviation of one) to ensure they are equally weighted in all analyses. Individuals with multiple risk factors were not excluded. In the machine learning models, we set the outcome to each risk factor individually, which enabled the identification of the radiomics signatures specific to that risk factor.
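The normalization mentioned above is a per-feature z-score. The sketch below shows it for a single feature column; the use of the population standard deviation and the toy mass values are assumptions for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale one feature column to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mu) / sigma for v in values]


# Hypothetical values of one conventional index across five participants.
lv_mass = [80.0, 90.0, 100.0, 110.0, 120.0]
scaled = zscore(lv_mass)
```

After scaling, features measured in very different units (for example volumes in ml and dimensionless texture statistics) contribute on a comparable scale to regularized or distance-based models.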
In this work, we assess model performance (i.e., the ability of the radiomics signatures to discriminate at-risk vs. no-risk subjects) using receiver operating characteristic (ROC) curves and area under the curve (AUC) scores. We also report model accuracy, defined as the number of correctly discriminated no-risk vs. at-risk cases based on the radiomics signatures, divided by the total number of cases. Additionally, statistical tests were performed to assess the statistical significance of the differences between the various ML models, using McNemar's test for pairwise comparisons, as well as Cochran's Q test, which is an extension of McNemar's test for the comparison of more than two models (29, 30).
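The evaluation metrics described above can be illustrated with a short standard-library sketch. The AUC is computed as the fraction of at-risk/no-risk pairs ranked correctly, and the model comparison uses the continuity-corrected form of McNemar's test, whose chi-square statistic (one degree of freedom) is converted to a p-value with the complementary error function. Whether the study used the corrected or the exact form of the test is not stated, so that choice is an assumption, and all scores and predictions below are hypothetical.

```python
import math


def auc(scores_at_risk, scores_no_risk):
    """Fraction of (at-risk, no-risk) pairs ranked correctly; ties count 0.5."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in scores_at_risk for n in scores_no_risk
    )
    return wins / (len(scores_at_risk) * len(scores_no_risk))


def accuracy(y_true, y_pred):
    """Correctly classified cases divided by the total number of cases."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test on the cases where two models disagree."""
    rows = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, m in rows if a == t and m != t)  # only model A correct
    c = sum(1 for t, a, m in rows if a != t and m == t)  # only model B correct
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival, 1 d.o.f.
    return stat, p_value


# Hypothetical model outputs.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
y_true = [1] * 12
pred_a = [1] * 10 + [0] * 2   # model A: 10 of 12 correct
pred_b = [0] * 10 + [1] * 2   # model B: 2 of 12 correct
stat, p_value = mcnemar(y_true, pred_a, pred_b)
```

McNemar's test looks only at discordant cases, which makes it suited to comparing two models evaluated on the same participants.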

Summary of Subgroups and Conventional CMR Indices
The subjects included in the analysis are summarized in Table 1. Across all risk factor groups there was a higher proportion of male participants (between 52.3 and 60.1% depending on the risk factor), whereas in the healthy cohort, there were fewer men (42.5%). Average age across the risk groups was between 59 (±8) and 65 (±6) years, while it was equal to 60 (±7) years for the healthy cohort. As expected, there were differences in conventional CMR indices between the at-risk subgroups and healthy subjects. In particular, all risk groups had on average greater indexed left ventricle mass (LVMi) in comparison to the healthy cohort, with the greatest difference in the hypertensive group (50.3 g/m² vs. 46.3 g/m²). All risk factor groups had lower indexed left ventricle stroke volume (LVSVi) and indexed right ventricle stroke volume (RVSVi) in comparison to the healthy cohort. There were also variations in chamber volumes, with different directions of difference depending on the risk category. Finally, it is worth noting that statistically significant differences (Welch's t-test) in the conventional indices were found between the healthy subgroup and each at-risk subgroup, except for LVEF in diabetes and LVSVi values in hypertension and current smokers (see Table 1).
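For reference, the statistic behind Welch's t-test, which does not assume equal variances in the two groups, can be sketched as follows. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom only; converting these to a p-value requires the t distribution, which is omitted here. The two small samples are hypothetical and are not values from Table 1.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical indexed LV mass values (g/m²) for two small groups.
at_risk_lvmi = [50, 52, 49, 51, 53]
healthy_lvmi = [46, 47, 45, 48, 44]
t_stat, dof = welch_t(at_risk_lvmi, healthy_lvmi)
```

With these toy samples the statistic is 5.0 on 8 degrees of freedom.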

Radiomics Signatures Have Superior Discriminatory Performance Over Conventional CMR Indices
In comparison to conventional indices, radiomics signatures provided better discrimination between healthy and at-risk subjects for diabetes (0.80 AUC for radiomics vs. 0.70 for conventional indices), hypertension (0.72 vs. 0.69), high cholesterol (0.71 vs. 0.65), current smokers (0.68 vs. 0.65), and previous smokers (0.63 vs. 0.60) (Figure 3). The models obtained with radiomics vs. conventional indices were also compared using McNemar's test; the differences were found to be statistically significant for diabetes, hypertension, high cholesterol, and previous smokers but not for current smokers.

Comparison of the Degree of Discrimination Achieved for Each Subgroup
The degree of discrimination (no-risk vs. at-risk hearts) achieved using radiomics models varied between the different cardiovascular risks, as these have different effects on the heart. The highest degree of discrimination with radiomics models was seen in diabetes (0.78), suggesting that radiomics features are particularly important in distinguishing diabetes-related cardiac changes. The smallest degree of separation was seen in previous smokers (0.61). High cholesterol, hypertension and current smokers achieved similar degrees of separation by the radiomics models (i.e., 0.68, 0.68, and 0.67, respectively).

The Identified Radiomics Signatures for Each Cardiovascular Risk Factor
The identified radiomics signatures for each risk factor are described in Table 2. Overall, there was a more prominent role for shape and texture features than first-order features. For instance, in diabetics, five of the eleven features included in the model were shape-based and in the hypertension group, no first-order feature was selected. As expected, the LV blood pool and LV myocardium were the most relevant regions, with the RV blood pool having a minor role for the risk factors studied in this paper.
In Table 3, we consider the most discriminative radiomics feature for each risk factor, i.e., the feature assigned the most importance in the model, and compare it with the most discriminative conventional CMR measure, which was LVM for all risk groups.
For all the subgroups, the mean value of the most important radiomics features and conventional CMR indices was significantly different in the risk factor vs. healthy cohorts (p < 0.001, Table 3). In addition, the single best radiomics feature outperformed the conventional CMR indices in its relevance for all risk factors. However, it was the combination of several radiomics features into a radiomic signature (Table 4) that provided the best overall discriminative power.

Summary of Findings
This paper described a methodology based on radiomics, machine learning and feature selection to discover new discriminatory signatures in CMR. Based on over 5,000 datasets, we presented the largest and most comprehensive study to demonstrate the feasibility and performance of CMR radiomics

Clinical Interpretation of the Radiomics Signatures
A distinct advantage of radiomics modeling over black-box techniques such as deep learning is the potential interpretability of the obtained results. Therefore, we can attempt to reason about the prominence of certain radiomics features in disease discrimination models. Shape features were prominent in all models and indicate subtle patterns of ventricular remodeling that are specific to the conditions under study. For instance, spherical disproportion (i.e., the inverse of sphericity) of the myocardium at end-diastole was lower in participants with high cholesterol compared with healthy individuals, indicating that the overall shape of the LV is less elliptical and more spherical in this risk factor group. For hypertensive individuals and previous smokers, the surface area to volume ratio was smaller in the risk subgroups vs. healthy subjects; this may reflect a pattern of concentric LV hypertrophy in these conditions. For certain risk factors, intensity/texture features seemed more important, such as median intensity for diabetes. As this was a retrospective study, we can only speculate as to the cause of this association. One hypothesis is that diabetes leads to a global alteration of the myocardial tissue and thus of the overall myocardial appearance in CMR images, resulting in higher median intensities compared to non-diabetic subgroups. However, testing this hypothesis is beyond the scope of this study.
As another example of a prominent textural feature, the most important feature identified for current smokers in this study was gray level non-uniformity. In a previous study (31), the same radiomic feature was identified as the most important radiomic feature in hypertrophic cardiomyopathy (HCM). However, as the authors pointed out in their paper, the intensity heterogeneity of myocardial tissue is not unique to HCM and it might be of importance for other conditions. As smoking is a well-known cause of such cardiovascular diseases (32), there may be some commonality in the patterns of myocardial hypertrophy and tissue fibrosis in these conditions that is being reflected in the observed texture features. Indeed, the increased heterogeneity in gray level intensities for current smokers as found in our study supports the potential effects on the myocardium for these subjects.
Thus, radiomics allows more granular distinctions between health and disease in comparison to conventional CMR indices where, rather crudely, the single most discriminatory feature for all risk factors was higher LVM. These findings indicate the potential clinical utility of radiomics in improving understanding of the effects and pathophysiology of important cardiovascular risk factors.

Comparison With the Existing Literature
Literature in support of the superior diagnostic performance of CMR radiomics models over conventional image analysis is slowly gaining momentum. Several studies have shown the feasibility and clinical utility of CMR radiomics for distinguishing important disease entities. A small study by Baeßler et al. (31) demonstrates the superior performance of CMR radiomics in discriminating hypertrophic cardiomyopathy (n = 32) from healthy comparators (n = 30). The most discriminative feature was gray level non-uniformity, a radiomics texture feature representing heterogeneity. It seems intuitive that this feature would be defining of the irregular myofibrillar architecture of hypertrophic cardiomyopathy. Similar to our observations, in particular with diabetes, it appears that the observed radiomics signatures may reflect clinically meaningful information about significant tissue level changes. Furthermore, studies have demonstrated the ability of CMR radiomics to distinguish important conditions that appear morphologically similar with conventional image analysis. For instance, Neisius et al. (15) demonstrated high performance of CMR radiomics models applied to native T1 images to distinguish hypertensive heart disease (n = 53), hypertrophic cardiomyopathy (n = 108), and healthy volunteers (n = 71). There is also emerging work on using CMR radiomics to identify areas of myocardial infarction from non-contrast cine images (16,33,34) and to distinguish acute from chronic myocardial infarction (33).
Our paper constitutes the most comprehensive study to assess the relationship between CMR radiomics and cardiovascular risk factors. However, the concept of utilizing information from CMR to obtain more complex geometric information has been addressed previously using atlas-based shape measures. Cardiac atlases produce statistical shape models, giving highly detailed morphometric information (35)(36)(37). Directly comparable to our findings, Gilbert et al. (38) demonstrate unique morphometric variations associated with individual risk factors (high blood pressure, smoking, high cholesterol, diabetes, angina), which could be quantified and visualized on constructed atlases. The derivation of radiomics shape features is methodologically different from cardiac atlases; however, there are conceptual similarities in the type of information they provide. Both seem to suggest that geometric features not captured by current image analysis approaches may be extracted from existing CMR images and that this information seems to provide additional insight into patterns of cardiac remodeling. CMR radiomics has several advantages over cardiac atlas models. The signal intensity-based radiomics features (first-order, texture) have great potential for not only better disease discrimination and outcome prediction, but also gaining deeper insights into disease processes at the tissue level; such information is not provided by cardiac atlas morphometrics. CMR radiomics analysis does not require any dedicated acquisitions or post-processing, and the extraction of radiomics features and model building are computationally simpler than atlas models. Therefore, there is real potential for radiomics to enter the clinical workflow as a very high yield and complementary image analysis tool.
Note that in this study we chose to select a different healthy subsample than in Petersen et al. (22). This is due to the differences in the objectives of the papers. While Petersen et al. (22) focused on the estimation of normal ranges of cardiac indices of structure and function and thus used very strict inclusion criteria, we are concerned with the study of cardiovascular risk factors and therefore we excluded subjects with known cardiovascular risk factors or disease.

LIMITATIONS AND FUTURE WORK
To the best of our knowledge, this is the largest study to assess the performance of CMR radiomics models in discriminating several important cardiovascular risk factors. Our findings demonstrate the feasibility of CMR radiomics models to identify cardiac changes related to important cardiovascular risk factors (diabetes, hypertension, high cholesterol, and smoking) with greater accuracy than conventional indices. The UKB provides an excellent platform for this study with a large sample of well characterized participants with linked CMR imaging. However, the data collection was conducted through a combination of a touchscreen questionnaire and a face-to-face nurse interview, and thus there remain some concerns about the accuracy and objectivity of the self-reported conditions. Studies with consideration of more sophisticated statistical methods to better account for confounding factors, as well as with inclusion of external validation cohorts, are needed to produce and validate more disease-specific and generalizable models. In particular, there is a need for prospective studies to determine the clinical utility of these models in providing incremental cardiovascular risk information.
As for the pipeline implemented in this paper, alternative approaches may merit exploration, such as testing different methods for feature selection [e.g., LASSO (39), combination of filter and wrapper-based methods (40)], or applying extensive hyper-parameter optimization for each risk group. Also, while cross-validation was performed in the feature selection process to reduce the instability of radiomics features, other strategies have been proposed such as prior clustering of redundant features (41), or using a concordance correlation coefficient (42). Additionally, there is a need for proper evaluation of the reproducibility of radiomics features across segmentation protocols and also across imaging acquisitions, which is important due to non-standard pixel values and large variation in signal intensities (43). Wider use of radiomics quality scores (44) would also enable better quality and more uniform reporting of radiomics studies and foster research reproducibility. Finally, as is common for artificial intelligence-based radiomics approaches, we have not assessed the practical value of the present results since there is no gold standard available for comparison.

CONCLUSIONS
CMR radiomics is an emerging technique for deeper and more accurate cardiac phenotyping in comparison to conventional image analysis. Our preliminary results based on a large sample from the UKB indicate the feasibility of CMR radiomics analysis and its potential clinical utility in superior image phenotyping of major cardiovascular risk factors, including diabetes, hypertension, high cholesterol, and smoking. The clinical value of these radiomics signatures for prediction of downstream events warrants further investigation in prospective cohorts.

DATA AVAILABILITY STATEMENT
This research was conducted using the UK Biobank resource under Application 2964. UK Biobank will make the data available to all bona fide researchers for all types of health-related research that is in the public interest, without preferential or exclusive access for any person. All researchers will be subject to the same application process and approval criteria as specified by UK Biobank. For the detailed access procedure see http://www.ukbiobank.ac.uk/register-apply/.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by NHS National Research Ethics Service (17th June 2011, Ref 11/NW/0382). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SKP and SEP contributed to study concepts, methods, and underlying data collection. SEP, SKP, SNe, and ZR-E provided support on clinical aspects of the study. IC, ZR-E, OC, and KL drafted the manuscript. IC, KL, SNa, OC, and MG designed the machine learning methods. IC performed the data pre-processing and data analysis. All authors participated in the analysis of the data and critical revision of the manuscript, contributed to the article, and approved the submitted version.

FUNDING
This work was partly funded by the European Union's Horizon