Edited by: Sebastian Kelle, Deutsches Herzzentrum Berlin, Germany
Reviewed by: Anja Hennemuth, Fraunhofer Institut für Bildgestützte Medizin (MEVIS), Germany; Alberto Traverso, Maastro Clinic, Netherlands
This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Overview of the pipeline to evaluate test-retest repeatability of CMR radiomics features. Test-retest CMR studies are segmented to define three ROIs for radiomics analysis: LV blood pool, RV blood pool, and LV myocardium. Shape features are analyzed for all three ROIs. Additionally, first-order and texture features are extracted from the LV myocardium. Statistical analysis is performed to assess repeatability performance of radiomics features. CMR, cardiac magnetic resonance; GLCM, gray level co-occurrence matrix; GLDM, gray level dependence matrix; GLRLM, gray level run length matrix; GLSZM, gray level size zone matrix; NGTDM, neighboring gray tone difference matrix; ROI, region of interest.
Radiomics is an image analysis technique whereby a large number of advanced quantitative features are extracted from voxel level data of routine-care medical images (
Radiomics features comprise (1) shape and (2) signal intensity-based features (Graphical abstract). Shape features include geometric quantifiers of the rendered volume, such as total volume, surface area, and descriptors of overall shape, such as sphericity, elongation, and compactness. Intensity-based radiomics features describe the global distribution (first-order features) and pattern (texture features) of voxel signal intensities. First-order features describe the distribution of signal intensities of individual voxels, without consideration to spatial relationships. They are derived from histogram-based methods and summarize the intensity levels in the defined region of interest (ROI) into single quantifiers such as mean, median, maximum, randomness (entropy), skewness (asymmetry), and kurtosis (flatness). Texture features are statistical descriptors of the relationships between neighboring voxels of similar (or different) signal intensities. They are calculated using various matrix analysis methods according to standardized mathematical definitions.
The clinical utility of radiomics models for diagnosis, surveillance, and prognostication has been repeatedly demonstrated within the context of oncology (
Translation of CMR radiomics to clinical practice requires external validity of proposed models. A key determinant of model performance in clinical and pre-clinical settings is repeatability, that is, the ability to repeatedly measure the same feature under identical or near-identical conditions on the same measurement unit (subject/phantom). CMR radiomics features are subject to technical (image acquisition, artifact, image processing) and population-related variations. However, their repeatability performance has not been adequately assessed in existing work. Such analysis is an essential step in assessing the clinical utility of this methodology, both for the underpinning research and the eventual clinical implementation.
We present, to the best of our knowledge, the first evaluation of the repeatability of CMR radiomics features on test-retest scanning using a multi-centre multi-vendor dataset with a varied case-mix. This paper is intended as a reference for future researchers to guide selection of the most robust features for inclusion in CMR radiomics models.
The design, terminology, and statistical methods reflect recommendations from the Quantitative Imaging Biomarker Alliance (QIBA) (
We analyzed a subset of studies from the VOLUMES resource (
Two vendors (Philips, Siemens), three models (Achieva, Avanto, Aera), and two magnet strengths (1.5 Tesla, 3 Tesla) were used. Scanning protocols across all contributing centres were in accordance with international recommendations (
Image segmentation was performed blind to details of image acquisition, patient information, diagnosis, or scan pairings. LV endocardial and epicardial and RV endocardial contours were drawn in end-diastole and end-systole on short-axis stack images to select three ROIs for radiomics analysis: RV blood pool, LV blood pool, and LV myocardium. The blood pool ROIs reflect LV and RV cavities in end-diastole and end-systole. Segmentation was performed according to a pre-defined standard operating procedure (SOP) (
Radiomics feature extraction was performed blind to details of image acquisition, patient information, diagnosis, or scan pairings. Contours from the image segmentation were used to create 3D image masks for the three ROIs in end-diastole and end-systole (
Definition of the LV/RV blood pool and the LV myocardium for radiomics analysis. From left to right: 2D short axis mid-ventricular slice; segmentation of the three regions of interest shown overlaid on the image: LV myocardium (blue), LV blood pool (light blue), and RV blood pool (green); 3D reconstructions of the segmented ROIs. Please note, that radiomics analysis has been performed in 3D; 2D slices are provided for visualization purposes only. CMR: cardiac magnetic resonance; LV: left ventricle; ROI: region of interest; RV: right ventricle.
Radiomics features were extracted from the 3D CMR images and the corresponding 3D mask (i.e., the full 3D CMR and mask volumes) using the open-source python-based PyRadiomics platform version 2.2.0 in end-diastole and end-systole. No pre-processing or re-segmentation was used before computing the features. We considered all features available in Pyradiomics including older versions in an effort to provide robustness insights for features, that although currently considered deprecated, were largely used in the past.
Overall, 16 shape, 19 first-order, and 73 texture features were available, we applied all feature categories to the LV myocardium, and shape features to the LV and RV blood pool ROIs. For gray value discretisation, we used a fixed bin width of 25 intensity values. The texture features were extracted using five different matrices: gray-level co-occurrence matrix (GLCM, 23 features), gray-level run-length matrix (GLRLM, 16 features), gray-level size-zone matrix (GLSZM, 15 features), neighboring gray tone difference matrix (NGTDM, 5 features), and gray-level dependence matrix (GLDM, 14 features). In total, 280 features across the three ROIs, two phases, and three radiomics categories (shape, first-order, texture) were calculated per study.
We considered intra-class correlation coefficient (ICC) as a valid aggregate summary of repeatability performance in this setting. For calculation of ICC, we used a one-way random effects model for absolute agreement based on a single measure; as the two time points (test, retest) can be considered interchangeable, the one-way model is valid and appropriate for our analysis (
The sample included 54 paired test-retest CMR scans of 40 men and 14 women with mean (standard deviation) age of 51.9 (±16.8) years. Nine subjects were healthy volunteers. The remainder had a range of ischaemic and non-ischaemic cardiovascular conditions (
Characteristic of the study population.
Age (mean ±standard deviation) | 51.9 (±16.8) years |
Sex (Men: |
40 (74%) |
Healthy volunteer | 9 |
Myocardial infarction (chronic) | 14 |
Dilated cardiomyopathy | 5 |
Hypertrophic cardiomyopathy | 15 |
Left ventricular hypertrophy | 4 |
Cardio-oncology | 7 |
Siemens, Aera, 1.5 Tesla | 23 |
Siemens Avanto, 1.5 Tesla | 28 |
Philips Achieva, 3 Tesla | 3 |
We first studied the repeatability of conventional CMR indices to assess possible loss of robustness associated with the segmentation process. We calculated ICC, CV, and mean relative difference for LV end-diastolic volume, LV end-systolic volume, LV ejection fraction, LV mass, RV end-diastolic volume, RV end-systolic volume, and RV ejection fraction (
Repeatability of LV blood pool shape features varied from moderate to excellent with mean ICC ranging from 0.511 to 0.974 [Median (IQR): 0.871 (0.175)] (
Repeatability of left ventricular blood pool shape features in end-diastole.
Volume | Excellent | 0.957 (0.927, 0.975) | 5.35 | 5.58 |
Least axis length | Excellent | 0.950 (0.916, 0.971) | 2.39 | 2.51 |
Minor axis length | Good | 0.879 (0.800, 0.928) | 3.35 | 2.93 |
Surface area | Good | 0.876 (0.796, 0.926) | 5.77 | 5.75 |
Surface area to volume ratio | Good | 0.869 (0.785, 0.921) | 3.46 | 3.5 |
Maximum 2D diameter (slice) | Good | 0.844 (0.747, 0.906) | 4.15 | 4.29 |
Maximum 2D diameter (column) | Good | 0.777 (0.646, 0.864) | 4.34 | 4.96 |
Elongation | Good | 0.775 (0.642, 0.863) | 5.7 | 5.94 |
Major axis length | Good | 0.764 (0.626, 0.856) | 4.72 | 4.75 |
Flatness | Moderate | 0.747 (0.602, 0.845) | 5.9 | 6.06 |
Maximum 2D diameter (row) | Moderate | 0.746 (0.601, 0.844) | 4.95 | 5.3 |
Maximum 3D diameter | Moderate | 0.698 (0.532, 0.813) | 5.19 | 5.64 |
Compactness 2 | Moderate | 0.575 (0.367, 0.729) | 10.55 | 9.39 |
Compactness | Moderate | 0.554 (0.339, 0.714) | 5.34 | 4.72 |
Sphericity | Moderate | 0.546 (0.329, 0.708) | 3.57 | 3.15 |
Spherical disproportion | Moderate | 0.511 (0.285, 0.683) | 3.57 | 3.15 |
Bland-Altman plots for selected LV blood pool shape features in end-diastole (left) and end-systole (right) with different levels of repeatability. Differences in Bland-Altman are calculated after normalizing radiomics in the range [0–1] to facilitate comparison among different features. All features are unitless. LV: left ventricle.
Repeatability of radiomics shape features for the LV blood pool
Repeatability of RV blood pool shape features varied from moderate to excellent with mean ICC ranging from 0.556 to 0.941 [Median (IQR): 0.793 (0.158)] (
Repeatability of right ventricular blood pool shape features in end-diastole.
Minor axis length | Excellent | 0.915 (0.858, 0.950) | 4.52 | 4.87 |
Surface area | Good | 0.899 (0.832, 0.940) | 7.38 | 7.57 |
Volume | Good | 0.894 (0.825, 0.937) | 11.03 | 11.52 |
Least axis length | Good | 0.841 (0.741, 0.904) | 4.34 | 4.6 |
Maximum 2D diameter (slice) | Good | 0.837 (0.736, 0.902) | 4.36 | 4.26 |
Surface area to volume ratio | Good | 0.816 (0.704, 0.889) | 5.45 | 5.96 |
Flatness | Good | 0.800 (0.679, 0.878) | 5.55 | 6.04 |
Maximum 3D diameter | Good | 0.795 (0.672, 0.876) | 5.33 | 5.69 |
Major axis length | Good | 0.791 (0.666, 0.873) | 4.98 | 5.02 |
Maximum 2D diameter (row) | Good | 0.790 (0.665, 0.873) | 5.91 | 6.5 |
Maximum 2D diameter (column) | Good | 0.772 (0.638, 0.861) | 6.8 | 7.42 |
Elongation | Moderate | 0.749 (0.604, 0.846) | 6.22 | 6.73 |
Compactness | Moderate | 0.679 (0.506, 0.800) | 4.78 | 5.35 |
Compactness 2 | Moderate | 0.679 (0.506, 0.800) | 9.52 | 10.67 |
Sphericity | Moderate | 0.679 (0.505, 0.800) | 3.19 | 3.57 |
Spherical disproportion | Moderate | 0.672 (0.496, 0.795) | 3.19 | 3.57 |
Bland-Altman plots for selected RV blood pool shape features in end-diastole (left) and end-systole (right) with different levels of repeatability. Differences in Bland-Altman are calculated after normalizing radiomics in the range [0–1] to facilitate comparison among different features. All features are unitless. RV: right ventricle.
Repeatability of LV myocardium shape features varied from moderate to excellent with mean ICC ranging from 0.544 and 0.96 [Median (IQR): 0.839 (0.172)] (
Repeatability of left ventricular myocardium shape features in end-diastole.
Volume | Excellent | 0.946 (0.909, 0.968) | 7.34 | 8.6 |
Minor axis length | Excellent | 0.944 (0.905, 0.967) | 2.27 | 2.53 |
Least axis length | Excellent | 0.934 (0.890, 0.961) | 2.62 | 2.7 |
Maximum 2D diameter (slice) | Excellent | 0.913 (0.855, 0.948) | 2.88 | 2.9 |
Surface area | Excellent | 0.909 (0.849, 0.946) | 5.23 | 5.79 |
Surface area to volume ratio | Good | 0.837 (0.735, 0.902) | 7.03 | 7.89 |
Maximum 2D diameter (column) | Good | 0.779 (0.649, 0.866) | 4.09 | 4.76 |
Compactness 2 | Good | 0.761 (0.622, 0.854) | 15.91 | 17.81 |
Compactness | Good | 0.757 (0.616, 0.851) | 8.06 | 8.97 |
Sphericity | Good | 0.753 (0.610, 0.848) | 5.39 | 5.99 |
Maximum 2D diameter (row) | Moderate | 0.739 (0.590, 0.839) | 4.88 | 5.23 |
Spherical disproportion | Moderate | 0.724 (0.569, 0.830) | 5.39 | 5.99 |
Major axis length | Moderate | 0.717 (0.559, 0.825) | 5.06 | 5.27 |
Elongation | Moderate | 0.693 (0.525, 0.809) | 5.44 | 5.38 |
Maximum 3D diameter | Moderate | 0.677 (0.503, 0.799) | 5.16 | 5.61 |
Flatness | Moderate | 0.544 (0.327, 0.707) | 6.45 | 6.25 |
Bland-Altman plots for selected LV myocardium shape features in end-diastole (left) and end-systole (right) with different levels of repeatability. Differences in Bland-Altman are calculated after normalizing radiomics in the range [0–1] to facilitate comparison among different features. All features are unitless. LV: left ventricle.
Across all three regions of interest and the two phases, “volume” and “surface area” followed by measures of the heart short axis, i.e., “least axis length” and “minor axis length,” showed the highest average repeatability (
Repeatability of LV myocardium first-order features varied from poor to excellent with mean ICC ranging from 0.333 to 0.964 [Median (IQR): 0.932 (0.140)] (
Repeatability of left ventricular myocardium first-order features in end-diastole.
Entropy | Excellent | 0.962 (0.936, 0.978) | 8.9 | 9.7 |
90th percentile | Excellent | 0.961 (0.934, 0.977) | 11.9 | 11.8 |
Root mean squared | Excellent | 0.959 (0.930, 0.976) | 11.9 | 11.4 |
Median | Excellent | 0.958 (0.928, 0.975) | 12.4 | 11.9 |
Mean | Excellent | 0.957 (0.927, 0.975) | 12.1 | 11.5 |
Energy | Excellent | 0.950 (0.915, 0.970) | 25.2 | 27.1 |
Uniformity | Excellent | 0.942 (0.902, 0.966) | 13.0 | 14.0 |
Mean absolute deviation | Excellent | 0.934 (0.890, 0.961) | 15.1 | 16.3 |
10th percentile | Excellent | 0.933 (0.888, 0.961) | 15.0 | 15.0 |
Robust mean absolute deviation | Excellent | 0.932 (0.885, 0.960) | 15.5 | 16.5 |
Interquartile range | Excellent | 0.929 (0.881, 0.958) | 15.4 | 15.9 |
Standard deviation | Excellent | 0.918 (0.864, 0.952) | 15.8 | 17.3 |
Total energy | Excellent | 0.912 (0.853, 0.948) | 26.0 | 28.0 |
Maximum | Good | 0.875 (0.794, 0.925) | 19.1 | 21.0 |
Range | Good | 0.810 (0.694, 0.885) | 20.8 | 23.4 |
Variance | Good | 0.802 (0.683, 0.880) | 30.4 | 33.7 |
Skewness | Poor | 0.434 (0.192, 0.627) | 187.5 | 72.7 |
Minimum | Poor | 0.401 (0.154, 0.602) | 62.1 | 65.9 |
Kurtosis | Poor | 0.369 (0.116, 0.577) | 39.3 | 41.5 |
Bland-Altman plots for selected LV myocardium first-order features in end-diastole (left) and end-systole (right) with different levels of repeatability. Differences in Bland-Altman are calculated after normalizing radiomics in the range [0–1] to facilitate comparison among different features. All features are unitless. LV: left ventricle.
Repeatability of LV myocardium radiomics first-order
Repeatability of LV myocardium texture features varied from poor to excellent with mean ICC ranging from −0.130 to 0.977 [Median (IQR): 0.907 (0.006)] (
Bland-Altman plots for selected LV myocardium texture features in end-diastole (left) and end-systole (right) with different levels of repeatability. Differences in Bland-Altman are calculated after normalizing radiomics in the range [0–1] to facilitate comparison among different features. All features are unitless. LV: left ventricle.
The 10 most and 10 least robust left ventricular myocardium texture features in end-diastole.
Inverse difference moment | Excellent | 0.975 (0.957, 0.985) | 6.94 | 6.48 |
Inverse difference | Excellent | 0.973 (0.955, 0.984) | 5.05 | 4.82 |
Joint entropy | Excellent | 0.973 (0.953, 0.984) | 7.79 | 7.24 |
Run length non uniformity normalized | Excellent | 0.970 (0.949, 0.983) | 4.45 | 4.10 |
Short run emphasis | Excellent | 0.970 (0.948, 0.982) | 2.18 | 1.99 |
Difference entropy | Excellent | 0.965 (0.940, 0.979) | 7.48 | 7.54 |
Run percentage | Excellent | 0.963 (0.938, 0.979) | 3.84 | 3.17 |
Small dependence emphasis | Excellent | 0.960 (0.933, 0.977) | 11.69 | 11.87 |
Sum entropy | Excellent | 0.959 (0.931, 0.976) | 7.22 | 6.77 |
Sum average | Excellent | 0.958 (0.930, 0.976) | 11.03 | 11.7 |
Gray level variance | Good | 0.792 (0.668, 0.874) | 28.66 | 31.84 |
Informal measure of correlation 2 | Good | 0.755 (0.612, 0.850) | 11.91 | 12.33 |
Complexity | Moderate | 0.744 (0.597, 0.843) | 38.65 | 42.09 |
Inverse difference normalized | Moderate | 0.720 (0.563, 0.827) | 0.72 | 0.80 |
Strength | Moderate | 0.717 (0.559, 0.825) | 40.74 | 47.21 |
Informal measure of correlation 1 | Moderate | 0.695 (0.528, 0.811) | 20.64 | 21.63 |
Inverse difference moment normalized | Moderate | 0.676 (0.502, 0.798) | 0.23 | 0.24 |
Correlation | Moderate | 0.562 (0.350, 0.720) | 19.12 | 20.66 |
Cluster shade | Poor | 0.420 (0.175, 0.616) | 204.88 | 74.52 |
Cluster prominence | Poor | 0.364 (0.110, 0.573) | 60.66 | 69.95 |
We also evaluated differences in the reproducibility of features by texture class i.e., GLCM, GLRLM, GLSZM, NGTDM, and GLDM (
In this heterogenous case mix of test-retest studies, we demonstrated wide variation in the repeatability of CMR radiomics features by ROI, feature category and cardiac phase. There were features with good and excellent repeatability within all feature categories and ROIs. The signal intensity-based features (first-order, texture) demonstrated the greatest variation in repeatability comprising a large proportion of highly reproducible features alongside features with the poorest repeatability. We present details of repeatability performance for a comprehensive range of radiomics features, which is intended to guide selection of the most robust features for clinical modeling by future researchers. Therefore, this work is an important step in characterizing the technical performance of CMR radiomics and enhancing future efforts to evaluate its clinical utility.
There have been recent efforts to define the repeatability of radiomics features relating to oncological imaging with test-retest studies (
Jang et al. (
Our study is the first to report repeatability of LV and RV CMR radiomics shape features. Radiomics shape features are calculated from 3D image masks derived from image contours, as such, their repeatability is a direct reflection of segmentation robustness. For instance, we demonstrate better repeatability of features quantifying the heart short axis, e.g., “least axis length,” “minor axis length” and “maximal 2D diameter,” than those quantifying the long axis, e.g., “major axis length” and “maximum 3D diameter.” The reduced reproducibility of features along the cardiac long axis likely reflects segmentation robustness which is likely to suffer more at the apex and base of the heart rather than in the middle slices. This is consistent with our observation of low repeatability of all features quantifying ventricular sphericity.
Signal intensity-based features (first-order, texture) applied to the LV myocardium reflect both segmentation and signal intensities within the defined ROI. These features are therefore sensitive to variations in image acquisition which affect intensity levels within the whole image. Furthermore, there is potential to introduce extreme outlier values in the segmentation process. For instance, an LV endocardial contour that is not perfectly opposed to the endocardium would introduce a series of high value voxels from the blood pool into what will be defined as “myocardium” for radiomics analysis (
This study presents an important first step in evaluating the technical performance of CMR radiomics first-order, texture, and shape feature. The present dataset does not permit consideration of the wide range of technical and population related factors that may be modifying the repeatability performance of radiomics features. Studies considering the impact of factors such as scanner vendor/model, magnet strength, acquisition parameters, and disease are warranted. To guide building of radiomics models that would truly translate to clinical practice, we should consider robustness of features not only under repeatability, but also under reproducibility conditions, where real-life variations in scanner, operator, and image acquisition are not strictly controlled. Finally, different technical approaches to feature extraction and image normalization may improve robustness of radiomics features, in particular for intensity-based features. For example, different approaches to gray level discretisation have been shown to affect feature robustness (
There is variation in the repeatability of CMR radiomics features, which is likely to be clinically relevant. In this paper we present repeatability performance of a comprehensive range of commonly used CMR radiomics features. The work is intended to guide future researchers to select the most robust radiomics features for clinical modeling. Further work in larger and richer datasets and experimentation with different technical approaches is needed to further define the repeatability and reproducibility of CMR radiomics and to ascertain the optimal technical approach for radiomics analysis for maintaining feature robustness.
Publicly available datasets were analyzed in this study. This data can be found here:
ZR-E, SEP, KL, NCH, and PBM conceived the study. ZR-E and PG wrote the manuscript. ZR-E and SEP analyzed the CMR scans. JC supervised and advised on the statistical analysis. PG extracted radiomics features and conducted the statistical analysis. AJ contributed to manuscript editing and statistical analysis. JA, ANB, RHD, CHM, and JCM collated the studies in the VOLUMES resource. All authors provided critical review of the manuscript.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: