DeepStrain: A Deep Learning Workflow for the Automated Characterization of Cardiac Mechanics

Myocardial strain analysis from cinematic magnetic resonance imaging (cine-MRI) data provides a more thorough characterization of cardiac mechanics than volumetric parameters such as left-ventricular ejection fraction, but sources of variation including segmentation and motion estimation have limited its wider clinical use. We designed and validated a fast, fully automatic deep learning (DL) workflow, consisting of segmentation and motion estimation convolutional neural networks, to generate both volumetric parameters and strain measures from cine-MRI data. The final motion network design, loss function, and associated hyperparameters are the result of an ad hoc implementation that we designed specifically for strain quantification, tested, and compared with potential alternatives. The optimal configuration was trained using healthy and cardiovascular disease (CVD) subjects (n = 150). DL-based volumetric parameters were highly correlated (>0.98) and without significant bias relative to parameters derived from manual segmentations in 50 healthy and CVD test subjects. Compared to landmarks manually tracked on tagging-MRI images from 15 healthy subjects, landmark deformation using DL-based motion estimates from paired cine-MRI data resulted in an end-point error of 2.9 ± 1.5 mm. Measures of end-systolic global strain from these cine-MRI data showed no significant biases relative to a tagging-MRI reference method. In 10 healthy subjects, the intraclass correlation coefficient for intra-scanner repeatability was good to excellent (>0.75) for all global measures and most polar map segments. In conclusion, we developed and evaluated the first end-to-end learning-based workflow for automated strain analysis from cine-MRI data to quantitatively characterize cardiac mechanics of healthy and CVD subjects.

Keywords: cine-MRI, deep learning, segmentation, motion estimation, myocardial strain

INTRODUCTION
Cardiac mechanics reflects the precise interplay between myocardial architecture and loading conditions that is essential for sustaining the blood pumping function of the heart. The ejection fraction (EF) is often used as a left-ventricular (LV) functional index, but its value is limited when mechanical impairment occurs without an EF reduction (1). Alternatively, tissue tracking approaches for strain analysis provide a more thorough characterization through non-invasive evaluation of myocardial deformation from echocardiography or cinematic magnetic resonance imaging (cine-MRI) data (2), and could be used to identify dysfunction before EF is reduced (3). Unfortunately, various sources of discrepancies have limited the wider clinical applicability of these techniques, including factors related to imaging modality, algorithm, and operator (4). More accurate measures could be obtained from tagging-MRI data, widely regarded as the reference standard for strain quantification (5,6), but use of these data is less common partly due to lack of available analysis tools, whereas echocardiography and cine-MRI data are ubiquitously acquired and analyzed in clinical practice.
Irrespective of algorithm or modality, e.g., speckle tracking for echocardiography or feature tracking for cine-MRI, the main challenge is to estimate motion within regions along the myocardial wall (2). Operator-related discrepancies are introduced when the myocardial wall borders are delineated manually, a time-consuming process that requires considerable expertise and results in significant inter- and intra-observer variability (7,8). Automatic delineation approaches have been implemented within computational pipelines (9), but other factors related to motion tracking algorithms also influence strain assessment, including the appropriate selection of tuneable parameters whose optimal values can differ between patient cohorts and acquisition protocols [e.g., the size of the search region in block-matching methods (10)]. Further, these algorithms often make assumptions about the properties of the myocardial tissue [e.g., incompressible and elastic (11,12)], or use registration methods to drive the solution toward an expected geometry. However, recent evidence has shown that the validity of these assumptions varies between healthy and diseased myocardium (13,14), suggesting these approaches may not accurately reflect the underlying biomechanical motion. Modality-related image quality could also complicate interpretation of abnormal strain values, since these could reflect either real dysfunction or artifact-related inaccuracies, leading to some degree of subjectivity or non-conclusive results (3). Lastly, although automated segmentation and motion tracking commercial software is available for cardiac cine imaging, manual correction of delineated contours used for tracking is often required, resulting in significant variations in strain depending on segmentation procedure and type of commercial software (15).
Deep Learning (DL) methods have demonstrated the advantage of allowing real-world data to guide learning of abstract representations that can be used to accomplish pre-specified tasks, and have been shown to be more robust to image artifacts than non-learning techniques for some applications (16,17). DL segmentation methods have been proposed (18-21) and implemented within strain computational pipelines (22,23), and recent studies have shown that cardiac motion estimation can also be recast as a learnable problem (24-28). These methods usually consist of an intensity-based loss function and a constraint term (24,27), the latter using common machine learning techniques [e.g., L2 regularization of all learnable parameters (25)] or direct regularization of the motion estimates [e.g., smoothness penalty (24), anatomy-aware (28)]. However, none of these methods has considered the accuracy of myocardial strain as a design factor or has been applied to strain analysis.
We have recently developed a learning-based method for cardiac motion estimation that produces more accurate estimates than various techniques, including B-spline, diffeomorphic, and mass-preserving algorithms (29), and showed these estimates could potentially be used to detect regional dysfunction. Thus, incorporating our method within a strain analysis framework could potentially enable accurate, user-independent, and quantitative characterization of cardiac mechanics at both global and regional levels. While this framework could be based on echocardiography images (30), these data remain limited for strain mapping tasks by their low reproducibility of acquisition planes (4) and temporal stability of tracking patterns (31). In contrast, cine-MRI offers the most accurate and reproducible assessment of cardiac anatomy and function, thus providing a more thorough set of data for learning-based motion models.
We propose DeepStrain, a fast, automated workflow that derives global and regional strain measures from cine-MRI data by decoupling motion estimation and segmentation tasks. With decoupling, segmentations are not used for motion estimation during inference but rather to derive clinical parameters and to identify a cardiac coordinate system for strain analysis, further reducing the variability in strain directly related to segmentation. Although two-dimensional (2D) convolutional neural networks (CNN) for cardiac motion estimation from cine-MRI have been proposed (24,26,28,32), DeepStrain is the first end-to-end learning-based workflow for myocardial strain analysis from cine-MRI. In addition, motion predicted using 2D architectures could be influenced by out-of-plane motion during the cardiac cycle, resulting in overestimation of in-plane motion and reduced reproducibility (33). Instead, this paper describes a 3D CNN designed specifically for strain quantification that handles challenges associated with the anisotropic resolution of cine-MRI data. Our loss weighting strategy to find the optimal balance between motion regularization terms also differs from previous methods, which have traditionally relied on registration techniques as indirect measures of motion accuracy (24,26,28,32). Here, we simulated cine-MRI data with corresponding ground-truth cardiac motion to identify the hyperparameters yielding accurate motion and strain estimates. The optimal trained configuration is available online at https://github.com/moralesq/DeepStrain. Finally, this paper also provides a comprehensive assessment of the accuracy and repeatability of DeepStrain measures, a task that has been mostly ignored in the deep learning literature but is critical to clinical adoption (4).

Myocardial Strain Definitions
Strain represents percent change in myocardial length per unit length. The 3D analog for MRI is given by the Green-Lagrange strain tensor

$$E(t) = \frac{1}{2}\left(\nabla u(t) + \nabla u(t)^{T} + \nabla u(t)^{T}\,\nabla u(t)\right) \tag{1}$$

where u(t) denotes myocardial displacement from a fully-relaxed end-diastolic (ED) phase at t = 0 to a contracted frame at t > 0. Radial and circumferential strain are the diagonal components of the tensor E evaluated in cylindrical coordinates. Strain rate (SR) is the time derivative of (1). The time of acquisition of each frame was extracted from the DICOM header and was used to interpolate E(t), such that E(t) was defined at every millisecond. The time derivative was then evaluated using central differences and reported as change in strain per second, with unit s⁻¹.
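As an illustration of the definitions above, the following NumPy sketch evaluates the Green-Lagrange tensor from a displacement field using finite differences and projects it onto radial and circumferential directions. The function names, the array layout `(X, Y, Z, 3)`, and the use of a single in-plane center with the long axis along z are assumptions made for this example, not details taken from the DeepStrain code.

```python
import numpy as np

def green_lagrange_strain(u, spacing=(1.0, 1.0, 1.0)):
    """Green-Lagrange strain from a displacement field.

    u       : array of shape (X, Y, Z, 3), displacement in physical units (e.g., mm)
    spacing : voxel size along x, y, z in the same units (assumed layout)
    returns : array of shape (X, Y, Z, 3, 3)
    """
    # grad[..., i, j] = d u_i / d x_j, estimated with finite differences
    grad = np.stack(
        [np.stack(np.gradient(u[..., i], *spacing), axis=-1) for i in range(3)],
        axis=-2,
    )
    grad_t = np.swapaxes(grad, -1, -2)
    return 0.5 * (grad + grad_t + grad_t @ grad)

def radial_circumferential(E, center_xy):
    """Project E onto in-plane radial and circumferential unit vectors.

    Assumes the LV long axis is aligned with z and passes through center_xy
    (given in voxel indices); this is a simplification for illustration.
    """
    nx, ny, nz = E.shape[:3]
    X, Y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    theta = np.arctan2(Y - center_xy[1], X - center_xy[0])
    theta = np.repeat(theta[:, :, None], nz, axis=2)
    zeros = np.zeros_like(theta)
    e_r = np.stack([np.cos(theta), np.sin(theta), zeros], axis=-1)
    e_c = np.stack([-np.sin(theta), np.cos(theta), zeros], axis=-1)
    e_rr = np.einsum("...i,...ij,...j->...", e_r, E, e_r)
    e_cc = np.einsum("...i,...ij,...j->...", e_c, E, e_c)
    return e_rr, e_cc
```

For a uniform 10% stretch along x (u_x = 0.1 x), the xx component is 0.1 + 0.5 · 0.1² = 0.105, which shows the quadratic term that distinguishes the Green-Lagrange tensor from the small-strain approximation.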
Global strain is defined as the average of E over the whole LV myocardium (LVM) volume. Regional strain is defined as the average of E over the volume of specific LVM segments defined by the American Heart Association (AHA) polar map (34), which requires labels of the right ventricle to construct. Specific parameters based on timing and magnitude are extracted from the measures evaluated over a whole cardiac cycle: end-systolic strain (ESS), defined as the global strain value at end-systole (ES); systolic strain rate (SRs), defined as the peak (i.e., maximum) absolute value of global SR during systole; early-diastolic strain rate (SRe), defined as the peak absolute value of global SR during diastole. Although only radial and circumferential strain were analyzed in this study, DeepStrain is also capable of generating shear (Supplementary Section 1). The code used to construct the AHA polar maps is available in the repo online.
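The timing-based parameters described above can be sketched as follows. The example resamples a global strain curve to a 1 ms grid, differentiates it with central differences, and extracts ESS, SRs, and SRe. Linear interpolation, the function name, and the convention that systole covers all times up to and including end-systole are assumptions for this illustration; the source does not state which interpolation scheme was used.

```python
import numpy as np

def strain_rate_parameters(times_ms, strain, es_index):
    """Extract ESS, SRs, and SRe from a global strain curve.

    times_ms : acquisition time of each frame in milliseconds
    strain   : global strain value at each frame
    es_index : index of the end-systolic frame
    """
    times_ms = np.asarray(times_ms, dtype=float)
    strain = np.asarray(strain, dtype=float)

    # Resample the curve so that it is defined at every millisecond
    # (linear interpolation is an assumption of this sketch).
    t = np.arange(times_ms[0], times_ms[-1] + 1.0, 1.0)
    e = np.interp(t, times_ms, strain)

    # Central differences in the interior; divide time by 1000 to get s^-1.
    sr = np.gradient(e, t / 1000.0)

    systole = t <= times_ms[es_index]
    return {
        "ESS": float(strain[es_index]),
        "SRs": float(np.abs(sr[systole]).max()),
        "SRe": float(np.abs(sr[~systole]).max()),
    }
```

In practice the diastolic window would likely be restricted to early diastole; the sketch simply uses everything after end-systole.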

Centering, Segmentation, and Motion Estimation
DeepStrain (Figure 1) consists of a series of CNNs that perform three tasks: a ventricular centering network (VCN) for automated centering and cropping, a cardiac segmentation network (CarSON) to generate tissue labels, and a cardiac motion estimation network (CarMEN) to generate u. Estimates of u are used to calculate myocardial strain, and segmentations are used to derive volumetric parameters, identify a cardiac coordinate system for strain analysis, and generate tissue labels used for anatomical regularization of motion estimates at training time.
All networks have a common encoder-decoder architecture consisting primarily of convolution, batch normalization (35), and PReLU (36) layers with residual connections (37). The specific architecture formulation and losses are discussed below and in Supplementary Section 2.

VCN
Let V_t be a cine-MRI frame at time t defined over an n-D domain Ω ⊂ ℝⁿ, and let v ∈ Ω. VCN uses a single-channel array V with size 256 × 256 × 16 to generate a single-channel array G_pred of equal size, where G_pred corresponds to a Gaussian distribution with mean defined as the LVM center of mass. This approach models the uncertainty associated with the center location, especially in pathological cases, and enables automated generation of ground-truth labels when manual segmentation of uncropped images is available. VCN was trained using the mean square error (MSE) loss function

$$\mathcal{L}_{\mathrm{VCN}} = \frac{1}{|\Omega|} \sum_{v \in \Omega} \left(G_{\mathrm{gt}}(v) - G_{\mathrm{pred}}(v)\right)^{2} \tag{2}$$

where G_gt is the ground-truth Gaussian distribution. At inference, the input volume V is centered and cropped around the voxel with the highest value in G_pred to generate a new cropped array of size 128 × 128 × 16, which is then the input to CarSON and CarMEN.
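The two steps described for VCN, building a Gaussian target from a segmentation and cropping around the predicted peak, can be sketched in NumPy as below. The Gaussian width `sigma`, the helper names, and the clamping of the crop window at the volume borders are assumptions for this example and are not specified in the text.

```python
import numpy as np

def center_of_mass(mask):
    """Mean voxel index of all voxels where mask is True."""
    return np.argwhere(mask).mean(axis=0)

def gaussian_target(shape, center, sigma=10.0):
    """Isotropic Gaussian (in voxel units) peaked at `center`.

    sigma is a hypothetical value chosen for illustration.
    """
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    d2 = sum((g - c) ** 2 for g, c in zip(grids, center))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def crop_around_peak(volume, g_pred, size=(128, 128)):
    """Crop the x-y plane around the voxel with the highest value in g_pred."""
    cx, cy, _ = np.unravel_index(np.argmax(g_pred), g_pred.shape)
    # Clamp the window so that it stays inside the volume (assumption).
    x0 = int(np.clip(cx - size[0] // 2, 0, volume.shape[0] - size[0]))
    y0 = int(np.clip(cy - size[1] // 2, 0, volume.shape[1] - size[1]))
    return volume[x0:x0 + size[0], y0:y0 + size[1], :]
```

With a 256 × 256 × 16 input, `crop_around_peak` returns a 128 × 128 × 16 array, matching the input size stated for CarSON and CarMEN.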

CarSON
CarSON is a 2D architecture that uses single-channel images V of size 128 × 128 to generate a 4-channel segmentation M_pred of equal size, each channel corresponding to a label. We experimented with two different loss functions L_seg to train CarSON using the manual segmentations M_ms: the pixelwise categorical cross-entropy (CCE), and a multi-class Dice coefficient (MDC) loss function (3), where k ∈ {0, 1, 2, 3} indexes the tissue labels (i.e., background, RV, LVM, and LV), and v_k ∈ M denotes all the pixels with label k.
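A multi-class Dice loss of the kind described above can be sketched as follows. This is one common formulation (one minus the mean per-class Dice coefficient, with a small constant for numerical stability); the exact normalization used to train CarSON is not reproduced here, and the function name and `eps` value are assumptions.

```python
import numpy as np

def multiclass_dice_loss(m_true, m_pred, eps=1e-7):
    """One minus the mean Dice coefficient over K classes.

    m_true : one-hot manual segmentation, shape (H, W, K)
    m_pred : predicted class probabilities, shape (H, W, K)
    """
    spatial_axes = (0, 1)
    intersection = (m_true * m_pred).sum(axis=spatial_axes)
    denominator = m_true.sum(axis=spatial_axes) + m_pred.sum(axis=spatial_axes)
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice_per_class.mean())
```

Unlike pixelwise cross-entropy, each class contributes equally to this loss regardless of how many pixels it occupies, which is one reason Dice-type losses are often preferred when the myocardium covers a small fraction of the image.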

CarMEN
CarMEN estimates the motion u_t of the heart from V_0 to V_t, i.e., for each voxel v ∈ Ω, u_t(v) is an approximation of the myocardial displacement during contraction such that V_0(v) and (u_t ∘ V_t)(v) correspond to similar cardiac regions. The operator ∘ refers to application of a spatial transform to V_t using u_t via trilinear interpolation (38). Thus, CarMEN uses a 2-channel input volume consisting of two concatenated arrays with size 128 × 128 × 16 to generate a 3-channel array u of equal size, each channel representing the x, y, and z components of motion.
Although the current formulation of CarMEN shares some similarities with our previous work, we have made several design modifications specifically for accurate strain quantification. Here a combination of three loss functions was used for training. First, we used an unsupervised loss function L_intensity (4) that trains CarMEN using the input volumes and generated motion estimates. Second, we used a supervised function L_anatomical (5) that leverages segmentations of the input volumes at training time to impose an anatomical constraint on the estimates. Third, smooth estimates were encouraged by using a diffusion regularizer (6), where dr is the spatial resolution of V. Thus, the loss function for CarMEN is a linear combination of (4), (5), and (6), weighted by λ_i, λ_a, and λ_s, respectively. Some design variations were exclusive to estimation of motion from 3D cine-MRI frames. Convolution, pooling, and upscaling were implemented with 3 × 3 × k_z operations, where k_z could be set to either 1 or 3. For k_z = 1, operations were carried out only in the x-y plane to account for the low and varying z-resolution, different from 3D architectures for segmentation with 3 × 3 × 3 convolutions and in-plane-only pooling and upscaling (39). Thus, context in the z-dimension is aggregated through trilinear interpolation of the V_t and M_t volumes in (4) and (5), and through application of 3D spatial gradients to u in (6). The spatial gradient in (6) also includes an additional term dr to account for differences between in-plane and slice resolution, which was not used in (40). Lastly, we experimented with CCE and MDC implementations as anatomical constraints in (5).
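To make the role of the resolution term dr concrete, the sketch below implements a diffusion-style smoothness penalty in which finite differences along each axis are divided by the voxel size along that axis, together with the weighted sum of the three loss terms. The precise form of the regularizer used for CarMEN is not given here, so the mean squared gradient magnitude is an assumption; the default weights are the optimal values reported later in the text.

```python
import numpy as np

def diffusion_regularizer(u, dr=(1.0, 1.0, 1.0)):
    """Mean squared spatial gradient of a motion field.

    u  : motion field of shape (X, Y, Z, 3)
    dr : voxel size along x, y, z; dividing by it expresses gradients per
         physical unit, so thick slices are not over-penalized (assumed form)
    """
    total = 0.0
    for c in range(u.shape[-1]):
        for g in np.gradient(u[..., c], *dr):
            total = total + g ** 2
    return float(np.mean(total))

def carmen_loss(l_intensity, l_anatomical, l_smooth,
                lam_i=0.01, lam_a=0.5, lam_s=0.1):
    """Linear combination of the intensity, anatomical, and smoothness terms."""
    return lam_i * l_intensity + lam_a * l_anatomical + lam_s * l_smooth
```

For example, a field that changes by 2 voxels of displacement per index along x yields a penalty of 4 when the x spacing is 1 and a penalty of 1 when the x spacing is 2, which illustrates how anisotropic resolution changes the effective smoothness weight.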
At inference, the entire cycle of a single subject can be analyzed using sequential inputs

EXPERIMENTS

Datasets
For development we used the Automated Cardiac Diagnosis Challenge (ACDC) dataset (41), consisting of cine-MRI data from 150 subjects evenly divided into five groups: healthy subjects and patients with hypertrophic cardiomyopathy (HCM), abnormal right ventricle (ARV), myocardial infarction with reduced ejection fraction (MI), and dilated cardiomyopathy (DCM). These data were publicly available as train (n = 100) and test (n = 50) sets, with manual segmentations included for the train set only. For validation of motion and strain measures we used the Cardiac Motion Analysis Challenge (CMAC) dataset (42), consisting of paired tagging- and cine-MRI data from 15 healthy subjects. To assess intra-scanner repeatability, 10 healthy volunteers were recruited to undergo repeated scans on a 3T MRI scanner (Supplementary Section 3). All cine-MRI frames and corresponding segmentations were resampled to a

Design of a Strain Quantification-Specific CNN
Reported normal ranges of strain in healthy individuals using non-learning methods vary widely between the different deformation methodologies, limiting the clinical utility of strain measures (4). We used this concept as a heuristic in updating CarMEN, i.e., a useful design should minimize the variation in strain values in healthy individuals. To assess the impact of design choices on this heuristic, we separated the ACDC training set into two group-balanced train and test subsets, each with 50 subjects. We trained CarMEN for 300 epochs using two different layer operation sizes (i.e., 3 × 3 × k_z with k_z ∈ {1, 3}), and two different implementations of (5) (i.e., MDC and CCE). With k_z = 3, comparison of losses showed that CCE leads to increased standard deviation in radial ESS in healthy train (n = 10) and test (n = 10) subjects, and large differences in the average radial ESS between training and testing sets (Figure 2). Multiple experiments with different regularization parameters showed similar results, and showed that setting k_z = 1 reduces deviations in healthy strain (Supplementary Table 1). Thus, the new CarMEN design used 3 × 3 × 1 operations and was regularized using the MDC function.

Novel Loss Weighting Strategy for Accurate Motion and Strain Estimation
Most networks proposed to date have used registration terms such as (4) and (5) to indirectly assess the accuracy of u_t on validation or test datasets. However, this approach is prone to errors since inaccurate and even unrealistic u_t solutions can minimize these terms. To find an optimal balance between loss terms, we simulated 10 cardiac cine-MRI frames at ED and ES with known ground-truth motion using the MR-extended cardiac-torso (MRXCAT) phantom (43,44), a software phantom used extensively in imaging studies (45). The motion of the software phantom was modeled using gated patient 4D tagging data, producing highly realistic contracting and twisting motion of the normal heart that can be parameterized to generate population-wide characteristics, as previously described by us (29). We trained CarMEN with various regularization parameters for 300 epochs using 100 subjects from the ACDC training set, and tested the models on the MRXCAT data by evaluating the end-point error between ground-truth and predicted motion estimates within the LVM (Figure 3). Setting λ_s = 0 leads to highly irregular motion vectors (e.g., off by more than 90 degrees) relative to ground-truth. Setting the smoothness and anatomical weights to λ_s = λ_a = 0.1 leads to smoother and better aligned vectors, albeit with a slightly decreased magnitude. Increasing the anatomical weight to λ_a = 0.5 further improves the estimates by generating vectors with similar magnitude and orientation to the ground-truth. Quantitative measures of motion accuracy showed similar results across various regularization values, and these changes in motion estimation accuracy were reflected as bias changes in strain values (Figure 4). We found the optimal parameters to be λ_i = 0.01, λ_a = 0.5, λ_s = 0.1, which in addition resulted in low strain deviation in healthy subjects as described in the previous section (Supplementary Table 1).
Thus, the optimal architecture and hyperparameters were selected based on both the ACDC data (i.e., to assess strain deviation in healthy subjects) and the MRXCAT data (i.e., to assess motion and strain accuracy).
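The phantom-based evaluation described above relies on an end-point error restricted to the myocardium. A minimal sketch of that metric is shown below; the function name and array layout are assumptions, and the motion fields are assumed to be expressed in the same physical units.

```python
import numpy as np

def masked_end_point_error(u_pred, u_true, mask):
    """Mean Euclidean distance between predicted and ground-truth motion
    vectors, evaluated only inside a binary mask (e.g., the LV myocardium).

    u_pred, u_true : motion fields of shape (X, Y, Z, 3)
    mask           : boolean array of shape (X, Y, Z)
    """
    error = np.linalg.norm(u_pred - u_true, axis=-1)
    return float(error[mask].mean())
```

Because this metric compares motion vectors directly, it can expose solutions that align image intensities well but are physically implausible, which is the failure mode of registration-based surrogates noted in the text.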

Final Model Training
Ground-truth distributions for VCN were created using the manual segmentations. VCN and CarSON were trained using the ED and ES frames of the train set, as only these included ground-truth segmentations. This provided 200 training samples for VCN and 3200 for CarSON, the latter having more samples since it is a 2D architecture and all frames were resampled to a volume with 16 slices. VCN was tested by five-fold cross-validation, whereas the accuracy of CarSON was assessed by submitting the results to the challenge website. Once CarSON was trained, we generated segmentations of the test set to train CarMEN using the entire ACDC dataset, i.e., 100 subjects from the train set with manual segmentations and 50 from the test set with CarSON-predicted segmentations. Only the ED-ED and ED-ES pairs were used for training. The former pair is useful for the network to learn the identity transformation. Data augmentation included random rotations and translations, random mirroring along the x and y axes, and gamma contrast correction. All data augmentation was performed only in the x-y plane.

FIGURE 3 | Qualitative effects of smoothing and anatomical regularization on the accuracy of motion estimates on the MRXCAT dataset. First row shows the predicted (black) motion estimates when the anatomical regularization is set to 0.5 and smoothing is set to 0. Relative to the ground-truth (red), these estimates are highly irregular. Increasing (third column) the smoothness to 0.1 and setting anatomical to 0.1 improves the direction of the estimates, but the magnitude is reduced. This is corrected by increasing anatomical regularization to 0.5 (fourth column).

Segmentation and Motion Estimation
The CarSON-predicted and manual segmentations were compared using the Hausdorff distance (HD) and Dice Similarity Coefficient (DSC) metrics at both ED and ES. Accuracy of LV volumetric measures derived from segmentations, including ED volume (EDV), EF, and LVM, was assessed using the correlation, bias, and standard deviation metrics. The mean absolute error (MAE) for the LV EDV and LVM were also computed for comparison against the intra- and inter-observer variability reported by (41). RV labels were not analyzed since they were not used to assess cardiac function but rather to define the direction of the septal wall, which is needed to construct the LV strain polar maps with a normalized orientation between subjects. We compared our results to the top-3 ranked methods published for the ACDC test set as these appear in the leaderboard of the challenge (18,20,21,39).
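For reference, the overlap and volumetric quantities named above can be computed from binary masks as in the sketch below. The helper names are hypothetical, and the voxel volume must be supplied by the caller from the image header; LV mass would additionally require an assumed myocardial density, which is not stated in the text and is therefore omitted.

```python
import numpy as np

def dice_similarity(a, b):
    """Dice Similarity Coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def volume_ml(mask, voxel_volume_mm3):
    """Volume of a binary mask in millilitres (1 mL = 1000 mm^3)."""
    return float(np.asarray(mask, dtype=bool).sum() * voxel_volume_mm3 / 1000.0)

def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml
```

As a usage example, an EDV of 100 mL and an ESV of 40 mL give an EF of 60%.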
CMAC organizers defined 12 landmarks at intersections of gridded lines on tagging images at ED, one landmark p_0 per wall (septal, inferior, lateral, anterior) per ventricular level (basal, mid, apical). These landmarks were manually tracked on tagging images by two observers over the cardiac cycle, and each position was transformed from tagging to cine coordinates using DICOM header information. We used the CarMEN motion estimates u_t to automatically deform the landmarks at ED, and the accuracy was assessed using the in-plane end-point error (EPE) between deformed p_t′ = u_t ∘ p_0 and manually tracked p_t landmarks, defined by

$$\mathrm{EPE}(p_t', p_t) = \sqrt{(p'_{t,x} - p_{t,x})^{2} + (p'_{t,y} - p_{t,y})^{2}}$$

Due to temporal misalignment between the tagging and cine acquisitions, EPE was evaluated only at ES (t = t_ES). Specifically, let p_ij(t) denote the manually tracked landmarks of subject i at frame t by observer j. The accuracy of CarMEN was assessed using the average EPE (AEPE) across subjects and observers at ES. Our results were compared to those reported by the four groups that responded to the challenge (42): MEVIS (46), IUCL (9), UPF (11), and INRIA (12,47). All groups submitted tagging-based motion estimates, but only UPF and INRIA provided estimates based on cine-MRI.
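The landmark comparison can be sketched with the standard library alone. The example computes the in-plane EPE for a pair of landmarks and averages it over landmarks and observers; the data layout (one list of reference landmarks per observer) and the equal weighting of all landmark-observer pairs are assumptions for this illustration.

```python
import math

def in_plane_epe(p_deformed, p_reference):
    """Euclidean distance using only the x and y coordinates."""
    return math.hypot(p_deformed[0] - p_reference[0],
                      p_deformed[1] - p_reference[1])

def average_epe(deformed, references_by_observer):
    """Average in-plane EPE over all landmarks and observers.

    deformed               : list of deformed landmark coordinates
    references_by_observer : one list of manually tracked landmarks per observer,
                             in the same order as `deformed` (assumed layout)
    """
    errors = [
        in_plane_epe(p, q)
        for references in references_by_observer
        for p, q in zip(deformed, references)
    ]
    return sum(errors) / len(errors)
```

Any through-plane coordinate passed in is ignored, which matches the in-plane definition of the error.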

Strain Validation and Intra-Scanner Repeatability
The tagging-MRI method with the lowest AEPE at ES was used as the reference for strain analysis. The tagging-MRI-based motion estimates were registered and resampled to the cine-MRI space.
Global strain and SR values throughout the entire cardiac cycle were derived from the resampled estimates as described in (48). Global- and regional-based analyses were performed to assess the repeatability of measures from two acquisitions. Relative changes (RC) and absolute relative changes (aRC) were calculated, taking the first acquisition as the reference. ESS and SR were calculated for the global-based analysis, and for region-based analyses, ESS values were normalized using the AHA polar map, and both RC and aRC were evaluated for each of the segments in the polar map.

Statistics
For validation, Bland-Altman analysis was used to quantify agreement between predicted and tagging strain measures. We used the term bias to denote the mean difference and the term precision to denote the standard deviation of the differences, the latter computed with 1 degree of freedom. Differences were also assessed using a paired t-test with Bonferroni correction for multiple comparisons. For global- and regional-based analyses of strain intra-scanner repeatability, ICC estimates and their 95% confidence intervals (CI) were calculated based on a single-rating, absolute agreement, 2-way mixed-effects model. Analyses were performed in Python v3.4 with the statistical pingouin module (49).
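The bias and precision terms defined above can be computed as in the following standard-library sketch. The sample standard deviation (n − 1 in the denominator) is used to match the stated single delta degree of freedom; the 1.96 multiplier for the 95% limits of agreement is the conventional choice under a normality assumption and is an assumption of this example rather than a value given in the text.

```python
import statistics

def bland_altman(predicted, reference):
    """Bias, precision, and 95% limits of agreement between paired measures.

    bias      : mean of the paired differences (predicted - reference)
    precision : sample standard deviation of the differences (n - 1)
    """
    differences = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.mean(differences)
    precision = statistics.stdev(differences)
    limits = (bias - 1.96 * precision, bias + 1.96 * precision)
    return bias, precision, limits
```

A bias close to zero with narrow limits indicates that two methods can be used interchangeably; a small bias with wide limits indicates agreement on average but poor agreement for individual subjects.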

Segmentation and Motion Estimation
Centering, segmentation, and motion estimation for an entire cardiac cycle (∼25 frames) was accomplished in <13 s on a 12 GB GPU and <2.2 min on a 32 GB RAM CPU. VCN located the LV center of mass with a median error of 1.3 mm. Training with an MDC loss function resulted in slightly more accurate segmentations compared to CCE (Supplementary Table 2), therefore the MDC-trained model was used for all remaining analyses. With this model, correlation of CarSON and manual LV volumetric measures was >0.98 across all measures (Table 1), and biases in EF (+0.25 ± 3.2%), ED (+0.76 ± 6.7 mL), and ES (+0.19 ± 5.8 mL) volumes, and mass (+1.4 ± 10.3 g) were not significant. Further, these biases were smaller than those obtained with other methods, which were positive for LV EDV (1.5-3.7 mL), negative for LVM (−2.1 to −2.9 g), and close to zero (±0.5%) for EF. Simantiris et al. (18) obtained the best precision for LV EF (2.7 vs. 3.2% variance with CarSON), EDV (4.6 vs. 6.7 mL), and LVM (6.5 vs. 10.3 g). Isensee et al. (39) obtained the best results on geometric metrics, i.e., lower HD for the LV (ED 5.5 vs. 5.7 mm; ES 6.9 vs. 7.7 mm) and LVM (7.0 vs. 8.1 mm; 7.3 vs. 9.2 mm), and higher DSC for the LVM (0.904 vs. 0.898; 0.923 vs. 0.913). The DSC for the LV was similar for all methods (∼0.967, ∼0.929). MAE for the LV EDV and LVM were 5.3 ± 4.1 mL and 6.8 ± 6.5 g.

Figure 5A illustrates a representative example of the tagging and cine images from a CMAC subject. Landmarks defined at ED were deformed to ES using the CarMEN estimates and compared to manual tracking. Banding artifacts on cine images showed no clear effect on derived motion estimates or landmark deformation, as shown in ES (Figure 5A, yellow arrow) or throughout the whole cardiac cycle (see Supplementary Video 1). The manual tracking inter-observer variability was 0.86 mm (Figure 5B, dotted line). Within cine-based techniques, CarMEN (2.89 ± 1.52 mm) and UPF (2.94 ± 1.64 mm) had lower (p < 0.001) AEPE relative to INRIA (3.78 ± 2.08 mm), but there was no significant difference between CarMEN and UPF. All tagging-based methods had lower AEPE compared to cine approaches, particularly MEVIS (1.58 ± 1.45 mm). Finally, we evaluated the AEPE of the motion vectors in 10 synthetic datasets to compare our results against our previous CarMEN implementation. The AEPE was 1.6 ± 0.1 mm (1.1 ± 0.4 pixels) at ED, 2.1 ± 0.1 mm (1.33 ± 0.03 pixels) at ES, and 1.8 ± 0.2 mm (1.20 ± 0.2 pixels) combined.

for SRe, accordingly. These values were similar to tagging-based ones, although circumferential SRe from cine-MRI data was lower, mostly in the train set (0.7 ± 0.2 s⁻¹).

Strain Analysis
Tagging-based measures are shown for the CMAC cohort. DeepStrain repeatability is shown for two acquisitions (ACQ). MEVIS was used to calculate tagging measures. Data are presented as mean (standard deviation), and as mean [95% confidence interval] for all three datasets combined.

Comparison of tagging- and cine-based strain measures with matched subjects showed an overall agreement in timing and magnitude of strain and SR throughout the cardiac cycle, although there were visual differences in peak SR parameters (Figure 5C). Visual inspection of image artifacts on cine data showed no evidence that these artifacts affected strain values derived with DeepStrain (Supplementary Figure 1). Figure 2). However, there were larger differences (p < 0.01) in radial SRs (Figure 6A). The overall biases in circumferential and radial ESS were 0.17 and −0.16%, respectively. Average RC between parameters was less than ±1% for ESS and less than ±5% for peak SR (Table 3). Average aRC was ∼5% for ESS (circumferential: 3.0 ± 2.0%; radial: 5.1 ± 5.8%), ∼8% for SRs (8.0 ± 6.8%; 7.7 ± 4.0%), and ∼10% for SRe (10.2 ± 7.8%; 9.2 ± 8.6%). Mean ICC values showed repeatability was good to excellent for ESS (0.75; 0.90), SRs (0.77; 0.91), and SRe (0.83; 0.84). The limits-of-agreement (LoA), which define the interval expected to contain 95% of the differences assuming normally distributed data, were ∼2 and ∼6% for circumferential and radial ESS, and ∼0.5 s⁻¹ for SR measures. Average RC and aRC across regional segments were within ±2% for circumferential and ±5% for radial ESS, except in anterior segments (±8%) radially (Figure 6B). Regional mean ICC values showed good to excellent repeatability across all segments, except circumferentially near inferoseptal, inferior, and inferolateral walls, where repeatability was moderate (Supplementary Table 3).
LoAs showed that 95% of differences occurred within ∼5 and ∼10% intervals for circumferential and radial ESS.

Evaluation in Patients With Cardiovascular Disease
Regional measures of ESS averaged over each patient population (Supplementary Figure 3), as well as global values of strain and SR across the cardiac cycle (Figure 7) for all 100 subjects in the ACDC train set, showed a progressive decline in strain values starting with HCM, followed by ARV, MI, and DCM. Specifically, relative to the healthy group, radial ESS was reduced in all patient populations. Radial systolic and early-diastolic SR were also reduced in all patient groups, except for systolic SR in HCM. Figure 8 shows both the cine-MRI image and the circumferential ESS polar map of a healthy subject and two patients with MI. Strain values in the healthy polar map have a homogeneous distribution. In contrast, in one MI patient the map indicates a diffuse reduction, and inspection of the myocardium on the cine-MRI image shows an anteroseptal infarct that coincides in location with segments with more prominent decreases in strain. In a different MI patient with an infarct located in a similar septal region, strain changes are focal and localized to the anteroseptal wall.

DISCUSSION
In this study we developed a fast DL framework for strain analysis based on cine-MRI data that does not make assumptions about the underlying physiology, and we benchmarked its segmentation, motion, and strain estimation components against the state-of-the-art. We compared our segmentations to other DL methods, motion estimates to other non-learning techniques, and strain measures to a reference tagging-MRI technique. We also presented the intra-scanner repeatability of DeepStrain-based global and regional strain measures, and showed that these measures were robust to image artifacts in some cases. Global and regional applications were also presented to demonstrate the potential clinical utilization of our approach. Our work is the first to report within a single study the characterization, validation, and repeatability of a learning-based method for strain analysis.

Volumetric Measures
Segmentation from MRI data is a task particularly well-suited for CNNs given the excellent soft-tissue contrast, and all top-performing methods on the ACDC test set were based on DL approaches. Isensee et al. (39) had remarkable success on geometric metrics, but this and other approaches result in a systematic overestimation of the LV EDV and thus an underestimation of LVM. In contrast, CarSON generated less biased measures of LV volumes and mass, and these biases were not significant. Although Simantiris and Tziritas (18) obtained the most precise measures, possibly due to their extensive use of augmentation using image intensity transformations, across methods the precision of EF was within the ∼3-5% (50) needed when it is used as an index of LV function in clinical trials (51).
In addition, we showed that the errors in our measures of LV EDV and LVM were almost half the inter-observer MAE (∼10.6 mL, 12.0 g) and comparable to the intra-observer MAE (∼4.6 mL, 6.2 g) reported in (41), but further investigations are required to assess performance on more heterogeneous populations. Lastly, CarSON tends to perform better on DSC than on HD metrics. This is mainly due to the inclusion or exclusion of myocardium labels in the most basal slices, as described by Bernard et al. (41). However, the smoothing penalty used to train CarMEN reduces the impact on strain estimates by promoting smooth motion values across the myocardial tissue.
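The different sensitivity of DSC and HD to a missed basal slice can be illustrated with a toy example. The sketch below is not the ACDC evaluation code: the masks, the voxel spacing, and the helper names `dice` and `hausdorff` are hypothetical, and the Hausdorff distance is computed over all labeled voxels for simplicity, whereas evaluation tools typically operate on contours.

```python
from itertools import product
from math import dist


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between two voxel sets, in physical units."""
    if not a or not b:
        raise ValueError("both voxel sets must be non-empty")

    def to_physical(voxel):
        return tuple(i * s for i, s in zip(voxel, spacing))

    pa = [to_physical(v) for v in a]
    pb = [to_physical(v) for v in b]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


# Toy masks indexed as (slice, row, col). The reference labels three slices;
# the prediction omits the most basal one (slice 2).
reference = set(product(range(3), range(4), range(4)))
prediction = {v for v in reference if v[0] < 2}

# Assumed spacing: 10 mm between slices, 1 mm in-plane.
print("DSC:", dice(prediction, reference))
print("HD (mm):", hausdorff(prediction, reference, spacing=(10.0, 1.0, 1.0)))
```

In this example the overlap-based DSC remains fairly high because two of the three slices agree exactly, whereas the HD equals a full slice spacing because it is driven by the single worst-matched voxel.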

Strain Validation
The application of myocardial strain to quantify abnormal deformation in disease requires accurate definition of normal ranges. However, previously reported normal ranges vary widely between modalities and techniques, particularly for radial ESS (4). In this study we showed that DeepStrain generated strain measures with narrow CI in healthy subjects across three different datasets. Although direct comparison with the literature is difficult due to differences in the datasets, overall our strain measures agreed with several reported results. Specifically, circumferential strain is in agreement with studies in healthy participants based on tagging (−16.6%, n = 129) and speckle tracking echocardiography (−18%, n = 265) datasets (52,53), as well as a recently proposed (−16.7% basal, n = 386) tagging-based DL method (48). Our radial strain values are in agreement with some tagging-based studies (26.5%, n = 129; 23.8% basal, n = 386) (48,52), but are lower than most reported values (4). This is a result of the smoothing regularization used during training to prevent overfitting. However, lowering the regularization without increasing the size of the training set would lead to increased EPE and wider CI. SR measures derived with DeepStrain were also in good agreement with previous tagging-based studies (52). The CMAC dataset enabled us to compare our results to non-learning methods using a common dataset. We found that AEPE at ES was lower with tagging-based techniques, reflecting the advantage of estimating cardiac motion from a grid of intrinsic tissue markers (i.e., grid tagging lines). Further, the tagging techniques also benefited from the fact that landmarks were placed near the center of the myocardial wall borders, whereas motion estimation from tagging data at the myocardial walls and in thin-walled regions of the LV is less accurate due to the spatial resolution of the tagging grid (4).
In addition, some of the tagging-MRI images did not enclose the whole myocardium and some contained imaging artifacts, which resulted in strain artifacts toward the end of the cardiac cycle. Nevertheless, MEVIS-based motion estimates achieved the lowest AEPE at ES and thus represent a reliable reference for end-systolic strain measures. This performance could be a result of their image term (4) that penalizes phase shifts in the Fourier domain instead of intensity values, an approach that is less affected by desaturation. The UPF approach also achieved a low AEPE using multimodal integration and 4D tracking to leverage the strengths of both modalities and improve temporal consistency (11). Specific differences in motion and strain measures between MEVIS and other techniques were thoroughly discussed by Tobon-Gomez et al. (42).
Using MEVIS as the tagging reference standard, we found no significant differences in measures of circumferential or radial ESS. Validation studies have shown similar [±1%, (54-56)] or worse [±11% for radial, (55)] biases between cine feature tracking and tagging strain. However, these methods required manual contouring by an expert, whereas our method is fully automatic. We found significant differences in SR measures between the two techniques that could be due to drift errors in the MEVIS implementation, i.e., errors that accumulate in sequential implementations in which motion is estimated frame-by-frame (42).
The AEPE on the synthetic dataset of 1.20 pixels was lower than our previously reported 1.7 pixels, which is expected as our previous implementation was not anatomically constrained. Although we did not observe considerable improvements in AEPE compared to tagging- and cine-based methods, an important advantage of our learning-based approach is the reduced computational complexity (∼13 s on GPU) relative to the proposed MEVIS (1-2 h), IUCL (3-6 h), UPF (6 h), and INRIA (5 h) approaches (42). Specifically, because our trained network does not optimize for a specific test subject (i.e., it does not iterate on the cine data to generate the desired output), centering, segmentation, and motion estimation for the entire cardiac cycle can be accomplished much faster (<2 min on CPU). In addition, DeepStrain was trained on a relatively small dataset and was evaluated on data from different institutions and vendors; therefore, its accuracy relative to non-learning methods could substantially improve through training with larger cohorts or application of data shift correction strategies. Furthermore, a joint optimization of the segmentation and motion estimation CNNs could potentially improve the robustness of the workflow to undersampled data (24).
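For readers unfamiliar with the metric, the average end-point error can be sketched as the mean Euclidean distance between landmarks deformed with the estimated motion and the corresponding manually tracked landmarks. The function names and the landmark coordinates below are hypothetical and serve only to illustrate the calculation.

```python
from math import dist
from statistics import mean, stdev


def end_point_errors(predicted, reference):
    """Euclidean distance between each predicted landmark and its reference."""
    if len(predicted) != len(reference):
        raise ValueError("landmark lists must have the same length")
    return [dist(p, r) for p, r in zip(predicted, reference)]


def average_epe(predicted, reference):
    """Return (mean, sample SD) of the end-point errors."""
    errors = end_point_errors(predicted, reference)
    spread = stdev(errors) if len(errors) > 1 else 0.0
    return mean(errors), spread


# Hypothetical end-systolic landmark positions in mm (x, y, z).
tracked = [(10.0, 22.0, 5.0), (14.5, 30.0, 5.0), (20.0, 18.5, 15.0)]
deformed = [(11.2, 21.0, 5.5), (13.0, 31.5, 6.0), (21.0, 19.0, 13.0)]

aepe, sd = average_epe(deformed, tracked)
print(f"AEPE = {aepe:.2f} ± {sd:.2f} mm")
```

When the landmark coordinates are expressed in pixels rather than millimeters, the same calculation yields the error in pixels.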

Strain Repeatability
In this study we also evaluated the intra-scanner repeatability of strain measures in 10 healthy subjects, an important aspect to consider when assessing the potential clinical utility of DeepStrain. Confidence intervals in circumferential and radial ESS were 0 ± 1% and 0 ± 3%, better than the intra-observer variability reported using feature tracking in 10 healthy adults (57). A more recent study in 100 healthy individuals reported intra- and inter-observer repeatability for circumferential (ICC intra: 0.88, ICC inter: 0.88) and radial ESS (0.82, 0.79), which were comparable to our results for circumferential ESS (0.75) and radial ESS (0.90) using only 10 subjects. Finally, our repeatability of SR measures was good to excellent, similar to that reported for healthy (n = 20) and patient (n = 60) populations (58). Thus, without requiring expert operators, DeepStrain achieved better or equal repeatability compared to feature tracking methods.
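The ICC values discussed above can be computed from a subjects-by-scans table of measurements. The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The text does not state which ICC form was used, so this choice, the function name `icc_2_1`, and the example values are assumptions.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` is a list of rows, one per subject, each holding one value per scan.
    """
    if len(scores) < 2:
        raise ValueError("need at least two subjects")
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of repeated scans per subject
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two scans per subject and no missing values")

    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-scan mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical radial ESS values (%) for five subjects, two scans each.
radial_ess = [[24.1, 25.0], [30.2, 29.1], [21.7, 22.9], [27.5, 27.0], [33.0, 31.8]]
print(f"ICC(2,1) = {icc_2_1(radial_ess):.2f}")
```

With only a handful of subjects, ICC estimates are sensitive to the spread between subjects, which is one reason repeatability in small healthy cohorts should be interpreted with caution.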

Potential Clinical Applications
DeepStrain could be applied in a wide range of clinical applications, e.g., automated extraction of imaging phenotypes from large-scale databases (59). Such phenotypes include global and regional strain, which are important measures in the setting of existing dysfunction with preserved EF (3). DeepStrain generated measures of global strain and SR over the entire cardiac cycle from a cohort of 100 subjects in <2 min. These results showed that radial SRe was reduced in patients with HCM and ARV, despite having a normal or increased LV EF. Decreased SRe with normal EF is suggestive of subclinical LV diastolic dysfunction, which is in agreement with previous findings (60,61). Our results also showed DeepStrain-based maps could be used to characterize regional differences between groups.
At an individual level, we showed that in MI patients, polar segments with decreased circumferential strain matched myocardial regions with infarcted tissue. Further, we showed that the changes in regional strain due to MI can be both diffuse and focal. These abnormalities could be used to discriminate dysfunctional from functional myocardium (62), or as inputs for downstream classification algorithms (63). More generally, DeepStrain could be used to extract interpretable features (e.g., strain and SR) for DL diagnostic algorithms (64), which would make understanding of the pathophysiological basis of classification more attainable (65).

Study Limitations
A limitation of our study was the absence of important patient information (e.g., age), which would be needed for a more complete interpretation of our strain analysis results, for example to assess the differences in strain values found between the healthy subjects from the ACDC and CMAC datasets. Nevertheless, using publicly available data enables the scientific community to more easily reproduce our findings and compare our results to other techniques. Another limitation was the absence of longitudinal analyses, i.e., longitudinal strain was not reported because it is normally derived from long-axis cine-MRI data not available in the training dataset. The size of the datasets is another potential limitation. The number of patients used for training is much smaller than the number of trainable parameters, potentially resulting in some degree of overfitting. To correct this, the training set for motion estimation could be expanded by validating the proposed segmentation network on more heterogeneous populations. The use of strain minimization deviation as a training heuristic also serves as a learning constraint but has not been validated, and could potentially prevent identification of subtle disease due to loss of sensitivity to abnormal strain. While our repeatability results were promising despite testing in only a small number of subjects, repeatability in patient populations was not shown. Further, reproducibility across sites and vendors was not assessed. In addition, the accuracy of the motion estimates on patient populations with regional dysfunction was not assessed, and we did not quantify the effect of dataset shift errors that might occur when applying our method to new datasets.

Conclusion
We developed an end-to-end learning-based workflow for strain analysis that is fast, operator-independent, and leverages real-world data instead of making explicit assumptions about myocardial tissue properties or geometry. This approach enabled us to derive strain measures from new data that were repeatable and comparable to those derived from dedicated tagging data. These technical and practical attributes position DeepStrain as an excellent candidate for use in routine clinical studies or data-driven research.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://github.com/moralesq.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the institutional review board (2018P002912) at the Massachusetts General Hospital, in agreement with the Health Insurance Portability and Accountability Act (HIPAA). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
MM designed the workflow, performed data analysis, and drafted the manuscript. All other authors revised the drafted manuscript and contributed critical intellectual content. This manuscript has been revised and approved by all authors.

ACKNOWLEDGMENTS
We acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.