Comparison of Complex k-Space Data and Magnitude-Only for Training of Deep Learning–Based Artifact Suppression for Real-Time Cine MRI

Purpose: The purpose of this study was to compare the performance of deep learning networks trained with complex-valued and magnitude images in suppressing aliasing artifacts in highly accelerated real-time cine MRI. Methods: Two 3D U-net models (Complex-Valued-Net and Magnitude-Net) were implemented to suppress aliasing artifacts in real-time cine images. ECG-segmented cine images (n = 503) generated from both complex k-space data and magnitude-only DICOM images were used to synthesize radial real-time cine MRI. Complex-Valued-Net and Magnitude-Net were trained with pairs of fully sampled and synthesized highly undersampled (12-fold) radial real-time cine images generated from complex k-space data and DICOM images, respectively. Real-time cine was prospectively acquired in 29 patients with a 12-fold accelerated free-breathing tiny golden-angle radial sequence and reconstructed with both Complex-Valued-Net and Magnitude-Net. Cardiac function, left-ventricular (LV) structure, and subjective image quality [1 (non-diagnostic) to 5 (excellent)] were calculated from Complex-Valued-Net– and Magnitude-Net–reconstructed real-time cine datasets and compared to those of ECG-segmented cine (reference). Results: Free-breathing real-time cine reconstructed by both networks had high correlation (all R² > 0.7) and good agreement (all p > 0.05) with standard clinical ECG-segmented cine with respect to LV function and structural parameters. Real-time cine reconstructed by Complex-Valued-Net had superior image quality compared to images from Magnitude-Net in terms of myocardial edge sharpness (Complex-Valued-Net = 3.5 ± 0.5; Magnitude-Net = 2.6 ± 0.5), temporal fidelity (Complex-Valued-Net = 3.1 ± 0.4; Magnitude-Net = 2.1 ± 0.4), and artifact suppression (Complex-Valued-Net = 3.1 ± 0.5; Magnitude-Net = 2.0 ± 0.0); all scores were inferior to those of ECG-segmented cine (4.1 ± 1.4, 3.9 ± 1.0, and 4.0 ± 1.1, respectively). Conclusion: Compared to Magnitude-Net, Complex-Valued-Net produced improved subjective image quality for reconstructed real-time cine images and did not show any difference in quantitative measures of LV function and structure.


INTRODUCTION
Cardiovascular MR (CMR) is the clinical gold-standard imaging modality for evaluation of cardiac function and structure. Breath-hold ECG-segmented cine imaging using a balanced steady-state free-precession (bSSFP) readout allows for accurate and reproducible measurement of left-ventricular (LV) and right-ventricular (RV) function and volume [1][2][3]. In this technique, k-space is divided into different segments collected over consecutive cardiac cycles within a single breath-hold scan. However, ECG-segmented cine acquisition has limited spatial and temporal resolution, is sensitive to changes in heart rate, and requires repeated breath-holds [4][5][6]. Alternatively, free-breathing real-time cine has been proposed and pursued using rapid real-time imaging or multiple averaging with or without motion correction [7][8][9][10][11][12]. Free-breathing real-time cine is advantageous because it does not require multiple breath-holds and is insensitive to heart rate variations. However, real-time cine has lower temporal and spatial resolution than ECG-segmented cine [10,11]. Therefore, there is a need to further accelerate data collection for real-time cine MRI.
Over the past three decades, there has been considerable progress in the development of accelerated real-time cine imaging, including parallel imaging and compressed sensing [13][14][15][16][17][18]. Parallel imaging is almost always used in cine imaging for both real-time and ECG-segmented acquisitions, with robust and highly reliable image quality [13]. However, the acceleration rate of parallel imaging cannot exceed three without compromising image quality [19][20][21]. Compressed sensing has recently been integrated into vendor products, enabling higher acceleration rates than parallel imaging; however, reconstruction times are long, and acceleration rates beyond four can result in degradation of image quality [17]. Alternative techniques that exploit the spatio-temporal correlation and sparsity of cine data have also been explored [22][23][24][25][26]; however, these approaches can suffer from temporal data filtering, often removing information that is crucial to cardiac cine evaluation. Therefore, despite considerable interest from the image reconstruction community, these techniques are rarely used clinically.
Deep learning-based reconstruction has recently been proposed to enable rapid reconstruction of accelerated cine MRI. Hauptmann et al. [27] showed that a 3D U-net was capable of reconstructing accelerated (acceleration rate 13) real-time cine MRI. Schlemper et al. [28] showed that a trained cascade network was able to rapidly reconstruct accelerated (acceleration rate 11) cine MRI. Kustner et al. [29] showed that (3 + 1)-dimensional complex-valued spatio-temporal convolutions and multi-coil data processing (CINENet) could reconstruct accelerated (9 ≤ acceleration rate ≤ 15) 3D ECG-segmented cine. El-Rewaidy et al. [30] reconstructed accelerated radial cine MRI (acceleration rate 14) using a complex-valued network (MD-CNN) designed to process MR data in both k-space and image space. Shen et al. [31] used a complex U-net with a combined mean-squared error and perceptual loss (PCNN) to reconstruct real-time cine MRI (acceleration rate 15).

While promising, popular deep learning-based reconstruction methods [27][28][29][30][31][32] for cine MRI rely on supervised learning and, as such, require training with large and diverse patient datasets. However, prospectively acquiring large patient datasets within a clinical setting can be difficult due to long scanning times, respiratory/cardiac motion, or contrast washout. To overcome these limitations, Hauptmann et al. proposed training a deep learning network using synthetic data generated from DICOM (Digital Imaging and Communications in Medicine) images [27]. The use of DICOM images is advantageous because they are readily available in large numbers at centers with cardiac MR expertise. While promising, using DICOM images for training is theoretically suboptimal given that DICOM images are magnitude images, which lack phase and multi-coil information; furthermore, vendors often apply different filtering techniques to improve image quality in the DICOM creation process. The effect of using DICOM images for training on the performance of a deep learning model has not yet been rigorously studied.
In this study, we sought to investigate differences in performance between two deep learning-based models trained to suppress artifacts in 12-fold accelerated real-time cine. Paired complex-valued k-space data and DICOM images of ECG-segmented cine (n = 503) were used to synthesize highly undersampled radial real-time cine data. Both artifact suppression models were built on 3D U-net architectures. One model was trained with synthetic radial real-time cine images generated from complex k-space data (Complex-Valued-Net), while the other was trained with synthetic radial real-time cine images generated from DICOM images (Magnitude-Net). The performance of the two models was evaluated against prospectively collected free-breathing real-time cine CMR with radial acquisition.

METHODS
Figure 1 summarizes our study, which was designed to compare the performance of deep learning-based networks trained to suppress aliasing artifacts in highly accelerated real-time cine using complex-valued images (derived from k-space data) and magnitude-only images (derived from DICOM images). We prepared a dataset containing both complex k-space data and corresponding magnitude images (i.e., DICOMs), scanned with breath-hold ECG-segmented cine using a Cartesian trajectory, to synthesize radial real-time cine data (Figure 1A) [27]. Two 3D U-net models [33], Complex-Valued-Net and Magnitude-Net, were developed to remove aliasing artifacts in complex-valued and magnitude images of highly accelerated radial real-time cine, respectively. Complex-Valued-Net and Magnitude-Net were trained using synthesized radial real-time cine with aliasing artifacts generated from complex-valued k-space and magnitude-only images, respectively. "Artifact-free" images used to produce the synthesized radial cine served as the ground truth (Figure 1B). Finally, the performance of both networks was compared using prospectively acquired free-breathing highly accelerated (12-fold) radial real-time cine in 29 patients. Quantitative functional and structural parameters of the LV and qualitative visual assessments of the LV were compared against reference values derived from ECG-segmented cine images (Figure 1C).

Training Datasets
We retrospectively collected short-axis (SAX) cine data from 503 patients (286 males, 55.4 ± 15.8 years) who underwent clinical scans at BIDMC from October 2018 to May 2020. Imaging was performed on a 3T MR scanner (MAGNETOM Vida, Siemens Healthineers, Erlangen, Germany) using a breath-hold ECG-segmented sequence with the following parameters: bSSFP readout, FOV 355 × 370 mm², in-plane resolution 1.7 × 1.4 mm², slice thickness 8 mm, TE/TR 1.41/3.12 ms, flip angle 42°, GRAPPA acceleration rate 2-3, ∼18 cardiac phases at a temporal resolution of ∼55.3 ms, receiver bandwidth 1,502 Hz/pixel, Cartesian sampling pattern, and 11 ± 1 slices per volume (range 9 to 17). Paired raw k-space data and DICOM images from these cine scans were used in this study. The study protocol was approved by the institutional review board, and written consent was waived. Patient information was handled in compliance with the Health Insurance Portability and Accountability Act.

Synthesizing Real-Time Cine Training Data
Supplementary Figure S1 shows the data preparation workflow for producing synthetic accelerated radial real-time cine datasets from ECG-segmented cine data acquired with a Cartesian trajectory. The complex-valued multi-coil k-space data with an acceleration rate of 2-3 were first reconstructed offline by GRAPPA [21]. Offline GRAPPA reconstruction was implemented with the code made available by Dr. Chiew (https://users.fmrib.ox.ac.uk/~mchiew/Teaching.html).
Then, GRAPPA-reconstructed images and the original DICOM images exported from the scanner were interpolated to a 2 × 2 mm² in-plane resolution and a temporal resolution of 37.7 ms. We chose these interpolated spatial and temporal resolutions to match those used during prospective real-time cine scanning (see below). These GRAPPA-reconstructed and DICOM images also served as the ground truth for training the two neural networks, respectively. Subsequently, a backward non-uniform fast Fourier transform (NUFFT) [34] was applied to GRAPPA-reconstructed and DICOM images to produce complex-valued radial k-space. Twelve lines per frame, distributed over the whole of k-space with a tiny golden-angle rotation (32.049°) [35,36], were chosen to simulate the highly accelerated radial k-space of real-time cine.
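As an illustration of this synthesis step, the following is a minimal sketch, assuming the sigpy Python package for the NUFFT operations (the study itself used gpuNUFFT with a MATLAB interface); the array shapes, 2× readout oversampling, and ramp density compensation are illustrative assumptions, not the authors' code.

```python
import numpy as np
import sigpy as sp

TINY_GOLDEN_ANGLE = np.deg2rad(32.049)  # tiny golden-angle increment [35,36]
LINES_PER_FRAME = 12                    # 12 spokes/frame -> ~12-fold acceleration

def radial_coords(frame_idx, n_readout):
    """k-space coordinates for one frame, in frequency-index units [-n/2, n/2)."""
    spoke_idx = frame_idx * LINES_PER_FRAME + np.arange(LINES_PER_FRAME)
    angles = spoke_idx * TINY_GOLDEN_ANGLE
    radius = np.linspace(-n_readout / 2, n_readout / 2, n_readout, endpoint=False)
    kx = radius[None, :] * np.cos(angles[:, None])
    ky = radius[None, :] * np.sin(angles[:, None])
    return np.stack([kx, ky], axis=-1)  # shape: (spokes, readout, 2)

def synthesize_frame(image_frame, frame_idx):
    """Project one artifact-free cine frame onto 12 spokes, then grid back to image space."""
    coord = radial_coords(frame_idx, n_readout=2 * image_frame.shape[-1])  # assumed 2x oversampling
    kspace = sp.nufft(image_frame, coord)         # image -> radial k-space
    dcf = np.linalg.norm(coord, axis=-1)          # simple ramp density compensation
    return sp.nufft_adjoint(kspace * dcf, coord, oshape=image_frame.shape)
```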
For both Complex-Valued-Net and Magnitude-Net, the simulated highly accelerated radial k-space data were transformed into image space using a forward NUFFT. Specifically, for complex-valued multi-coil k-space, the above procedures were performed on a coil-by-coil basis. Finally, a coil-combined image was generated using sensitivity-encoding coil combination [37]. An auto-calibrated sensitivity profile for each coil was produced as previously described [38]. Note that a GPU-based implementation of NUFFT (https://cai2r.net/resources/gpunufft-an-open-source-gpu-library-for-3d-gridding-with-direct-matlab-interface/) was used for synthetic MRI generation.

FIGURE 1 | (A) Cine images of 503 patients with both raw k-space data and DICOMs were collected. These images were scanned using a breath-hold cine sequence with a Cartesian trajectory. (B) Raw k-space data and DICOMs of ECG-segmented cine were used to synthesize highly accelerated radial real-time cine datasets for training Complex-Valued-Net and Magnitude-Net, respectively. (C) Performance comparison between the two neural networks. Real-time radial cine and corresponding ECG-segmented cine images were collected from 29 patients. Left-ventricular function, structural parameters, and subjective image scores were used to compare the performance of both deep learning models with respect to aliasing artifact suppression. For quantitative and qualitative evaluation, Magnitude-Net reconstruction, Complex-Valued-Net reconstruction, and ECG-segmented cine were compared to one another in pairs.
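A minimal sketch of the sensitivity-weighted coil combination described above, assuming the sensitivity maps are already estimated (the auto-calibrated estimation of [38] is not reproduced here); function and variable names are illustrative.

```python
import numpy as np

def sense_combine(coil_images, sens_maps, eps=1e-8):
    """SENSE-style coil combination: sum_c conj(S_c) * m_c / sum_c |S_c|^2.

    coil_images, sens_maps: complex arrays of shape (n_coils, ny, nx).
    """
    numerator = np.sum(np.conj(sens_maps) * coil_images, axis=0)
    denominator = np.sum(np.abs(sens_maps) ** 2, axis=0) + eps
    return numerator / denominator
```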

Deep Learning Models and Training
Supplementary Figure S2 presents an in-depth description of the 3D residual U-net architecture used for Complex-Valued-Net and Magnitude-Net. The U-net architecture of both networks comprised five million kernels and two max-pooling/up-convolutional layers. Each convolutional processing layer consisted of 3 × 3 × 3 kernels, batch normalization, and a rectified linear unit (ReLU) activation function [33].
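For orientation, here is a minimal PyTorch sketch of one such convolutional processing layer (3 × 3 × 3 kernels, batch normalization, ReLU); the full residual U-net of Supplementary Figure S2 stacks blocks like this around two max-pooling/up-convolution levels, and the channel widths are placeholder assumptions.

```python
import torch.nn as nn

class ConvBlock3D(nn.Module):
    """One convolutional processing layer: 3x3x3 convolution + BatchNorm + ReLU."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```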
The input/output of each network consisted of paired artifact-free ground-truth images and their corresponding undersampled, artifact-contaminated images (size: M × N × T = 144 × 144 × 20). Specifically, for Complex-Valued-Net, we concatenated the real and imaginary components of complex-valued input/output pairs to enable a real-valued deep learning model to process complex-valued data (size: 2M × N × T = 288 × 144 × 20) [39]. For Magnitude-Net, a ReLU operator was positioned at the final layer to force the output to be non-negative [27]. An L2 loss function was used to train both networks.
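The real/imaginary stacking can be sketched as follows; the (M, N, T) tensor layout and helper names are assumptions for illustration, not the authors' code.

```python
import torch

def to_real_stacked(x: torch.Tensor) -> torch.Tensor:
    """Stack real and imaginary parts along the first spatial axis:
    complex (M, N, T) -> real (2M, N, T), as used for Complex-Valued-Net."""
    return torch.cat([x.real, x.imag], dim=0)

def from_real_stacked(y: torch.Tensor) -> torch.Tensor:
    """Invert the stacking on the network output: real (2M, N, T) -> complex (M, N, T)."""
    m = y.shape[0] // 2
    return torch.complex(y[:m], y[m:])

# L2 cost used for both networks:
l2_loss = torch.nn.MSELoss()
```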
Both networks were implemented using PyTorch (Facebook, Menlo Park, California) and trained on a DGX-1 workstation (NVIDIA, Santa Clara, California, United States) equipped with 88 Intel Xeon central processing units (2.20 GHz), eight NVIDIA Tesla V100 graphics processing units (GPUs), and 504 GB RAM. Each GPU had 32 GB of memory and 5,120 Tensor cores. Each network was trained for 2,900 iterations using an Adam optimizer and a 15% dropout rate. Each iteration randomly chose cine images of 16 LV slices from different patients (batch size 16). For synthesized real-time cine with ≥20 frames, the starting frame was randomly selected to obtain 20 consecutive frames; series with <20 frames were circularly padded to 20. Both input and output images were normalized by the 95th-percentile magnitude pixel intensity within the central region (i.e., 48 × 48) across the 20 frames. The initial learning rate was 0.001 and decreased by 5% every 100 iterations. The cost function and optimizer were selected to match the parameters proposed by Hauptmann et al. [27] for neural network training using DICOM-derived simulated real-time cine.
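A minimal sketch of this frame selection and normalization, assuming an (M, N, T) array per slice; the NumPy implementation and names are illustrative, not the authors' code.

```python
import numpy as np

def prepare_series(frames, n_frames=20, center=48, rng=None):
    """frames: (M, N, T) array -> normalized (M, N, 20) series."""
    if rng is None:
        rng = np.random.default_rng()
    M, N, T = frames.shape
    if T >= n_frames:
        start = rng.integers(0, T - n_frames + 1)   # random starting frame
        series = frames[..., start:start + n_frames]
    else:
        idx = np.arange(n_frames) % T               # circular padding to 20 frames
        series = frames[..., idx]
    m0, n0 = (M - center) // 2, (N - center) // 2
    scale = np.percentile(np.abs(series[m0:m0 + center, n0:n0 + center]), 95)
    return series / scale

# Learning-rate schedule (0.001, reduced by 5% every 100 iterations) could use, e.g.,
# torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.95).
```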

Real-Time Cine Performance Evaluation
Twenty-nine patients (16 males, 58 ± 16 years) were prospectively recruited. Free-breathing radial real-time cine research sequences were collected from each patient in addition to clinically indicated CMR sequences. Written informed consent was obtained from each patient prior to CMR imaging. Clinical indications and characteristics of these patients are listed in Supplementary Table S1. Breath-hold ECG-segmented cine was performed using the same imaging parameters as those detailed in Training Datasets. Free-breathing radial real-time cine was collected with the following parameters: bSSFP readout, FOV 288 × 288 mm², resolution 2 × 2 mm², slice thickness 8 mm, TE/TR 1.3/3.2 ms, flip angle 43°, receiver bandwidth 1,085 Hz/pixel, 12 radial lines per phase, and temporal resolution 37.7 ms. The rotation angle between radial lines was 32.049° [36]. Both sequences imaged a stack of 14 SAX slices covering the entire LV. Breath-hold ECG-segmented cine was reconstructed by the scanner. For free-breathing real-time cine, NUFFT first transformed the radial k-space data into complex-valued and magnitude images; the two neural networks were then used to remove aliasing artifacts.

Data Analysis
We used both quantitative imaging parameters and qualitative assessments of image quality to compare the performance of the two deep learning reconstructions. ECG-segmented cine images collected using the standard clinical protocol were used as the reference. For each patient in our independent validation dataset, one reader (HH), trained by a clinical reader (SK) with 5 years of experience, calculated the following cardiac function and structural parameters: LV ejection fraction (LVEF), LV end-diastolic volume (LVEDV), LV end-systolic volume (LVESV), LV stroke volume (LVSV), and LV mass (LVMass). All quantifications were performed using cvi42 (v5.9.3, Circle Cardiovascular Imaging, Calgary, Canada). Linear regression and Bland-Altman analysis were performed to evaluate correlation and agreement between real-time cine and ECG-segmented cine. A paired Student's t-test was conducted to compare differences between approaches in measures of LV function and structural parameters, with p < 0.05 considered statistically significant; for the three pairwise group comparisons, a Bonferroni correction was applied, with p < 0.0167 considered significant.
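As a sketch of this agreement analysis (an illustrative Python helper, not the SAS code used in the study), the bias, 95% limits of agreement, and paired t-test for one parameter could be computed as follows.

```python
import numpy as np
from scipy import stats

def bland_altman(real_time, reference):
    """Return bias, 95% limits of agreement, and paired t-test p-value."""
    diff = np.asarray(real_time, dtype=float) - np.asarray(reference, dtype=float)
    bias, sd = diff.mean(), diff.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)   # 95% limits of agreement
    _, p_value = stats.ttest_rel(real_time, reference)
    return bias, limits, p_value
```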
Subjective image quality was graded by one reader (SK) with 5 years of CMR experience. Cine images of all patients obtained from the three methods were randomized and de-identified. For each method, whole-LV cine images from each subject were scored with respect to conspicuity of endocardial borders (1: non-diagnostic, 2: poor, 3: adequate, 4: good, 5: excellent), temporal fidelity of wall motion (1: non-diagnostic, 2: poor, 3: adequate, 4: good, 5: excellent), and artifact level on the myocardium (1: non-diagnostic, 2: severe, 3: moderate, 4: mild, 5: minimal). Supplementary Figure S3 shows representative graded images. The z-test was used to compare image quality between each pair of methods, with p < 0.05 considered significant. SAS version 9.4 (SAS Institute, Cary, NC, United States) was used for all of the above analyses. Note that we elected not to quantitatively or qualitatively analyze real-time cine reconstructed with gridding because gridding alone did not produce diagnostic image quality.

RESULTS
Figures 2A,B show images obtained from the basal, mid, and apical cavities of one subject at end-systole and end-diastole by ECG-segmented cine and by free-breathing real-time cine via gridding, Complex-Valued-Net, and Magnitude-Net reconstruction. Supplementary Videos S1-S4 show the corresponding movies for dynamic display. We also show representative end-systolic images for three patients in Supplementary Figure S4. In both Figure 2 and Supplementary Figure S4, real-time cine reconstructed by Complex-Valued-Net shows less artifact and less blurring at the myocardial wall than Magnitude-Net. Supplementary Table S2 summarizes LV structure and cardiac function values from ECG-segmented cine and free-breathing real-time cine in 29 patients. The mean difference and 95% CI between each pair of methods are listed in Supplementary Table S3. According to Bland-Altman analysis (Figures 3A-C), mean differences between ECG-segmented cine and real-time cine by Complex-Valued-Net reconstruction were −0.9 ± 6.5% (p = 0.48) for LVEF, 0.9 ± 13.6 ml (p = 0.73) for LVEDV, and 2.2 ± 12.5 ml (p = 0.34) for LVESV. Correspondingly, mean differences between real-time cine by Magnitude-Net and ECG-segmented cine images were −2.

Figure 4 shows the mean/standard deviation and distribution of image quality scores across all patients. Supplementary Table S4 lists the percentages in two grade groups (1-3 and 4-5) of image quality scores across all patients for each method. The corresponding differences in the percentages of the two grade groups (1-3 and 4-5) among the three methods are listed in Table 1. Overall, 79% of ECG-segmented cine images had good or excellent scores (>3) for myocardial edge (4.1 ± 1.4) and temporal fidelity (3.9 ± 1.0). In contrast, 50% of real-time cine images reconstructed by both Complex-Valued-Net and Magnitude-Net scored less than or equal to 3 (myocardial edge: 3.5 ± 0.5 vs 2.6 ± 0.5; temporal fidelity: 3.1 ± 0.4 vs 2.1 ± 0.4), indicating poorer image quality. ECG-segmented cine had less artifact (4.0 ± 1.1) than real-time cine (Complex-Valued-Net: 3.1 ± 0.5; Magnitude-Net: 2.0 ± 0.0). All z-tests were significant (p < 0.05).

DISCUSSION
This study compares the performance of deep learning approaches for reconstruction of highly accelerated real-time cine using synthesized training data generated from complex-valued multi-coil k-space data (Complex-Valued-Net) and real-valued DICOMs (Magnitude-Net). Our subjective assessment of image quality demonstrates that Complex-Valued-Net yields better image quality than Magnitude-Net. However, the clinically relevant parameters of LV function and structure extracted from real-time cine reconstructed by both Complex-Valued-Net and Magnitude-Net were highly correlated and in excellent agreement with those of clinical breath-hold ECG-segmented cine.
There is a growing body of literature in deep learning, beyond CMR, in which magnitude images are used for training a variety of deep learning techniques [27,[40][41][42]]. However, there is also concern regarding the impact that discarded phase information may have on the clinical interpretation of reconstructed images [27,29,[43][44][45]]. Our study demonstrates that availability of complex k-space data improves overall image quality; however, these improvements in image quality do not necessarily impact clinical interpretation and quantification. This observation is not unique, and it is often debated whether "prettier" images necessarily lead to better diagnostic information. While the resulting data do not show clinically meaningful differences in LV function and structural parameters, an improvement in overall image quality may still be clinically relevant. For example, we often rely on wall motion abnormalities to assess the presence of ischemia, which can be visually assessed by reviewing cine images [46]. One can envision that improved image quality may provide additional confidence in such assessments. Further studies in patients with different imaging indications are warranted.

FIGURE 2 | Images at end-systolic (A) and end-diastolic (B) phases for three short-axis slices (base, mid, apex) in one patient. Magnitude-Net exhibits more image artifact (red arrow) and greater blurring (yellow arrow) at the myocardial wall than Complex-Valued-Net. Gridding reconstruction produces non-diagnostic image quality.
In cine imaging, voxel values are not quantitatively meaningful; however, in quantitative CMR imaging (e.g., T1/T2 mapping, quantitative perfusion, or phase-contrast), voxel values carry a tissue-specific meaning [47]. While qualitative imaging such as cine is more forgiving of artifact and inaccuracy during image reconstruction, quantitative CMR imaging is very sensitive to image artifacts. In addition, complex k-space data carry crucial information in quantitative imaging and cannot simply be discarded. Therefore, complex k-space data will still be needed for quantitative CMR image reconstruction with deep learning, despite our findings suggesting that magnitude-only training data are sufficient for qualitative cine imaging.

For this study, our goal was not necessarily to study or develop a new architecture; rather, we were motivated by Hauptmann et al. and their important contribution of using readily available DICOMs for network training [27]. Raw complex k-space data will still be needed for deep learning models that integrate complex k-space data into image reconstruction. However, the limited availability of complex k-space data will remain a major challenge for training such networks across different applications, diseases, scanner vendors, field strengths, and numbers of coils. Conversely, if one can train a model using only DICOM images, there are vast amounts of data available for different organs, sequences, diseases, and vendors, which could greatly accelerate the adoption of deep learning artifact suppression techniques.
This study has several limitations. Our training data were not prospectively acquired with radial k-space sampling; instead, training data were synthesized in a manner similar to that proposed by Hauptmann et al. [27]. We used ECG-gated cine images with Cartesian sampling to extract reference values for different LV functional and structural parameters for comparison with real-time radial imaging [27]. There may be differences between the two approaches due to the k-space sampling scheme. Additionally, ECG-segmented data were collected with breath-holding, while real-time data were collected during free-breathing. The evaluation of deep learning reconstruction methodologies was limited to image quality assessment and quantification of left-ventricular functional and structural parameters (i.e., EF, LVEDV, LVESV, LVSV, and LVMass). We chose these metrics because of their clinical importance; that said, further studies are warranted to evaluate the capacity of the presented methods (Magnitude-Net and Complex-Valued-Net) for diagnosis of cardiovascular diseases. Real-time cine reconstructed with gridding was not quantitatively or qualitatively analyzed because gridding alone produced non-diagnostic image quality. Subjective image assessment was performed by a single reader, and there may be differences in image perception across reviewers. Both Magnitude-Net and Complex-Valued-Net suffer from reduced temporal fidelity compared to ECG-gated segmented cine. Such a loss of temporal fidelity can be especially problematic during systolic phases and may be a source of error during qualitative and quantitative evaluation. All patients in our testing cohort were in sinus rhythm. Only a single neural network architecture (i.e., 3D U-net) was used to compare the performance of magnitude vs complex-valued synthetic training data. We chose this network architecture because, to the best of our knowledge, it is the only architecture shown to be capable of reconstructing radial real-time cine MRI acquired with a bSSFP readout [27,31]. Other state-of-the-art approaches such as cascade networks [28,29] have yet to be investigated for radial real-time cine reconstruction. Future collaborations are warranted to first extend other state-of-the-art methods to radial real-time cine reconstruction and then compare the performance of different synthetic training data (i.e., magnitude vs. complex-valued) using these methods. ECG-segmented cine images used for training were gathered from a single cardiac MR center; as such, the trained networks could contain bias that prevents generalization. Although we used a relatively large number of patients for training, our testing cohort with real-time radial imaging was relatively small, and images were acquired at a single clinical center. Future studies with more patients and imaging from different centers are required to evaluate the proposed deep learning methodologies for real-time cine reconstruction.

CONCLUSION
Although a deep learning model trained with complex k-space data produced real-time cine images with better subjective image quality than a model trained with magnitude-only data, the two approaches showed no differences with respect to quantitative measures of LV function and structural parameters.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, and further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
This study was approved by the BIDMC Institutional Review Board (IRB) and was Health Insurance Portability and Accountability Act (HIPAA)-compliant. The study was performed under two IRB-approved protocols, including one allowing the use of retrospective data collected as part of a clinical exam for machine learning research; informed consent was waived for the use of previously collected data. In addition, we prospectively recruited subjects for this study, and written informed consent was obtained from all prospective participants.

AUTHOR CONTRIBUTIONS
HH-V and RN contributed to study design and validation of deep learning models. HH-V contributed to training of deep learning models. RG, SK, YT, and LN performed data analysis, and RG and RN prepared the manuscript. JR, AP, PP, and BG contributed to data collection. RN contributed to data interpretation. All authors critically revised the paper and read and approved the final manuscript.