# CURRENT AND FUTURE ROLE OF ARTIFICIAL INTELLIGENCE IN CARDIAC IMAGING

EDITED BY: Steffen Erhard Petersen, Karim Lekadir, Alistair A. Young and Tim Leiner

PUBLISHED IN: Frontiers in Cardiovascular Medicine

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-058-2 DOI 10.3389/978-2-88966-058-2



Topic Editors:

Steffen Erhard Petersen, Queen Mary University of London, United Kingdom; Karim Lekadir, University of Barcelona, Spain; Alistair A. Young, King's College London, United Kingdom; Tim Leiner, University Medical Center Utrecht, Netherlands

Citation: Petersen, S. E., Lekadir, K., Young, A. A., Leiner, T., eds. (2020). Current and Future Role of Artificial Intelligence in Cardiac Imaging. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-058-2

# Table of Contents

*Editorial: Current and Future Role of Artificial Intelligence in Cardiac Imaging*

Karim Lekadir, Tim Leiner, Alistair A. Young and Steffen E. Petersen

*Machine Learning for Assessment of Coronary Artery Disease in Cardiac CT: A Survey*

Nils Hampe, Jelmer M. Wolterink, Sanne G. M. van Velzen, Tim Leiner and Ivana Išgum

*Machine Learning Approaches for Myocardial Motion and Deformation Analysis*

Nicolas Duchateau, Andrew P. King and Mathieu De Craene


Aurélien Bustin, Niccolo Fuin, René M. Botnar and Claudia Prieto


Matthew E. Fenech and Olly Buston


Kathleen Gilbert, Charlène Mauger, Alistair A. Young and Avan Suinesiaputra

# Editorial: Current and Future Role of Artificial Intelligence in Cardiac Imaging

Karim Lekadir<sup>1</sup>\*, Tim Leiner<sup>2</sup>, Alistair A. Young<sup>3</sup> and Steffen E. Petersen<sup>4,5</sup>

<sup>1</sup> Universitat de Barcelona, Artificial Intelligence in Medicine Lab (BCN-AIM), Departament de Matemàtiques and Informàtica, Barcelona, Spain, <sup>2</sup> Department of Radiology, Utrecht University Medical Centre, Utrecht, Netherlands, <sup>3</sup> School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom, <sup>4</sup> Barts Heart Centre, Barts Health NHS Trust, London, United Kingdom, <sup>5</sup> NIHR Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom

Keywords: artificial intelligence, cardiac imaging modalities, big data, cardiac image analysis, cardiovascular personalized medicine, AI adoption and translation

**Editorial on the Research Topic**

**Current and Future Role of Artificial Intelligence in Cardiac Imaging**

### INTRODUCTION

Cardiovascular disease is currently the most common cause of morbidity and mortality worldwide (1) and thus remains an important focus for both biomedical and technological research. In the age of personalized medicine, cardiac imaging is expected to play an important role in enabling more accurate and advanced quantification of structural and functional changes due to cardiovascular disorders. However, despite advances in cardiac imaging modalities, such as echocardiography, cardiovascular magnetic resonance (CMR), cardiac computed tomography (CT), and nuclear cardiology, the heart remains a challenging organ to image and assess compared to other organ systems. The main challenges faced by cardiac imaging include the perpetual cardiac and respiratory motions, the complex geometry of the ventricles, atria and arteries, the oblique orientation of the heart with respect to the body, and the small size of some of the cardiac structures, such as the coronary arteries, trabeculae and papillary muscles, as well as the large variability in imaging conditions and protocols (including non-contrast and contrast-enhanced cardiac imaging sequences).

Consequently, advanced tools are needed to optimize the use of cardiac imaging and to support clinicians throughout the whole value chain of cardiovascular practice, including improved image acquisition, automated cardiac quantification, cardiac tissue characterization, imaging biomarker discovery, and clinical decision support. In this context, artificial intelligence (AI), including machine learning and computer vision, has emerged as one of the most promising topics over the last 5 years. Combined with the exponential increase in computing power, AI provides unprecedented opportunities to leverage the available collections of cardiac imaging data for developing more robust cardiac image analysis algorithms, to uncover currently unknown clinical knowledge on cardiac health and disease, and to build novel software tools that will impact clinical cardiology. This area is expected to benefit from the current efforts to provide the scientific community with access to large-scale and high-quality image data. In the US, for example, existing studies such as the Multi-Ethnic Study of Atherosclerosis and the Framingham Heart Study have long compiled thousands of cardiac images. More recently, in the United Kingdom, the UK Biobank has been acquiring cardiac MRI images from tens of thousands of individuals (2).

### Edited and reviewed by:

Hendrik Tevaearai Stahel, Bern University Hospital, Switzerland

> \*Correspondence: Karim Lekadir karim.lekadir@ub.edu

#### Specialty section:

This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 20 May 2020 Accepted: 30 June 2020 Published: 07 August 2020

#### Citation:

Lekadir K, Leiner T, Young AA and Petersen SE (2020) Editorial: Current and Future Role of Artificial Intelligence in Cardiac Imaging. Front. Cardiovasc. Med. 7:137. doi: 10.3389/fcvm.2020.00137

In Europe, the euCanSHare project funded by the European Commission is developing a data sharing and analytics platform to facilitate access to large-scale cardiac imaging and non-imaging data from multiple centers (www.eucanshare.eu). Similar initiatives and large cohorts are expected to emerge across the globe in the years to come, which will further enhance the potential of AI in cardiac imaging and clinical cardiology.

To promote and guide further research and developments in AI for cardiac imaging, several experts and leading institutions in the field have recently published position and review papers that outline the initial achievements, discuss the current challenges and identify future perspectives. For example, Dey et al. summarized the most promising AI methods for cardiac imaging by distinguishing between classical AI and more advanced approaches (3). Al'Aref et al. reviewed some clinical applications of AI in cardiac imaging by considering each cardiac imaging modality separately (4). Litjens et al. focused exclusively on the application of deep learning methods to cardiac image analysis (5). Finally, Petersen et al. outlined the current challenges and emerging opportunities (6), emphasizing the importance of addressing non-technical aspects of AI in cardiac imaging such as patient acceptance, data protection and AI regulation.

While these papers provided an overall presentation and promotion of the field, the goal of this special issue, entitled "Current and Future Role of AI in Cardiac Imaging," is to compile more detailed and focused reviews covering the whole value chain of AI in cardiac imaging. Specifically, we have invited active experts across the globe to submit in-depth reviews on several key areas of AI in cardiac imaging, including (1) cardiac image reconstruction, (2) cardiac image segmentation, (3) cardiac shape and motion analysis, (4) computer-aided diagnosis, (5) imaging-genetics integration, and (6) socio-ethical impact and regulations. This comprehensive special issue, totaling nearly 1,000 references from the field, will constitute an unprecedented resource for researchers, both novice and experienced, to study in detail the methods, applications, strategies, datasets, tools, hypotheses, limitations and opportunities that are of direct relevance to each aspect of AI in cardiac imaging. Importantly, for each paper, we asked the authors to provide descriptions and discussions for both AI and clinical audiences, to enhance the democratization and promotion of AI in cardiac imaging, and thus future collaborations and developments in the field.

### SPECIAL ISSUE CONTENT

This special issue covers six specific areas and application domains of AI in cardiac imaging, as shown in **Figure 1** and presented as follows:

### Cardiac Image Reconstruction

The first paper of this special issue focuses on the very first step of the cardiac imaging workflow, i.e., enhancing cardiac image acquisition using AI. Fast and portable cardiac imaging such as echocardiography inherently suffers from low image quality, while high-resolution cardiac imaging such as CMR requires long acquisition times to address the cardiac and respiratory motions, as well as the need to highlight the different types of cardiac structures, tissues and vessels. For a long time, efforts to enhance and accelerate image acquisition for modalities such as CMR relied on developing new imaging/physics sequences and techniques, such as efficient pulse sequences, motion compensation techniques, multiple radio-frequency receiver coils for parallel imaging, and compressed sensing. In this special issue, Bustin et al. thoroughly surveyed emerging AI techniques for accelerating and enhancing CMR image reconstruction. Concretely, AI provides a unique opportunity to perform CMR acquisition using undersampling strategies that acquire less image data than needed, followed by learning-based estimation of the sparse domain from existing data. The authors first reviewed the initial learning-based techniques, which rely on dictionaries of transforms (from low- to high-resolution domains) learned from the acquired undersampled data itself. Subsequently, they focused their attention on recent deep learning-based approaches, which learn the reconstruction transforms from low-resolution to high-resolution images offline based on training data. They surveyed in detail the many advances over the last 2 years, describing recent applications to both 2D dynamic cardiac imaging and 3D whole-heart CMR imaging. As deep learning-based CMR reconstruction is both novel and popular, they concluded their review by discussing future avenues to improve real-world validation, including assessment of reconstruction quality and generalization.
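To make the undersampling problem concrete, the following minimal sketch (an illustration only, not a method taken from Bustin et al.) keeps every second frequency sample of a one-dimensional toy signal and reconstructs it by zero-filling. The signal, the regular sampling pattern, and all function names are assumptions chosen for illustration; real CMR acquisition is multi-dimensional and uses more elaborate sampling schemes. The aliased copy that appears in the output is the kind of artifact that dictionary-based and deep learning-based reconstructions aim to suppress.

```python
import cmath

def dft(signal):
    """Discrete Fourier transform: a 1D stand-in for acquiring k-space data."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform: the simplest possible reconstruction."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def undersample(spectrum, keep_every):
    """Keep one frequency sample out of `keep_every`; set the rest to zero."""
    return [value if k % keep_every == 0 else 0j for k, value in enumerate(spectrum)]

def rms_error(reference, estimate):
    """Root-mean-square difference between two signals."""
    return (sum(abs(r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)) ** 0.5

# Hypothetical toy "image": a short box profile on a 16-sample line.
signal = [0.0] * 16
for index in range(5, 9):
    signal[index] = 1.0

kspace = dft(signal)
full = idft(kspace)                              # fully sampled reconstruction
zero_filled = idft(undersample(kspace, 2))       # half the data, zero-filled

full_error = rms_error(signal, full)
zero_filled_error = rms_error(signal, zero_filled)
print(f"fully sampled error: {full_error:.2e}")
print(f"zero-filled error:   {zero_filled_error:.3f}")
```

With this regular pattern, the zero-filled result contains the original box at half amplitude plus an identical copy shifted by half the field of view. A learned reconstruction would try to remove that copy using prior knowledge from training data.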

### Cardiac Image Segmentation

In the next review papers, this special issue addresses the next stage of cardiac image analysis, namely cardiac image segmentation, which is by far the topic most covered by AI researchers in cardiac imaging. Deep learning approaches have been particularly popular as they have been shown to generate highly accurate results when trained on large manually segmented datasets. At the same time, clinicians have been highly receptive to such automated black-box methods as they accelerate their clinical work without interfering in the decision making. In this special issue, Chen et al. put together a comprehensive review of deep learning techniques for cardiac image segmentation, totaling over 100 papers describing applications to various imaging modalities (echocardiography, MRI, and cardiac CT) and to the main cardiac structures (ventricles, atria, and vessels). They concluded that there is no universally optimal deep learning implementation for cardiac image segmentation, and suggested that algorithms need to be customized and optimized for each application depending on the imaging modality, protocol (e.g., contrast vs. non-contrast), and cardiac structure (left vs. right ventricle). For the immediate future, they discussed the importance of developing segmentation methods that can generalize well across various imaging modalities, scanners, and pathologies. Finally, in order to encourage reproducible research, the authors provided a summary of public datasets for training and testing new deep learning models, as well as public code repositories that include recently developed techniques.

In a more focused review, Jamart et al. surveyed one of the most challenging cardiac image segmentation applications, namely atrial segmentation from late enhanced cardiac MRI. This contrast-enhanced imaging sequence is the technique of choice in clinical practice to quantify fibrosis and assess atrial fibrillation. However, the task is complicated by the geometrical complexity and small size of the atrial chambers, which are also constrained by thin walls. Moreover, the anatomical boundaries on the late enhanced MRI often lack clear contrast, which can further mislead the segmentation algorithms. As a result, until the advent of deep learning, very few techniques had been attempted for automated segmentation of the atria. In this review, the authors summarized the recent deep learning developments in this field, including multi-stage and multi-scale convolutional neural networks to address the image class imbalance that is inherent to atrial segmentation (the atrial cavity represents only a small fraction of the image volume), as well as to provide contextual cues to better differentiate the atrial boundaries from the surrounding structures. The best reported performance in the survey reached a 93.2% segmentation accuracy, which is highly promising given the complexity of the task. Arguably, the most important future work in this domain is the automated detection of fibrosis using advanced AI approaches, which will represent an important research topic in the years to come.
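The class imbalance mentioned above also affects how segmentation results should be scored. The sketch below is a hypothetical illustration: it compares plain pixel accuracy with the Dice overlap coefficient on a toy mask in which the target structure covers 5% of the pixels. The masks and function names are assumptions, and the editorial does not state which metric underlies the 93.2% figure, so this example does not attempt to reproduce it.

```python
def pixel_accuracy(prediction, truth):
    """Fraction of pixels labeled identically in both masks."""
    return sum(p == t for p, t in zip(prediction, truth)) / len(truth)

def dice(prediction, truth):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|) for binary masks."""
    intersection = sum(p and t for p, t in zip(prediction, truth))
    total = sum(prediction) + sum(truth)
    return 1.0 if total == 0 else 2 * intersection / total

# Hypothetical flattened masks: 5 foreground pixels out of 100.
truth = [1] * 5 + [0] * 95
empty_prediction = [0] * 100               # predicts background everywhere
partial_prediction = [1] * 4 + [0] * 96    # finds 4 of the 5 foreground pixels

for name, prediction in [("empty", empty_prediction), ("partial", partial_prediction)]:
    print(name, pixel_accuracy(prediction, truth), round(dice(prediction, truth), 3))
```

The all-background prediction scores 95% pixel accuracy while its Dice coefficient is zero, which is why overlap-based measures are usually preferred when the structure of interest occupies a small fraction of the image.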

### Cardiac Shape and Motion Analysis

Image segmentation of the cardiac boundaries as described above is a prerequisite in clinical practice for estimating standard clinical indices such as chamber volumes and ejection fraction for cardiac assessment. However, existing research has shown that more detailed information about cardiac geometry and regional motion is expected to improve future clinical assessment of normal and abnormal cardiac (dys)function. In this special issue, Gilbert et al. presented a review of the so-called statistical atlases of cardiac anatomy, which are widely used to model complex shape and function variability across subpopulations, and thus to extract new descriptors of cardiac geometry of relevance to cardiac health and disease. In this paper, the authors first described the existing and newer methods for building clinically useful statistical cardiac atlases from annotated cardiac contours. Subsequently, the paper discussed how, powered by supervised or unsupervised machine learning algorithms, statistical cardiac shape analysis can be used to automatically identify and quantify abnormal shape deviations, and to provide morphometric indices that are optimally associated with clinical factors.
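As a rough illustration of the idea behind a statistical shape atlas, the sketch below computes a mean shape and the first mode of variation from a handful of toy contours, then scores a new contour along that mode. It assumes principal component analysis (here via power iteration) as the unsupervised technique, which is only one of several options and is not claimed to be the specific method reviewed by Gilbert et al. The four-landmark contours and all names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_shape(shapes):
    """Average each coordinate over all training shapes."""
    count = len(shapes)
    return [sum(shape[i] for shape in shapes) / count for i in range(len(shapes[0]))]

def first_mode(shapes, iterations=50):
    """Mean shape and dominant mode of variation, found by power iteration."""
    mean = mean_shape(shapes)
    centered = [[x - m for x, m in zip(shape, mean)] for shape in shapes]
    # Start from the sample that deviates most from the mean, so the start
    # vector lies in the span of the data (an assumption of this sketch).
    vector = max(centered, key=lambda row: dot(row, row))
    norm = dot(vector, vector) ** 0.5
    if norm == 0:
        return mean, [0.0] * len(mean)  # all shapes identical: no variation
    vector = [x / norm for x in vector]
    for _ in range(iterations):
        # Apply the covariance matrix without forming it explicitly.
        scores = [dot(row, vector) for row in centered]
        product = [sum(s * row[i] for s, row in zip(scores, centered)) / len(centered)
                   for i in range(len(mean))]
        norm = dot(product, product) ** 0.5
        if norm == 0:
            break
        vector = [x / norm for x in product]
    return mean, vector

def mode_score(shape, mean, mode):
    """Signed deviation of a shape from the mean along one mode."""
    return dot([x - m for x, m in zip(shape, mean)], mode)

def toy_shape(half_width):
    """Hypothetical contour with 4 landmarks, flattened as x1, y1, x2, y2, ..."""
    return [half_width, 0.0, 0.0, 1.0, -half_width, 0.0, 0.0, -1.0]

training = [toy_shape(a) for a in (1.0, 1.2, 1.4, 1.6)]
mean, mode = first_mode(training)
new_score = mode_score(toy_shape(2.0), mean, mode)
print([round(m, 3) for m in mode], round(abs(new_score), 3))
```

The sign of a mode is arbitrary, so only the magnitude of the score is meaningful here. In a real atlas, such scores would be computed for many modes and compared against a reference population to flag abnormal shape deviations.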

In another review, Duchateau et al. described in great depth the literature on cardiac motion quantification and analysis based on machine learning. With these techniques, the idea is to learn new advanced representations and patterns of myocardial motion and deformation (displacement, velocity, deformation, torsion, strain) from representative samples of cardiac images, such as echocardiography or MRI. In particular, two families of approaches were reviewed, namely (1) traditional techniques that apply machine learning onto explicit features of myocardial motion/deformation (displacement fields calculated from the images), and (2) more recent approaches based on neural networks applied directly to the image data to extract and analyze new spatiotemporal signatures from local image patches around the myocardium. In this paper, the authors described the entire workflow of steps and methods required to derive physiologically meaningful and clinically useful analyses of cardiac motion. Finally, they discussed the next steps toward clinical adoption of machine learning based cardiac motion quantification, including community benchmarking, standardization initiatives, and clinical interpretability of the extracted spatiotemporal signatures for abnormality localization.
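The explicit motion features used by the first family of approaches can be illustrated with a small sketch. It assumes that a set of myocardial points has already been tracked between two frames, and derives per-point displacement vectors and a simple engineering strain, defined as the relative change in contour length. The coordinates, frame labels, and function names are hypothetical, and clinical strain estimation involves considerably more than this.

```python
import math

def displacements(reference, current):
    """Per-point displacement vectors between two tracked point sets."""
    return [(cx - rx, cy - ry) for (rx, ry), (cx, cy) in zip(reference, current)]

def contour_length(points):
    """Total length of the polyline through the points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def engineering_strain(reference, current):
    """Relative length change (L - L0) / L0; negative values mean shortening."""
    initial = contour_length(reference)
    return (contour_length(current) - initial) / initial

# Hypothetical tracked points along a wall segment (arbitrary units).
end_diastole = [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]
end_systole = [(0.0, 0.0), (0.0, 4.0), (0.0, 8.0)]

field = displacements(end_diastole, end_systole)
strain = engineering_strain(end_diastole, end_systole)
print(field, strain)
```

Features of this kind (displacement, strain, and their time courses) are what the traditional machine learning methods take as input, whereas the neural network approaches described above learn their representations directly from the image data.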

### Computer-Aided Diagnosis

To date, the main AI developments in cardiac imaging have mostly addressed cardiac image analysis tasks that precede clinical decision making, with the aim of enhancing cardiac image acquisition, facilitating cardiac image segmentation, and estimating advanced indices of cardiac shape and function. However, AI is also expected to impact clinical decision making in the future, for example by enabling earlier and more precise diagnosis, as well as treatment planning and response estimation. To illustrate this, this special issue includes two review papers centered on cardiac diagnosis. First, Martin-Isla et al. surveyed in detail the area of image-based cardiac diagnosis using machine learning. This included a step-by-step description of the techniques for building and validating new AI models of cardiac diagnosis. The authors also described emerging techniques for more precise diagnosis, including radiomics (omics for radiology) (7). The authors then reviewed more than 100 papers on AI- and image-driven diagnosis of coronary heart disease, cardiomyopathy, heart failure, and valve disease. Interestingly, the survey showed that some complex cardiac diseases, such as atrial fibrillation, are yet to be extensively investigated by the AI community. Finally, the paper discussed current obstacles that limit the applicability of AI-driven diagnosis in cardiac imaging, in particular the lack of interpretability, which must be addressed in the years to come to enable clinicians to understand and trust the AI-generated diagnoses and decisions.

In a second review paper on AI for cardiac diagnosis, Hampe et al. focused on the assessment of coronary artery disease from non-invasive CT using machine learning. In clinical practice, catheter-guided X-ray angiography and intravascular ultrasound provide detailed information on coronary stenosis and plaque composition, but they are limited by their invasive nature. CT imaging provides a promising alternative but offers reduced contrast between the atherosclerotic plaque constituents, and thus AI is expected to play a role for the extraction of detailed and clinically useful information from the non-invasive images. This review surveyed classical and modern machine learning methods to estimate coronary stenosis, to discriminate between calcified, non-calcified and mixed plaques in CT, and to characterize fibrous and lipidic plaque constituents. Furthermore, the paper described recent methods, in particular deep learning based, for predicting fractional flow reserve directly from CT, by learning the relationship between CT features and fractional flow reserve based on a training sample of corresponding CT and invasive imaging. In their discussion, the authors noted an important limitation of current models, which are based on small training samples, as images with manually characterized plaques are more difficult to obtain than, for example, manual annotations of cardiac boundaries. Thus, this field of AI in cardiac imaging is expected to further develop for future clinical use as additional and larger datasets become available in the years to come.

### AI Integration With Non-imaging Data

To realize the promise of precision medicine in cardiology, cardiac imaging is a central piece of the puzzle. However, non-imaging data, in particular omics and health data, also play an important role, as they make it possible to build multi-scale AI models that integrate patient-specific biomolecular, phenotypic, environmental, and clinical information. Such integrated AI models are expected to lead to improved diagnosis and treatment selection, as well as to better clinical outcomes. This special issue included a review by de Marvao et al. of AI-driven integrated cardiac imaging-genetics studies, which aim to characterize the complex interplay between cardiac imaging phenotypes and environmental and genetic factors. Several concrete examples of AI-empowered imaging-genetics studies were provided, with aims such as enabling more stratified diagnosis of heart failure, predicting treatment response in cardiomyopathic patients, identifying genetic variants or proteomic signatures of high-risk atherosclerotic plaques, or predicting positive cardiac remodeling after cardiac resynchronization therapy. While the use of AI in cardiovascular imaging-genetics is shown to have great potential, the review noted that the challenges of AI in genetics and imaging separately are amplified by combining these very large datasets. Thus, further research is expected to address more ambitious whole-genome and high-resolution whole-heart imaging studies, and to derive multi-scale AI solutions for clinical practice integrating imaging, biological and clinical data.

### Ethical, Social, and Political Issues

While the review papers described above dealt with technical and clinical aspects of AI in cardiac imaging, this special issue concludes with a paper by Fenech and Buston that reviews the ethical, social, and political issues that are being investigated to facilitate future acceptance and deployment of AI solutions in cardiac imaging. These include clarifying the impact of AI solutions on the roles of cardiologists, radiologists, and other doctors, on the future liability of clinicians vs. AI manufacturers, and on the altered relationships between healthcare professionals, patients, their relatives, and administrators. Data sharing and privacy issues were also reviewed in the paper, focusing on the challenges of managing patient informed consent for AI solutions that remain difficult to understand (and trust) for the general public and clinicians alike. From a social point of view, initial studies reviewed in this paper suggest that there is a concern that AI solutions may remain biased and in fact exacerbate health inequalities. Furthermore, the authors discussed the need to include patients and citizens in the AI development process, to take into close consideration their requirements, expectations, and behaviors. Other important issues, such as algorithmic transparency, fairness, and regulation, were also discussed at length. To address these key issues and to optimize adoption by clinicians, patients, and regulators, the paper emphasized the importance of developing principles and translating them into policies in the years to come. In the general context of AI, imaging and cardiology in particular are expected to play an important role, as exemplified by the fact that they have been the healthcare domains with the greatest number of FDA approvals for novel data-driven technologies in recent years.

## FUTURE PERSPECTIVES

This special issue reviewed AI developments and opportunities for each task of the cardiac imaging clinical workflow, surveying in detail cardiac imaging acquisition, segmentation and quantification, clinical decision support and precision cardiovascular medicine through integration with genomics data, in addition to ethical, social, and regulatory aspects. A close look at the statistics from **Figures 2**, **3**, gathered from the publications reviewed in this special issue, shows a continuous increase in the research output in AI for cardiac imaging over the last 5 years. Interestingly, while cardiac image segmentation has received the most attention in the field as shown in **Figure 3**, due to the urgent need to accelerate the contouring process, it is also the only cardiac imaging task for which there is a decrease in the number of publications between 2018 and 2019 (**Figure 2**). This may be explained by the advances made possible by deep learning in the field, combined with an increasing interest from the community to invest in other AI applications such as cardiac image reconstruction or computer-aided diagnosis.

While this special issue described six main AI applications in cardiac imaging separately, we believe the ultimate aim should be to integrate these separate tasks into one single smooth, efficient and user-friendly pipeline for clinical cardiologists. As an example, this multi-task integration is the scope of the UK-based research project "SmartHeart: Next-generation cardiovascular healthcare via integrated image acquisition, reconstruction, analysis and learning" funded by the UK's Engineering and Physical Sciences Research Council.

FIGURE 2 | Number of papers reviewed in the six categories of cardiac imaging tasks in the last 5 years (2015 to 2019), showing a continuous increase in research output for all tasks, except for segmentation, which decreased between 2018 and 2019 (over 550 papers reviewed in total). The subfigures are taken from the different papers in this special issue.

Additionally, while the special issue covered all important steps of cardiac imaging, it is worth mentioning other research tasks, still at their very initial stages of development, which have not been covered in this special issue. For example, "automated quality control" is expected to enhance the cardiac image analysis workflow as large volumes of research and clinical data become available, and as the demand for AI-driven automation and robustness increases in clinical practice. Here, it is worth listing a few preliminary works, such as automated quality control of CMR images using a deep learning approach to identify suboptimal image contrast or heart coverage (8). Other works have instead focused on quality control of the final image segmentation results using classical AI (9) or neural networks (10). Another area that may benefit from AI is "image-based computational cardiology," which builds patient-specific digital models of the heart to simulate treatment response. While this area has traditionally been addressed using purely physiological and mechanistic models, researchers are now investigating the integration of machine learning to improve the accuracy and speed of the personalized simulated outputs (11). Furthermore, as larger datasets become available, it is expected that predictive models of disease progression will be developed and validated, including by integrating imaging with non-imaging predictors (e.g., socio-demographic, biomarker, lifestyle and genomic data). Finally, while this special issue is dominated by AI applications in echocardiography, CMR and cardiac CT, there have also been machine learning applications in "nuclear cardiology" (12), and these are expected to increase in the years to come as larger nuclear medicine datasets (both PET and SPECT) become available.

To conclude this editorial, we wish to emphasize the need in the coming years for more concerted efforts dedicated not only to enhancing the technical and clinical advances described in this special issue, but also to addressing non-technical and non-clinical aspects of AI in cardiac imaging. Importantly, there is a need for community-defined standards and guidelines for validating and adopting future AI solutions, including metrics and procedures to evaluate performance, bias and errors, clinical effectiveness, degree of interpretability, and even cost-effectiveness. Benchmarking datasets and tools are also required to enable transparent and comparative analysis of the AI solutions across research institutions and players. For example, a test-retest reference dataset was recently compiled to assess reproducibility of machine learning CMR studies (13), while an international challenge on multi-center and multi-vendor cardiac imaging segmentation was organized to test generalizability across scanners (Siemens, Philips, General Electric and Canon)<sup>1</sup>. Finally, ethical and regulatory aspects will need to be established in a multi-stakeholder collaboration between experts in AI, bioethics and cardiac imaging, but also with the involvement of patient associations, private companies, and public authorities.

<sup>1</sup>https://www.ub.edu/mnms/

### REFERENCES


### AUTHOR CONTRIBUTIONS

All co-authors discussed the structure and content of the special issue and paper. KL wrote the first draft. TL, AY, and SP revised the manuscript.

### FUNDING

The work of KL was partly funded by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 825903 (euCanSHare project), and by the Ramon y Cajal Programme of the Spanish Ministry of Economy and Competitiveness under grant no. RYC-2015-17183.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Lekadir, Leiner, Young and Petersen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Machine Learning for Assessment of Coronary Artery Disease in Cardiac CT: A Survey

Nils Hampe<sup>1,2,3</sup>\*, Jelmer M. Wolterink<sup>1,2,3</sup>, Sanne G. M. van Velzen<sup>3</sup>, Tim Leiner<sup>4</sup> and Ivana Išgum<sup>1,2,3,5</sup>

<sup>1</sup> Department of Biomedical Engineering and Physics, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, Netherlands, <sup>2</sup> Amsterdam Cardiovascular Sciences, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, Netherlands, <sup>3</sup> Image Sciences Institute, University Medical Center Utrecht, Utrecht, Netherlands, <sup>4</sup> Department of Radiology, University Medical Center Utrecht, Utrecht, Netherlands, <sup>5</sup> Department of Radiology and Nuclear Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, Netherlands

#### Edited by:

Fabrizio Ricci, G. d'Annunzio University of Chieti and Pescara, Italy

#### Reviewed by:

Maria A. Zuluaga, Institut Eurécom, France; John Hoe, MediRad Associates Ltd, Singapore

\*Correspondence:

Nils Hampe n.hampe@amsterdamumc.nl

#### Specialty section:

This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine

Received: 30 September 2019; Accepted: 12 November 2019; Published: 26 November 2019

#### Citation:

Hampe N, Wolterink JM, van Velzen SGM, Leiner T and Išgum I (2019) Machine Learning for Assessment of Coronary Artery Disease in Cardiac CT: A Survey. Front. Cardiovasc. Med. 6:172. doi: 10.3389/fcvm.2019.00172

Cardiac computed tomography (CT) allows rapid visualization of the heart and coronary arteries with high spatial resolution. However, analysis of cardiac CT scans for manifestation of coronary artery disease is time-consuming and challenging. Machine learning (ML) approaches have the potential to address these challenges with high accuracy and consistent performance. In this mini review, we present a survey of the literature on ML-based analysis of coronary artery disease in cardiac CT. We summarize ML methods for detection and characterization of atherosclerotic plaque as well as anatomically and functionally significant coronary artery stenosis.

Keywords: machine learning, coronary artery disease, atherosclerotic plaque, coronary artery stenosis, cardiac CT

### 1. INTRODUCTION

Diagnosis and monitoring of coronary artery disease (CAD) is increasingly based on non-invasive imaging with computed tomography (CT), allowing excellent visualization of the coronary arteries with high spatial resolution. Cardiac CT exams consist of hundreds of slices and the number of cardiac CT studies has been steadily increasing (1). This has led to an increased workload for medical professionals, which in combination with shortages of trained cardiac imagers (2) might lead to cardiac CT underuse in the clinic. Machine learning (ML) could offer a way to address these challenges and facilitate automatic cardiac CT analysis with consistent and accurate results. Furthermore, ML algorithms might enable an increased range of secondary diagnoses.

This survey provides an overview of ML algorithms for detection, characterization, and quantification of CAD in cardiac CT. We searched PubMed for articles related to ML-based assessment of CAD in cardiac CT published within the last 10 years (search strategy in **Supplementary Materials**) which led to inclusion of 59 studies. The structure of this survey is as follows. We provide a brief primer on ML in section 2. Applications of ML for automatic detection and characterization of atherosclerotic plaque are summarized in section 3. Studies focusing on ML for anatomical and functional evaluation of luminal stenosis are summarized in section 4. Finally, section 5 provides a discussion of outstanding challenges for transfer of ML algorithms into the clinic.

### 2. MACHINE LEARNING

Machine learning comes in many flavors, but most applications in cardiac CT use supervised learning. In supervised learning, a model is optimized during training to provide the correct labels as defined by the reference standard, and is then used during testing to predict labels for new and unseen samples.

Each sample can be described based on characteristics or features. Among the simplest ML algorithms are k-nearest neighbor (kNN) classifiers, which look for training samples with similar feature values to a test sample, and assign the test sample to the majority class among these training samples. Linear classifier (LC) models like support vector machines (SVM) aim to find a linear combination of features to separate samples in different classes. Alternatively, samples can be separated by thresholding feature values along a single axis. This is unlikely to lead to highly accurate classifiers, but by consecutively applying thresholds, a decision tree model can be built for more accurate classification.
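The k-nearest neighbor rule described above can be illustrated with a minimal sketch. The two features (lesion volume and mean attenuation), the class names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
from collections import Counter
from math import dist

def knn_predict(train_x, train_y, sample, k=3):
    """Assign `sample` to the majority class among its k nearest training samples."""
    # Rank training samples by Euclidean distance to the test sample.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], sample))
    # Majority vote over the labels of the k closest samples.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical samples described by (volume in mm^3, mean attenuation in HU).
train_x = [(2.0, 150.0), (3.0, 180.0), (40.0, 420.0), (55.0, 500.0), (48.0, 460.0)]
train_y = ["background", "background", "CAC", "CAC", "CAC"]

prediction = knn_predict(train_x, train_y, (45.0, 430.0), k=3)
```

In practice, features with different units would usually be normalized before computing distances, since otherwise the feature with the largest numeric range dominates the vote.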

ML performance can often be improved by combining predictions of multiple models. Ensembles (E) combine predictions of multiple simultaneously executed models, e.g., by averaging predictions of decision trees in a random forest (RF). In boosting (BO), models are applied consecutively and each model is trained to correct errors of its predecessors. Finally, artificial neural networks (ANNs) transform samples into targets through layers of trainable neurons, which are loosely based on biological neurons. While ANNs have been around since the 1950s, it has recently become possible to train networks that have many layers, i.e., deep learning. The success of deep learning in medical image analysis has been to a large extent due to the inclusion of trainable image filters in so-called convolutional neural networks (CNNs), which can be trained to extract valuable features from raw image data (3). For a more in-depth introduction to ML and deep learning, please refer to Jordan and Mitchell (4).
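As a rough illustration of the ensemble idea, the sketch below averages the outputs of several single-threshold classifiers (decision stumps). This is a deliberate simplification: a random forest averages full decision trees trained on resampled data, whereas the thresholds here are fixed by hand and purely hypothetical.

```python
def make_stump(feature_index, threshold):
    """Return a one-threshold classifier: 1.0 if the feature exceeds the threshold, else 0.0."""
    def stump(sample):
        return 1.0 if sample[feature_index] > threshold else 0.0
    return stump

def ensemble_predict(models, sample):
    """Average the outputs of all models and threshold the mean vote at 0.5."""
    mean_vote = sum(model(sample) for model in models) / len(models)
    return mean_vote, int(mean_vote > 0.5)

# Hypothetical stumps acting on (volume in mm^3, mean attenuation in HU).
stumps = [make_stump(0, 10.0), make_stump(1, 300.0), make_stump(1, 450.0)]

mean_vote, label = ensemble_predict(stumps, (45.0, 430.0))
```

Here two of the three stumps vote for the positive class, so the averaged prediction is positive even though one individual model disagrees.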

### 3. ATHEROSCLEROTIC PLAQUE DETECTION, CHARACTERIZATION, AND QUANTIFICATION

CT offers a non-invasive alternative to, e.g., catheter-guided X-ray angiography, optical coherence tomography, and intravascular ultrasound (IVUS) for atherosclerotic plaque visualization. Characterization and quantification of plaque in CT provide insight into different stages of CAD (5). In this section, we survey methods for analysis of calcified plaque (section 3.1) and non-calcified and mixed plaque (section 3.2). Reviewed papers are listed in **Table 1**.

### 3.1. Calcified Plaque

TABLE 1 | Publications related to analysis of (A) calcified and (B) non-calcified and mixed atherosclerotic plaque.

(B) Non-calcified plaque Detect Characterize Training Testing Classifier

Check marks in (A) indicate analysis in dedicated non-contrast-enhanced calcium scoring CT (CSCT), chest CT (Chest CT) or coronary CT angiography (CCTA) images, check marks in (B) indicate detection (Detect) or characterization (Characterize) of plaque. The number of patients included for method development (Training) and evaluation (Testing) are listed, \* indicates cross-validation and - indicates training on non-patient data. The classifier with which the primary result was obtained is indicated (Classifier, see section 2 for abbreviations).

† A total of 15 scans was divided into training sets ranging from 1 to 13 and respective test sets comprised of the remaining scans.

Coronary artery calcification (CAC) quantification or scoring is typically performed in dedicated non-contrast-enhanced, ECG-triggered, calcium scoring CT images (CSCT). Using dedicated software, an expert identifies voxels with a density over 130 Hounsfield units (HU) in the coronary arteries. Identified CAC is then quantified according to its volume, density, or a combination of both (27). CAC can be quantified not only in CSCT, but also in other kinds of CT images visualizing the heart, such as cardiac CT angiography (CCTA) and non-gated chest CT. Calcium scoring is not considered a difficult task for trained clinicians, but it is time-consuming when performed in large numbers of images. Hence, automatic ML-based methods have been proposed.
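The thresholding step of calcium scoring can be sketched as follows. This minimal example computes only a volume-based quantity; the flat voxel list, the region-of-interest mask, and the voxel volume are hypothetical inputs, and the sketch does not reproduce any specific clinical scoring software.

```python
CAC_THRESHOLD_HU = 130  # density threshold stated in the text

def calcium_volume_mm3(hu_values, roi_mask, voxel_volume_mm3):
    """Total volume of voxels inside the region of interest that exceed 130 HU."""
    n_voxels = sum(
        1
        for hu, inside in zip(hu_values, roi_mask)
        if inside and hu > CAC_THRESHOLD_HU
    )
    return n_voxels * voxel_volume_mm3

# Hypothetical voxel densities (HU) and a mask marking voxels in the coronary arteries.
hu_values = [40, 135, 400, 129, 250]
roi_mask = [True, True, True, True, False]

volume = calcium_volume_mm3(hu_values, roi_mask, voxel_volume_mm3=0.5)
```

The last voxel exceeds the threshold but lies outside the mask, which mirrors the need to separate coronary calcifications from other high-density structures.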

ML-based calcium scoring methods proposed prior to the advent of deep learning have focused on identification of CAC lesions among a large set of samples, i.e., groups of connected voxels above 130 HU. Samples are described with features such as size, shape, appearance and location to distinguish CAC from other candidate lesions such as calcifications in the aorta. Location features are of particular importance, as recognized by Liu et al. (15), Kurkure et al. (16), and Brunner et al. (17) who proposed a heart coordinate system. Similarly, Sánchez et al. (14) described candidate locations relative to anatomical landmarks. Išgum et al. (13) used multi-atlas registration to estimate the location of the coronary artery tree, while Shahzad et al. (12) and Wolterink et al. (11) estimated the location of three major coronary arteries for per-vessel calcium scoring. Yang et al. (9) extracted coronary artery centerlines in CCTA images and propagated these to CSCT images of the same patients to provide location features.
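The notion of candidate lesions as groups of connected voxels above 130 HU can be sketched on a single 2D slice. The 4-connectivity, the toy image, and the three features below are assumptions made for illustration; the reviewed methods operate on 3D volumes and use richer, anatomy-aware location features.

```python
from collections import deque

def candidate_lesions(image, threshold=130):
    """Group 4-connected pixels above `threshold` HU into candidate lesions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    lesions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first search collects one connected group of pixels.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            lesions.append(pixels)
    return lesions

def describe(pixels, image):
    """Simple size, appearance, and location features for one candidate."""
    n = len(pixels)
    return {
        "size": n,
        "max_hu": max(image[y][x] for y, x in pixels),
        "centroid": (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n),
    }

# Hypothetical slice with two separate high-density regions.
slice_hu = [
    [20, 140, 150, 30],
    [10, 160, 40, 30],
    [15, 25, 35, 400],
]
features = [describe(p, slice_hu) for p in candidate_lesions(slice_hu)]
```

Each resulting feature dictionary corresponds to one sample that a classifier would then label as CAC or as another type of candidate.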

Deep learning-based methods have typically classified individual voxels instead of candidate lesions. Due to the extreme imbalance between numbers of CAC and background voxels in CT images, Wolterink et al. (10) proposed to use two CNNs, where one CNN identified candidate voxels in CCTA and the second CNN further discriminated among identified candidates. Similarly, Lessmann et al. (8) used two CNNs to identify calcified voxels in chest CT. Cano-Espinosa et al. (7) and de Vos et al. (6) avoided voxel-based classification altogether by directly regressing calcium scores in chest CT, enabling automatic scoring in less than a second.

Automatic CAC scoring methods have been validated in large data sets (28) and in other types of CT scans in which the heart is routinely visualized, such as attenuation correction images for PET-CT (29) and CT images acquired for radiotherapy treatment planning (30–32). Wolterink et al. presented a public data set with reference standard for standardized evaluation of CAC scoring in CSCT (33).

### 3.2. Non-calcified Plaque

Non-calcified plaque is typically lipid-rich and vulnerable to rupture, which can cause acute coronary syndrome (34). ML-based analysis methods in CCTA have been developed for detection or localization of non-calcified plaque, as well as characterization of lipid and fibrous plaque components.

Coronary artery localization by means of centerline extraction is a typical preprocessing step for ML-based plaque analysis. Traditionally, many automatic centerline extraction methods have been based on minimum cost paths between proximal and distal artery points (35, 36). ML has been used to verify automatic centerline extraction results with an RF (25) or CNN (37). Alternatively, centerlines can be iteratively extracted based on a single seed point. Wolterink et al. (38) showed how such a tracker can be guided by a 3D CNN that locally detects the artery orientation.
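The minimum cost path idea behind many traditional centerline extraction methods can be sketched with Dijkstra's algorithm on a small 2D grid. The cost image below is hypothetical (low cost where a vessel is assumed to be likely), and the sketch is not the specific method of the cited works, which operate in 3D and use dedicated cost functions.

```python
import heapq

def minimum_cost_path(cost, start, end):
    """Dijkstra search on a 4-connected grid; entering a cell costs cost[y][x]."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            break
        if d > best[node]:
            continue  # stale queue entry, a cheaper route was already found
        y, x = node
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols:
                nd = d + cost[ny][nx]
                if nd < best.get((ny, nx), float("inf")):
                    best[(ny, nx)] = nd
                    previous[(ny, nx)] = node
                    heapq.heappush(heap, (nd, (ny, nx)))
    # Walk back from the distal point to the proximal point.
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical cost image: low values along the assumed vessel course.
cost = [
    [1, 9, 9, 9],
    [1, 1, 9, 9],
    [9, 1, 1, 1],
]
centerline = minimum_cost_path(cost, start=(0, 0), end=(2, 3))
```

The extracted path follows the low-cost cells between the proximal and distal points, which is the role a vesselness-derived cost plays in practice.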

Coronary artery centerlines can be used to reconstruct CCTA volumes into images that allow better plaque visualization and identification. Zhao et al. (21), Jawaid et al. (22), Wei et al. (23), and Zuluaga et al. (26) used cross-sectional images along the coronary artery centerline to extract features describing the vessel wall shape and texture. In Jawaid et al. (22) and Wei et al. (23), these features were used in an SVM or linear classifier to determine whether the image contained non-calcified plaque. Similarly, Zuluaga et al. (26) used such features to train an SVM classifying lesion segments as either healthy or diseased, i.e., containing non-calcified or calcified plaque. Zhao et al. (21) trained an SVM to classify cross-sectional images as healthy or containing non-calcified, calcified, or mixed plaque. For the same task, Zreik et al. (20) trained a recurrent CNN that did not depend on hand-crafted feature extraction. Kelm et al. (25) used an RF classifier to determine whether non-calcified or calcified plaque was present along a coronary artery centerline segment.

Characterization of individual components in non-calcified plaque is a challenging task due to low-contrast boundaries between plaque components (39). Yamak et al. (24) exploited additional attenuation data provided by dual-energy CT to characterize plaque in manually determined regions of interest in axial slices. To validate their model in patient scans, manual CCTA annotations by an expert were used. However, obtaining reliable manual reference annotations for non-calcified plaque in CCTA is challenging. Kolossváry et al. (18) determined the reference standard in CCTA through registration of histology images to ex-vivo CCTA scans. Features were extracted for each cross-sectional image and lesions were classified into advanced or early stage atherosclerosis using a linear classifier. Alternatively, Masuda et al. (19) used an in-vivo IVUS-based reference standard to train a boosting classifier with histogram-based features distinguishing fibrous from lipid plaque in CCTA.

### 4. CORONARY STENOSIS DETECTION AND CHARACTERIZATION

Non-invasive assessment of CAD-induced stenotic lesions in CT prior to invasive treatment may prevent unnecessary costs and complications (40). Therefore, CT images have long been used to assess the anatomical significance of lesions by a local measurement of luminal narrowing. However, determination of the functional significance of a lesion by taking physiology into account can better stratify patients in need of treatment (41). In this section, we review ML algorithms for the detection and quantification of anatomically (section 4.1) and functionally (section 4.2) significant stenosis. Reviewed papers are listed in **Table 2**.

### 4.1. Anatomical Significance

Identification of anatomically significant stenotic lesions in CCTA, i.e., those lesions causing a luminal narrowing of at least 50%, allows a first assessment of the severity of stenosis in patients with symptoms of CAD. While this assessment is often based on visual estimation by a clinician, this is a difficult task (56) with substantial inter-observer variability (57). ML-based automatic approaches could reduce this variability.

TABLE 2 | Publications related to (A) anatomically and (B) functionally significant stenosis detection.

Check marks indicate arterial (Artery) or myocardial (Myocardium) analysis. The number of patients included for method development (Training) and evaluation (Testing) are listed, \* indicates cross-validation and - indicates training on non-patient data. The classifier with which the primary result was obtained is indicated (Classifier, see section 2 for abbreviations).

Stenosis detection typically requires a local measurement of the lumen diameter and an estimation of the healthy lumen diameter. These estimates can be based on automatically extracted centerlines (section 3.2). Many centerline extraction methods also estimate the luminal radius at each centerline point, assuming a circular coronary artery profile (25, 38). However, circular artery profiles are not a realistic assumption for diseased vessel segments. Automatically extracted centerlines can also be used as an initialization for more detailed lumen segmentation. Huang et al. (44) used centerlines to obtain a reformatted image in which the lumen was segmented using a 3D CNN. Lee et al. (42) used centerlines to obtain a tube-shaped prior that is deformed to segment the coronary lumen.
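Given a local lumen diameter and an estimate of the healthy diameter, the degree of narrowing follows from simple arithmetic. The sketch below assumes a diameter-based definition of narrowing and uses hypothetical measurements; the function names are illustrative and do not correspond to any of the reviewed methods.

```python
def percent_stenosis(lumen_diameter_mm, reference_diameter_mm):
    """Relative luminal narrowing in percent, from local and estimated healthy diameters."""
    if reference_diameter_mm <= 0:
        raise ValueError("reference diameter must be positive")
    return 100.0 * (1.0 - lumen_diameter_mm / reference_diameter_mm)

def is_anatomically_significant(narrowing_percent, threshold=50.0):
    """The text defines anatomical significance as a narrowing of at least 50%."""
    return narrowing_percent >= threshold

# Hypothetical lumen diameters (mm) sampled along a centerline and a healthy estimate.
diameters_mm = [3.0, 2.9, 1.2, 2.8]
healthy_mm = 3.0

worst = percent_stenosis(min(diameters_mm), healthy_mm)
significant = is_anatomically_significant(worst)
```

In this toy example the narrowest point corresponds to a narrowing of about 60%, which would be flagged as anatomically significant under the 50% criterion.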

Lumen segmentation is often considered a preprocessing step for stenosis detection, but it has been shown that stenosis degree can also be directly determined based on image data. Zuluaga et al. (48) detected stenosis and artery bifurcations with an SVM based on features obtained from concentric circles in cross-sectional images. Similarly, Kang et al. (45) used geometrical and plaque features in an SVM to detect obstructive lesions (> 50% narrowing) and non-obstructive lesions (25–50% narrowing). Zreik et al. (20) used a recurrent CNN to detect anatomically significant stenosis along the centerline. Freiman et al. (43) detected stenosis of at least intermediate severity (> 40% narrowing) using deep sparse autoencoders, a variation on CNNs.

Coronary stenoses are located in the arteries, but may restrict blood flow to myocardial segments. Mukhopadhyay et al. (47) used an ML approach to identify myocardial segments (58) affected by coronary stenosis. Hand-crafted feature vectors describing the endocardial surface shape were combined using a bag-of-words approach and classified with an ANN to identify affected segments. Xiong et al. (46) performed analysis of the full myocardium to detect existence of at least one anatomically significant stenosis. Instead of the shape of the endocardial surface, features in this approach described the attenuation and wall thickness of myocardial segments.

### 4.2. Functional Significance

The sensitivity of CCTA-based anatomical stenosis evaluation for detection of functionally significant stenosis is high when evaluated visually, but its specificity is moderate (41). The current reference standard for determination of functional significance of a stenosis is given by its fractional flow reserve (FFR), i.e., the ratio of flow distal to the stenosis to the flow proximal to the stenosis. FFR is measured invasively by inserting a special catheter in the coronary artery under hyperemic conditions. An FFR below 0.80 indicates a need for intervention (59). Treatment based on invasive FFR measurements can improve patient outcomes (59), but measurement of FFR is still relatively uncommon, which is due to associated cost and risk, as well as lack of vasodilator drugs (60).
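As a compact restatement of the definition above: during invasive measurement, FFR is commonly obtained as a pressure ratio, with pressure under maximal hyperemia taken as a surrogate for flow. The symbols $P_d$ (mean pressure distal to the stenosis) and $P_a$ (mean aortic pressure proximal to it) are notation introduced here for illustration and do not appear in the reviewed papers.

$$
\mathrm{FFR} \approx \frac{P_d}{P_a}, \qquad \mathrm{FFR} < 0.80 \;\Rightarrow\; \text{need for intervention indicated (59)}
$$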

FFR estimation based on CCTA scans (FFRCT) could provide reproducible physical measurement without the drawbacks of invasive procedures. FFRCT has traditionally been based on computational fluid dynamics (CFD) (61, 62), i.e., numerical simulation of blood flow in a coronary tree model extracted from CCTA using lumen segmentation methods (section 4.1). These methods are accurate (63) but computationally expensive due to their iterative nature. This precludes their deployment on local workstations, and instead CFD simulations are typically performed on off-site dedicated systems. ML could be used to significantly speed up estimation of FFRCT.

Itu et al. (55) proposed an ANN model to predict an FFR value for each segment in the coronary artery tree, given local features based on the segment's geometry and global features based on the most severe stenoses. To train this model, a large data set of 12,000 synthetic coronary artery trees was generated and a reference standard was obtained through conventional CFD simulation. By only performing CFD simulations once in a training phase, the time required to perform FFRCT was reduced by two orders of magnitude. The diagnostic value of this method has been demonstrated thoroughly (64–76). Yu et al. (77) further demonstrated additional prognostic value of CT morphological index for the method proposed by Itu et al. (55). Wang et al. (50) proposed to use a recurrent ANN that can model long-range dependencies between segments.

Both conventional CFD-based FFRCT and the methods proposed in Wang et al. (50) and Itu et al. (55) are based only on the geometry of the coronary artery tree model, and are thus susceptible to errors by the segmentation method used to obtain this model (78). Instead, Dey et al. (52) proposed to combine geometric features with semi-automatically obtained plaque and attenuation gradient measurements to identify arteries with functionally significant stenosis. Other methods skip explicit coronary artery centerline extraction and lumen segmentation altogether. Kumamaru et al. (49) trained a CNN to extract a map showing the contrast-enhanced territories in CCTA and used this map in a classifier to predict the minimum FFR value in a patient. Alternatively, analysis can be moved from the cause, i.e., the coronary arteries, to the effect, i.e., the myocardium. Han et al. (54) subdivided the separated endocardium and epicardium into the American Heart Association (AHA) 17 segments (58), with 3 features per segment characterizing perfusion and wall thickness. However, the trained boosting classifier showed only moderate accuracy for patient-wise prediction of abnormal FFR values. For the same purpose, Zreik et al. (53) trained an SVM based on features from myocardial regions extracted from CCTA. Clinical evaluation of this method yielded improved diagnostic accuracy of FFRCT over visual evaluation of stenosis (79). Hae et al. (51) increased accuracy of FFR prediction by including the tissue volume subtended by a stenotic lesion in the analysis. However, determination of lesion position required additional analysis including artery tree segmentation.

### 5. DISCUSSION

We have presented a survey of applications of ML for detection, characterization and quantification of atherosclerotic plaque and stenosis in cardiac CT. We found that while ML has been a mainstay of cardiac image analysis for years, the recent emergence of deep learning has accelerated progress in the field. Machine learning has the potential to unburden clinicians from time-consuming tasks and change diagnostic procedures, thereby reducing healthcare costs. Moreover, low-cost ML-based analysis could be added to screening studies as a secondary goal. In this survey, we have focused on ML for CAD analysis. For a broader scope the reader is referred to Al'Aref et al. (80), Litjens et al. (81), Nicol et al. (82), Petersen et al. (83), and Singh et al. (84).

We have reviewed plaque and stenosis analysis methods in separate sections, but formation of plaque and stenosis is naturally related and many papers have proposed simultaneous analysis [e.g., (48, 53)]. Moreover, (semi-)automatic identification of plaque or stenosis is often only an intermediate step for prediction of cardiovascular events. Motwani et al. (85) used stenosis scores and plaque characteristics to develop a model for 5-year all-cause mortality prediction. Similarly, Johnson et al. (86) showed that an ML model taking into account per-segment coronary artery characteristics can outperform hand-crafted models for prediction of adverse cardiac events. Van Rosendael et al. (87) developed a model for all-cause mortality prediction in combination with future myocardial infarction based only on hand-crafted features derived from CCTA scans. Furthermore, some methods directly predict presence of CAD from medical images, i.e., chest CT (88) or non-contrast-enhanced cardiac CT (89). While these approaches only require one label per patient and large data sets are thus not expensive to obtain, the interpretability of predictions may be limited. Interpretability might constitute an opportunity, not only to improve reliability but also to increase medical knowledge by quantifying the diagnostic relevance of underlying phenomena.

The readiness of automatic analysis methods for clinical implementation depends on the complexity of the task, but also on other factors. ML algorithms require large training sets, and tasks with abundant data may be easier to automate. For example, obtaining a ground truth for non-calcified plaque characterization is very challenging. Therefore, data sets are generally small and ML algorithms remain at an early developmental stage. In contrast, large data sets are available for the development of ML-based CAC scoring methods, which has led to highly accurate results in both dedicated cardiac CT images (11) and other CT images visualizing the heart (8, 32). Similarly, ML-based FFRCT development is aided by the availability of large data sets with CFD-derived reference values. An important remaining step toward clinical application of FFRCT lies in performance evaluation specifically for subjects around the FFR threshold of 0.8, which were shown to be most challenging (90). Furthermore, a recent study showed that not all CCTA exams are suitable for FFRCT analysis (78).

Many challenges in the adoption of machine learning methods in the clinic are not exclusive to CAD detection in cardiac CT. For example, ML algorithms could show unexpected behavior, motivating research into ML interpretability and explainability (91). Furthermore, it is important to point out that ML algorithms are often trained and evaluated on single-center studies with a high risk of selection bias, and with low-quality scans excluded.

Despite these challenges, current rapid development allows for justifiable hope that the importance of ML algorithms in cardiac CT will continue to increase in the near future, with benefits for clinicians and patients alike.

### AUTHOR CONTRIBUTIONS

NH and SV drafted the manuscript, which was critically revised and edited by JW, TL, and II. Authors agree to be accountable for all aspects of the work.

### FUNDING

This study has been financially supported by Pie Medical Imaging BV.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcvm.2019.00172/full#supplementary-material



**Conflict of Interest:** II and TL received institutional research projects by Dutch Technology Foundation cofunded by Pie Medical Imaging and Philips Healthcare (P15-26), the Netherlands Organisation for Health Research and Development with participation of Pie Medical Imaging (104003009). II received institutional research project by Dutch Technology Foundation cofunded by Pie Medical Imaging (12726). II and TL are cofounders and shareholders of Quantib-U BV.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Hampe, Wolterink, van Velzen, Leiner and Išgum. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Machine Learning Approaches for Myocardial Motion and Deformation Analysis

Nicolas Duchateau<sup>1</sup> \*, Andrew P. King<sup>2</sup> and Mathieu De Craene<sup>3</sup>

<sup>1</sup> CREATIS, CNRS UMR 5220, INSERM U1206, Université, Lyon, France, <sup>2</sup> School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom, <sup>3</sup> Philips Research Paris, Suresnes, France

Information about myocardial motion and deformation is key to differentiate normal and abnormal conditions. With the advent of approaches relying on data rather than pre-conceived models, machine learning could either improve the robustness of motion quantification or reveal patterns of motion and deformation (rather than single parameters) that differentiate pathologies. We review machine learning strategies for extracting motion-related descriptors and analyzing such features among populations, keeping in mind constraints specific to the cardiac application.

#### Edited by:

Steffen Erhard Petersen, Queen Mary University of London, United Kingdom

#### Reviewed by:

Antonio De Marvao, Imperial College London, United Kingdom; Wenjia Bai, Imperial College London, United Kingdom; Alireza Sojoudi, University of Calgary, Canada

\*Correspondence: Nicolas Duchateau nicolas.duchateau@creatis.insa-lyon.fr

#### Specialty section:

This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine

Received: 14 October 2019; Accepted: 12 December 2019; Published: 09 January 2020

#### Citation:

Duchateau N, King AP and De Craene M (2020) Machine Learning Approaches for Myocardial Motion and Deformation Analysis. Front. Cardiovasc. Med. 6:190. doi: 10.3389/fcvm.2019.00190

Keywords: machine learning, computer-aided diagnosis, myocardial motion, myocardial strain, cardiac imaging

## 1. INTRODUCTION

### 1.1. Myocardial Motion and Deformation Analysis: What For?

Pump efficiency can discriminate failing from healthy hearts, as quantified by volume and ejection fraction. Clinicians are well aware of the limitations of these simple measurements to face the complexity of heart disease, and recommend finer markers of cardiac mechanical dysfunction (1). Myocardial motion (displacement or velocity) and deformation (strain or strain rate) are richer descriptors of (ab)normal cardiac function (2, 3). They can provide characteristic spatiotemporal signatures for disease at each location of the myocardium and each instant of the cardiac cycle. They are often projected onto anatomically-relevant directions to facilitate interpretations (4). Interestingly, they can be estimated from routine modalities such as echocardiography and magnetic resonance (MR) (5), and have therefore been thoroughly investigated for a wide range of applications.

### 1.2. Machine Learning for Myocardial Motion and Deformation Analysis: What For?

Machine learning builds upon models whose optimal parameters are learnt from a set of samples representative of the studied population. This data-driven approach is more flexible than traditional methods (e.g., variational), as demonstrated for myocardial segmentation (6, 7), and has strong potential for the analysis of complex descriptors such as myocardial motion and deformation. In essence, machine learning seeks to learn data representations (either explicit or hidden) for better solving a supervised problem or for characterizing the data distribution. This often involves dimensionality reduction to facilitate the analysis of high-dimensional descriptors, and requires navigating between the low-dimensional/latent space and high-dimensional/original space for better interpretation.

### 1.3. Which Data Approach for Learning?

Over the years, researchers have gained detailed knowledge of the complexity of cardiac mechanics, and proposed physiologically relevant motion and deformation descriptors, from global strain in a single anatomical direction to richer representations such as 3D+t vector or tensor fields. Most approaches decompose the analysis into two steps (**Figure 1A**): the extraction of motion/deformation descriptors from image sequences, followed by their analysis over a population of interest. Machine learning can address both parts, and we discuss these topics separately (sections 2 and 3). Deep neural networks (8) may address the two parts in **Figure 1A**, but also enable the analysis of population data directly from the image sequences by looking for image features not necessarily interpretable or visualizable, but optimal to answer the clinical question of interest (**Figure 1B**). We specifically comment on this strategy, which is more recent and preliminary, in section 4.7.

## 2. MOTION AND DEFORMATION ESTIMATION

Traditionally, myocardial motion fields have been estimated from images using standard image registration techniques such as optical flow (9), free-form deformation (10), or block matching (11). Naturally, this depends on the algorithm's ability to catch motion-related structures, which strongly varies with the imaging modality. Tags and speckles can directly be tracked within the myocardium in tagged MR and 2D/3D echography (within the limits of tag fading, speckle temporal consistency, and out-of-plane motion), contrary to cine MR where algorithms tend to approximate motion from endocardial/epicardial contour tracking. A dedicated review (5) details the standards for spatial and temporal resolution and the influence of imaging parameters on the estimation of myocardial deformation.

Approaches based on neural networks challenge the variational formulation of motion estimation, as shown on video image sequences with the FlowNet2 convolutional neural network (CNN) architecture (12) that focuses on optical flow. Similar approaches have been applied to cardiac imaging (13, 14), but raise several methodological questions. First, the generalization ability of the trained networks to estimate a wide range of deformations at multiple scales still needs to be verified. This is critical for specific disease traits of lower prevalence. Furthermore, robustness to a variety of routine clinical imaging conditions (different image qualities, fields of view, devices, etc.) needs to be established. Second, supervised CNN-based motion estimators such as FlowNet2 do not embed any regularization, and are therefore sensitive to imaging noise if it differs from the training database. This is not the case for unsupervised approaches like (13), which use an intensity-based loss, combined with a regularization term as in classical image registration. Finally, motion features can boost segmentation performance (15–17), as looking at several frames improves the manual segmentation of physicians. Further details are given in a review dedicated to deep learning for motion estimation in medical imaging (18).
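To make the structure of such an unsupervised objective concrete, the sketch below evaluates an intensity-based dissimilarity plus a smoothness penalty for a one-dimensional displacement field. It is a generic illustration under simplifying assumptions (1D signals, linear interpolation, a hypothetical `reg_weight` parameter), not the formulation used in (13).

```python
import numpy as np


def registration_loss(fixed, moving, displacement, reg_weight=0.1):
    """Intensity dissimilarity plus a smoothness penalty for a 1D displacement field."""
    x = np.arange(fixed.size, dtype=float)
    # Warp the moving signal by sampling it at the displaced positions.
    warped = np.interp(x + displacement, x, moving)
    similarity = np.mean((warped - fixed) ** 2)
    # First-order regularization penalizes non-smooth displacement fields.
    smoothness = np.mean(np.diff(displacement) ** 2)
    return similarity + reg_weight * smoothness


x = np.arange(64, dtype=float)
fixed = np.exp(-0.5 * ((x - 30.0) / 4.0) ** 2)   # synthetic intensity profile
moving = np.exp(-0.5 * ((x - 33.0) / 4.0) ** 2)  # same profile shifted by 3 samples

loss_identity = registration_loss(fixed, moving, np.zeros_like(x))
loss_shift = registration_loss(fixed, moving, np.full_like(x, 3.0))
print(loss_identity, loss_shift)
```

In this toy setting, the constant displacement that realigns the two profiles yields a lower loss than the identity, which is the behavior an optimizer or a network trained on this objective would exploit.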

Statistical models learnt from data can act as regularizers for tracking algorithms. (19) used dictionary learning as a sparse basis for cardiac motion fields to feed the regularization. Within deep learning, auto-encoders can encode spatial transformations into a low-dimensional space and provide powerful projection and reconstruction operators to connect with the tracking in the original image space (20).

Additional constraints specific to the cardiac application can provide more plausible registration outputs, such as invertibility (the myocardium does not fold) and incompressibility, as investigated for the diffeomorphic LogDemons (21) and free-form deformation algorithms (22). Temporal consistency has been enforced through 4D representations of motion (23, 24), for multiple pairwise transformations simultaneously (25), or for intra/inter-subject mappings (26). Motion and deformation estimation with machine learning should also consider these aspects for better consistency and robustness.
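Both invertibility and incompressibility can be checked on an estimated displacement field through the determinant of the Jacobian of the transformation: values at or below zero indicate folding, and values equal to one indicate local area (or volume) preservation. The following is a minimal 2D sketch using finite differences; the function name and the synthetic fields are assumptions made for illustration.

```python
import numpy as np


def jacobian_determinant(ux, uy, spacing=1.0):
    """Determinant of the Jacobian of x -> x + u(x) for a 2D displacement field.

    ux, uy: arrays of shape (rows, cols) with the x and y displacement components.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    dux_dy, dux_dx = np.gradient(ux, spacing)
    duy_dy, duy_dx = np.gradient(uy, spacing)
    return (1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx


yy, xx = np.mgrid[0:16, 0:16].astype(float)

# 10% shortening along x compensated by lengthening along y: area is preserved.
det_incompressible = jacobian_determinant(-0.1 * xx, (1.0 / 0.9 - 1.0) * yy)
# Excessive compression along x: the transformation folds onto itself.
det_folding = jacobian_determinant(-1.5 * xx, np.zeros_like(yy))

print(det_incompressible.mean(), det_folding.mean())
```

Such a check can serve as a post hoc quality control, or the deviation of the determinant from one can be added as a penalty term during estimation.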

## 3. MOTION AND DEFORMATION ANALYSIS

### 3.1. Before the Analysis: Data Normalization

Cardiac image data often need to be normalized in terms of anatomy, frame rate or cycle phases, before any statistical or machine learning analysis.

Image sequences can be registered using a 4D transformation model based on e.g., free-form deformation (10) or demons (26). This approach quantifies the spatiotemporal differences between the image sequences, analyzed statistically afterwards through deformation-based morphometry methods.

Motion or deformation descriptors (or any other data) from a given individual can also be transported to a reference template (generally, a central case at end-diastole). This involves local reorientation of the motion/deformation fields (27, 28), adjusted to the addressed clinical question (29). Temporal differences between sequences can also be normalized by resampling before the motion extraction [e.g., piece-wise linear interpolation (30)]. Recent approaches transport the whole subject-specific trajectory instead of the descriptors of interest, with specific computational considerations (31, 32). Automatically estimating multiple templates across the sequence may also be well adapted to the cardiac circular/periodic dynamics (33).
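As an illustration of temporal normalization by piece-wise linear interpolation, the sketch below resamples a one-dimensional trace so that its landmark frames coincide with those of a reference timeline. Applying the mapping to a trace rather than to the images, as well as the function name and landmark values, are simplifying assumptions made here.

```python
import numpy as np


def normalize_time(trace, event_idx, ref_event_idx, n_ref):
    """Resample one cycle so that its landmark frames match a reference timeline.

    trace: values sampled at frames 0..len(trace)-1 of one cardiac cycle.
    event_idx: increasing landmark frames in this sequence (first and last included).
    ref_event_idx: frames of the same landmarks on the reference timeline.
    n_ref: number of frames of the reference timeline.
    """
    trace = np.asarray(trace, dtype=float)
    ref_frames = np.arange(n_ref, dtype=float)
    # Piece-wise linear map from reference time to subject time via the landmarks.
    subject_time = np.interp(ref_frames, ref_event_idx, event_idx)
    # Linear interpolation of the trace at the mapped (possibly fractional) frames.
    return np.interp(subject_time, np.arange(trace.size), trace)


# Synthetic trace: 21 frames, peak shortening of -0.2 reached at frame 8.
trace = np.interp(np.arange(21), [0, 8, 20], [0.0, -0.2, 0.0])
# Reference timeline: 31 frames, with the same event placed at frame 10.
normalized = normalize_time(trace, [0, 8, 20], [0, 10, 30], n_ref=31)
print(normalized.shape, int(np.argmin(normalized)))
```

After resampling, traces from different subjects share the same number of frames and the same event timings, so they can be compared frame by frame.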

In both strategies, existing data correspondences facilitate the normalization. Spatial alignment can rely on anatomical landmarks (apex, valve ring, etc.) or point-to-point correspondences obtained from model-based tracking of the anatomy. Temporal alignment can use physiologically-relevant instants, such as the maximum contraction (10) or QRS and valve events (28).

### 3.2. Learning From Motion and Deformation Data

Machine learning can benefit a wide range of clinical problems. Unsupervised approaches learn a data representation that uncovers useful insights into the data distribution, but without explicit reference to a particular clinical question. Clustering and dimensionality reduction techniques fall into this category. Supervised approaches train a model for a specific task, and labels/annotations are provided as supervision. For example, diagnosing disease may involve binary labels for supervision (disease/healthy) and the task would be to predict these labels from the motion data. The type of labels determines the task addressed by the model: categorical labels mean classification, whereas discrete or continuous labels imply regression. Supervised approaches also involve learning a (lower dimension) representation of the data that facilitates the classification/regression, but this representation can be formed in an unsupervised or supervised way, as described below.

#### 3.2.1. Unsupervised Learning

Unsupervised motion and deformation analysis shares objectives with statistical atlases, regarding how to characterize variability across a population. Pioneering works directly applied a principal component analysis (PCA) on myocardial displacements at each spatiotemporal location (34) over a healthy population, later extended through the estimation of local abnormalities in the myocardial velocities of a given subject compared to a reference population (28, 35). However, these analyses consider each spatial location and temporal instant independently from the others. The statistical analysis can also consider the motion patterns over the entire cardiac cycle as high-dimensional objects, as simply demonstrated through a PCA on temporal strain traces concatenated over the heart segments (36, 37). This approach is reminiscent of earlier work on Active Appearance Motion Models (38), which statistically analyzed both displacement and image intensity information over the entire cardiac cycle.
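The PCA on concatenated strain traces mentioned above can be sketched in a few lines. The data here are synthetic stand-ins (random amplitudes applied to an idealized trace), and the numbers of subjects, segments, frames, and retained modes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: 40 subjects, 6 segments, 30 frames per cycle.
n_subjects, n_segments, n_frames = 40, 6, 30
t = np.linspace(0.0, 1.0, n_frames)
base = -0.2 * np.sin(np.pi * t)  # idealized strain trace
amplitude = rng.normal(1.0, 0.15, size=(n_subjects, n_segments, 1))
noise = rng.normal(0.0, 0.005, size=(n_subjects, n_segments, n_frames))
strain = amplitude * base + noise

# Concatenate the segment traces of each subject into one high-dimensional vector.
X = strain.reshape(n_subjects, n_segments * n_frames)

# PCA through the SVD of the mean-centered data matrix.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
explained = S**2 / np.sum(S**2)

n_modes = 3
scores = (X - mean) @ Vt[:n_modes].T           # low-dimensional coordinates
reconstruction = mean + scores @ Vt[:n_modes]  # back to the original space

print(explained[:n_modes])
```

Each row of `Vt` is a spatiotemporal mode of variation covering all segments and frames at once, which is what distinguishes this analysis from location-wise statistics.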

More advanced strategies estimate a low-dimensional space that encodes the high-dimensional myocardial motion/deformation data and navigate through this space, although this requires specific care. Myocardial shapes across a population can be considered as originating from one or several references under the action of a transformation such as a diffeomorphic warping. In this case, the space of myocardial shapes is related to the (known) non-linear high-dimensional space of diffeomorphic transformations. This space is a manifold, and known tools exist to perform statistics on such transformations and therefore on myocardial shapes while preserving this data structure (39, 40). Myocardial motion/deformation patterns may also be considered as originating from a non-linear high-dimensional manifold, but in this case the manifold is unknown. Machine learning allows estimating this space from data, and can overcome the limitations of linear techniques such as PCA that ignore this non-linear structure. A general framework (41) groups the vast variety of existing manifold learning techniques. A graph is built across high-dimensional samples to approximate the manifold, and diagonalization and dimensionality reduction processes provide a low-dimensional space that encodes the data. Techniques generally differ on how input samples are related within the graph, either locally (e.g., distance between neighbors, or local structure variations expressed in the graph Laplacian) or globally (e.g., geodesic distance). These techniques improve the statistical analysis of myocardial motion and deformation patterns. They can represent the continuum of disease from normality while preserving the data structure (42). The unsupervised representation of populations is particularly interesting when existing labels are not fully trusted, as in heart failure with preserved ejection fraction (43, 44), or when a supervised formulation of the clinical problem is uncertain, such as outcome from cardiac resynchronization therapy (45).
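The graph-based construction described above (neighborhood graph, graph Laplacian, diagonalization) can be sketched as follows. This is one simple variant among the family of manifold learning techniques, written with NumPy only; the neighborhood size, the Gaussian edge weights, and the synthetic traces with a progressively delayed peak are all assumptions made for illustration.

```python
import numpy as np


def laplacian_eigenmaps(X, n_neighbors=8, n_components=2):
    """Embed the rows of X using eigenvectors of a k-nearest-neighbor graph Laplacian."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    # k nearest neighbors of each sample (column 0 is the sample itself).
    neighbors = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    # Symmetric graph with Gaussian edge weights.
    sigma2 = np.median(d2[d2 > 0])
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    W = np.maximum(W, W.T)
    # Diagonalize the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)
    # Skip the first (trivial) eigenvector and keep the next n_components.
    embedding = d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]
    return embedding, eigvals


# Synthetic traces whose peak is progressively delayed with a "severity" parameter.
t = np.linspace(0.0, 1.0, 30)
severity = np.linspace(0.0, 1.0, 60)
peak_time = 0.35 + 0.30 * severity
traces = -0.2 * np.exp(-0.5 * ((t[None, :] - peak_time[:, None]) / 0.1) ** 2)

embedding, eigvals = laplacian_eigenmaps(traces)
print(abs(np.corrcoef(embedding[:, 0], severity)[0, 1]))
```

In this toy example the first embedding coordinate orders the samples along the simulated continuum, although it provides no explicit mapping back to the original space.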

Nonetheless, these techniques normally lack explicit mappings between the high-dimensional and low-dimensional spaces, which are typically approximated using out-of-sample reconstruction/regression (46) and are therefore inexact. Deep learning auto-encoders explicitly address this by simultaneously learning how to encode and decode high-dimensional data with a limited number of parameters while minimizing the reconstruction error. However, this also requires constraining the distribution of samples in the latent space so that a statistical analysis can still be performed on it afterwards, as in variational auto-encoders (47). These techniques are promising for the analysis of myocardial motion and deformation and are starting to be used in cardiac imaging for segmentation (48, 49) or segmentation-based biomarkers (50).

#### 3.2.2. Supervised Learning

As noted above, designing a supervised learning model traditionally consists of two steps (**Figure 1A**). First, the input data are transformed to a new representation that facilitates the task performance. Second, a classification or regression model is trained to predict the label given the new representation. More recent techniques such as deep learning combine these two steps: the representation is learnt and optimized during the model training (**Figure 1B**). Below, we first summarize works using supervised learning in the traditional way and then we briefly review more recent deep learning approaches.

The new data representation can be estimated using knowledge of the labels (supervised way) or without such knowledge (unsupervised). In other words, although the final classification or regression model is supervised, the transformation to a new representation can be unsupervised. Examples include the dimensionality reduction methods reviewed in section 3.2.1, such as PCA (51–53) or non-linear manifold learning (53, 54). The use of hand-crafted features such as volumes/diameters/strains (55) and radius/thickness (56, 57) also falls into this category, although one could argue that knowledge of the task was also used to design these features. A supervised approach was taken in Dawes et al. (58), in which supervised PCA was used to find the principal components of displacement data related to survival.

Classification or regression comes once the new representation is obtained. Many classification algorithms have been used, including support vector machines (SVM) (55, 59), random forests (55), variants of dictionary learning (59–61) and ridge logistic regression (57). Regression applications rely on SVM (62) and multiscale kernel regression (54).

Recent research has increasingly focused on deep learning for both classification and regression from dynamic imaging data. In these approaches, the activations of intermediate network layers can stand as a transformed representation formed in a supervised way. Inputs to these models are commonly dynamic image intensity data, but segmentation data has also been used (63). For classification, variants of auto-encoders have been a common architecture choice. An auto-encoder is a deep learning-based dimensionality reduction technique, and classification can be performed in the low-dimensional latent space learnt without supervision (53), or in a supervised way by including classification accuracy into the loss function (48, 63, 64). Auto-encoders are attractive as they allow examining the classification features in the original image space, leading to more interpretable analyses. CNNs have also been proposed for classification (65), and a challenge on automated diagnosis was recently organized (7). Regression tasks such as estimating volume and/or ejection fraction may also involve CNNs (66), as tested on the recent Kaggle Challenge data<sup>1</sup>. Variational auto-encoders have also been used to perform regression in the latent space (50).

A wide set of classification applications involved myocardial motion or deformation, including identifying abnormal wall motion (59, 61), predicting therapy response (67) and survival (58, 64), and diagnosing myocardial infarct (16, 60, 65, 68) or pathology (7, 48, 57, 63). Regression applications aimed at localizing myocardial infarct (54), grading myocardial motion defects (62), and estimating volumes (66).

Detecting some form of abnormality is a common theme for supervised learning applications, for which two main strategies exist. In the first one, the transformed representation only involves healthy subjects: the distribution of samples in the low-dimensional space therefore represents healthy variations, and subsequent subjects who fall away from the healthy distribution are considered abnormal, as investigated on myocardial velocities (28, 35) and shapes (69). The other strategy learns a low-dimensional representation from both healthy and pathological subjects, where supervised classification can be applied afterwards (70).
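The first strategy can be illustrated with a PCA learnt on healthy subjects only, followed by a Mahalanobis distance to the healthy distribution in the reduced space. The data below are synthetic, and the number of modes and the 95th-percentile threshold are arbitrary assumptions rather than recommended settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic healthy population: 200 subjects, 50 features, 3 underlying factors.
basis = rng.normal(size=(3, 50))
latent = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])
healthy = latent @ basis + 0.1 * rng.normal(size=(200, 50))

# Low-dimensional representation learnt from healthy subjects only (PCA).
mean = healthy.mean(axis=0)
_, S, Vt = np.linalg.svd(healthy - mean, full_matrices=False)
n_modes = 3
modes = Vt[:n_modes]
std = S[:n_modes] / np.sqrt(healthy.shape[0] - 1)  # spread of healthy scores per mode


def abnormality(x):
    """Mahalanobis distance of a sample to the healthy distribution in PCA space."""
    z = (modes @ (x - mean)) / std
    return float(np.sqrt(np.sum(z**2)))


healthy_scores = np.array([abnormality(h) for h in healthy])
threshold = np.percentile(healthy_scores, 95)

typical = np.array([1.0, 1.0, 0.5]) @ basis    # within the healthy range
atypical = np.array([15.0, 0.0, 0.0]) @ basis  # far along the first factor

print(abnormality(typical) > threshold, abnormality(atypical) > threshold)
```

A limitation of this sketch is that deviations orthogonal to the retained modes are ignored; a reconstruction-error term is often added to account for them.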

## 4. SPECIFICITIES OF THE CLINICAL CONTEXT

### 4.1. Physiological Consistency

Learning algorithms utilize a low-dimensional representation of the high-dimensional motion/deformation data, where the population variability is either rendered through diagonalization according to inter-subject distances, or correlated to labels of interest. Transforming to and from this representation involves interpolation between samples. Regularizing the low-dimensional space ensures smoother interpolation and generates new samples that are physiologically plausible (49, 71). In both of these works, the low-dimensional space produced by the encoding part of a CNN was regularized to map smoothly to a set of input shapes, labeled images, or slice locations. This notion of joint projection from the image and label space is also inherently present in more classical manifold learning techniques such as partial least squares. Similar notions need to be extended to motion fields, whilst mapping similar pathological conditions to close locations in the latent space.

### 4.2. Spatiotemporal Analysis

Most learning techniques consider high-dimensional inputs as high-dimensional column vectors or a set of patches, and disregard the spatiotemporal characteristics of motion and deformation. Few works explicitly addressed this issue for the statistical analysis of populations. A bilinear statistical model was used on cardiac shapes (72) to distinguish inter-subject variations from individual heart dynamics. (73, 74) explicitly addressed the problem through spatiotemporal tensor decomposition. Duchateau et al. (75) tuned the contributions of the spatial, temporal, and magnitude dimensions to analyze changes in deformation patterns through registration. Jia et al. (31) and Guigui et al. (32) transported temporal trajectories without explicitly extracting motion or deformation descriptors beforehand. These strategies, limited to variability analyses, pave the way for better considering spatiotemporal aspects with machine learning.

<sup>1</sup>Available online at: https://www.kaggle.com/c/second-annual-data-sciencebowl/data

### 4.3. Interpretability

Many tasks may benefit from learnt models that are somehow "interpretable", i.e., a user should have ways to inspect the input data characteristics that led to the output prediction or representation. The recent trend toward more complex learning models (such as deep learning) has raised interest in this property, since these models are generally harder to interpret than simpler ones. One approach consists of defining a simpler model that is "locally similar" to the global complex model (i.e., it has similar performance for similar inputs) (76). For deep learning-based approaches, "saliency maps" can be produced, which show which parts of the input data were important in producing the output. Alternatively, regression or auto-encoders can be used to reconstruct cases from the low-dimensional latent space and examine features in the original high-dimensional space, with clear benefits for interpretability as demonstrated in Clough et al. (48), Puyol-Anton et al. (53), Biffi et al. (63), and Bello et al. (64).


### 4.4. Database Size and Heterogeneity

Traditionally, difficulties in accessing and reliably annotating databases of medical images have led to smaller databases in medical imaging compared to computer vision applications. Recent initiatives such as the UK Biobank project<sup>2</sup> (77) now provide large-scale annotated imaging databases, fuelling a rise in more data-intensive methods such as deep learning. **Figure 2** illustrates this high increase over recent years for the studies reported in this paper. The impact of these large databases is high: reporting reference ranges for cardiac functional biomarkers is now possible with much greater confidence (78, 79), in addition to detecting effects otherwise hidden with smaller databases, as shown for genome data (77). Data heterogeneity is also crucial when choosing or curating a database for a specific task, i.e., the database should include sufficient subjects to cover a range of values for the output label and guarantee the model generalizability. More pathology-focused databases such as those in the Cardiac Atlas Project<sup>3</sup> (80) have an important role to play in this respect.

<sup>2</sup>Available online at: https://www.ukbiobank.ac.uk/

### 4.5. Validation and Standardization Initiatives

As analyzing the tracking output is sensitive to processing errors, in particular for multi-centric data, tracking algorithms should be benchmarked to prevent bias due to different manufacturers or settings/practices. To ensure reproducibility of clinical decision-making from these data, standardization initiatives arose from academic, clinical, and industrial actors of cardiac imaging. Strain estimation was compared across vendors for synthetic and real images (81). Outputs were consistent regarding the differentiation between pathological and healthy regions, and the identification of ambiguous zones. However, statistically significant differences among vendors were reported, including differences around 15% for the biggest scars. These differences call for benchmarks on more realistic datasets (both regarding geometry and image quality), obtained e.g., from simulation frameworks that mix image formation and biomechanical models with real images (82).

Complementary standardization of imaging is also being investigated through deep learning, e.g., to control the full coverage of the ventricles (83), the view/plane (84, 85), and the image quality, either in general (78) or with respect to motion-related artifacts (86).

### 4.6. Multiple Modalities/Descriptors

Most studies only consider a single type of motion or deformation descriptor at once from a single acquisition and a single modality, unlike clinical reasoning, which repeats acquisitions in the same or different modality and uses different types of measurements and descriptors. Recent works addressed these limitations within the framework of manifold learning. (30) enforced the complementarity of multimodal acquisitions (tagged MR and 3D echocardiography) using canonical correlation analysis and partial least squares methods. (87) used a similar strategy to better relate myocardial shape and deformation descriptors. Puyol-Anton et al. (70) investigated multi-view linear discriminant analysis for classification purposes. Finally, the more generic framework of multiple kernel learning allows reducing the dimensionality and examining the weights attributed to each descriptor. It was applied to supervised (67) and unsupervised (43–45, 88) problems to investigate multiple descriptors, including motion-based ones, which could come from different modalities or different views of a single modality.

### 4.7. Complexity of the Models and Data Descriptors

Machine learning relies on models whose complexity should be adjusted to the question being answered. Researchers should keep in mind that such models only provide an approximation of reality, and try to minimize this error (e.g., by refining the model, adding more data or relevant descriptors, or estimating uncertainties). We strongly recommend starting with simple data descriptors and models, and carefully benchmarking the retained methods against simpler models or even standard statistics.

Deep learning approaches allow circumventing the design of hand-crafted features (**Figure 1B**), and therefore go beyond a substantial limitation of standard machine learning. They have mainly been used for supervised problems and to avoid segmentation. The ACDC challenge (7) included a diagnosis challenge not necessarily requiring segmentation, although all participants opted for segmentation-based diagnosis. Regression-based estimation of cardiac parameters directly from images was proposed in (66, 89, 90), and may also strengthen the segmentation-based estimation of such parameters (91). However, as already pointed out, this direct strategy may also limit interpretability, and therefore transfer to clinical practice.

## 5. CONCLUSION

Machine learning offers wide possibilities to automate processing, and notably to extract and analyze myocardial motion and deformation. Driven by advances in cardiac segmentation and the collection of large databases, there is potential for substantially improving the characterization of cardiac function and impacting clinical practice. Changes cover the automation of time-consuming and user-dependent tasks such as feature extraction, higher performance on supervised problems such as (earlier) diagnosis, prognosis, and risk stratification, and new unsupervised data representations for knowledge discovery such as clustering or phenotyping. Nonetheless, motion and deformation are rich but complex high-dimensional data. Efforts need to be continued to reduce uncertainties, approximations, and crucial misinterpretations along the analysis pipeline, through careful problem definition, compliance with the mathematical and physiological data properties, algorithm benchmarking/validation/testing, and education of health actors.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

ND was partially supported by the LABEX PRIMES of Université de Lyon (ANR-11-LABX-0063) and the MIC-MAC JCJC project (ANR-19-CE45-0005). AK was partially supported by the Wellcome EPSRC Centre for Medical Engineering at King's College London (WT 203148/Z/16/Z) and the UK EPSRC (EP/P001009/1 and EP/R005516/1). MD was partially supported by the European Union Horizon 2020 Programme for Research and Innovation (CardioFunXion, H2020-14-MSCA-ITN-642676).

<sup>3</sup>Available online at: https://www.cardiacatlas.org/

### ACKNOWLEDGMENTS

The articles discussed in this review were selected by querying PubMed over the last 10 years with the terms (myocardial [OR] cardiac) [AND] learning [AND] (motion [OR] deformation), complemented by the authors' knowledge, and examining the publication profile of the authors of the already selected articles. Papers not using spatial or temporal motion or deformation patterns but single measurements such as peak values or timings, and papers addressing cardiac respiratory motion were removed from this selection, although we acknowledge their importance for the more complete analysis of cardiac function.

### REFERENCES

Workshop, MIL3ID 2019, Shenzhen, Held in Conjunction with MICCAI 2019. Shenzhen (2018) 11071:472–80. doi: 10.1007/978-3-030-00934-2-53


from MR and ultrasound data. Med Image Anal. (2017) 40:96–110. doi: 10.1016/j.media.2017.06.002


**Conflict of Interest:** MD was employed by Philips Research Paris. ND and AK have research publications and/or projects with researchers and engineers from private companies, but this did not influence the contents of this review.

Copyright © 2020 Duchateau, King and De Craene. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Artificial Intelligence for Cardiac Imaging-Genetics Research

#### Antonio de Marvao† , Timothy J. W. Dawes † and Declan P. O'Regan\*

*MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom*

Cardiovascular conditions remain the leading cause of mortality and morbidity worldwide, with genotype being a significant influence on disease risk. Cardiac imaging-genetics aims to identify and characterize the genetic variants that influence functional, physiological, and anatomical phenotypes derived from cardiovascular imaging. High-throughput DNA sequencing and genotyping have greatly accelerated genetic discovery, making variant interpretation one of the key challenges in contemporary clinical genetics. Heterogeneous, low-fidelity phenotyping and difficulties integrating and then analyzing large-scale genetic, imaging and clinical datasets using traditional statistical approaches have impeded progress. Artificial intelligence (AI) methods, such as deep learning, are particularly suited to tackle the challenges of scalability and high dimensionality of data and show promise in the field of cardiac imaging-genetics. Here we review the current state of AI as applied to imaging-genetics research and discuss outstanding methodological challenges, as the field moves from pilot studies to mainstream applications, from one-dimensional global descriptors to high-resolution models of whole-organ shape and function, from univariate to multivariate analysis and from candidate gene to genome-wide approaches. Finally, we consider the future directions and prospects of AI imaging-genetics for ultimately helping understand the genetic and environmental underpinnings of cardiovascular health and disease.

Keywords: artificial intelligence, machine learning, deep learning, genetics, genomics, imaging-genetics, cardiovascular imaging, cardiology

#### Edited by:

*Steffen Erhard Petersen, Queen Mary University of London, United Kingdom*

#### Reviewed by:

*Alexander Teumer, University of Greifswald, Germany*

*Avan Suinesiaputra, The University of Auckland, New Zealand*

#### \*Correspondence:

*Declan P. O'Regan declan.oregan@imperial.ac.uk*

*†These authors have contributed equally to this work and share first authorship*

#### Specialty section:

*This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine*

Received: *30 September 2019* Accepted: *27 December 2019* Published: *21 January 2020*

#### Citation:

*de Marvao A, Dawes TJW and O'Regan DP (2020) Artificial Intelligence for Cardiac Imaging-Genetics Research. Front. Cardiovasc. Med. 6:195. doi: 10.3389/fcvm.2019.00195*

### INTRODUCTION

Cardiovascular conditions remain the leading cause of mortality and morbidity worldwide (1), with genetic factors playing a significant role in conferring risk for disease (2). High-throughput DNA sequencing and genotyping technologies, such as whole-genome sequencing and high-resolution array genotyping, have developed at an extraordinary pace since the first draft of the human genome was published in 2001 at a cost of \$0.5-1 billion (3). Continuous improvements have so far outpaced Moore's law, with the sequencing cost per genome currently estimated to be \$1,000 (4), enabling cost-effective sequencing of millions of humans. At the same time, technological advances in physics, engineering, and computing have enabled a step-change improvement in cardiovascular imaging, facilitating the shift from one-dimensional, low-fidelity descriptors of the cardiovascular system to high-resolution multi-parametric phenotyping. These capabilities are not limited to research settings but are increasingly available in clinical echocardiography, nuclear imaging, computerized tomography (CT), and cardiovascular magnetic resonance (CMR) practice. An unprecedented volume of clinical data is also becoming available, from smartphone-linked wearable sensors (5) to the numerous variables included in the electronic health records of entire populations (6). However, the volume, heterogeneity, complexity, and speed of accumulation of these datasets now make human-driven analysis impractical. Artificial intelligence (AI) methods such as machine learning (ML) are particularly suited to tackling the challenges of "Big Data" and have shown great promise in addressing complex classification, clustering, and predictive modeling tasks in cardiovascular research. Cardiac imaging-genetics refers to the integrated research methods that aim to identify and characterize the genetic variants that influence functional, physiological, and anatomical phenotypes derived from cardiovascular imaging.

In the same way that basic statistical literacy has become a routine aspect of clinical practice, a basic understanding of AI's strengths, applications, and limitations is becoming essential for practicing researchers and clinicians. Here we introduce common AI principles, review applications in imaging-genetics research, and discuss future directions and prospects in this field.

### IMAGING-GENETICS: FROM SINGLE GENE HYPOTHESIS-TESTING TO GENOME-WIDE HYPOTHESES GENERATION

Imaging-genetics aims to dissect and characterize the complex interplay between imaging-derived phenotypes and environmental and genetic factors. Many principles and approaches originated from neuroimaging research, where the first attempts at integrating multi-parametric phenotypes, obtained from structural and functional brain MRI, with genetic data were carried out (7). To help manage the computational and statistical challenges inherent to the use of "Big Data" squared (high-dimensional imaging × high-dimensional genetic data), interrogations were limited to pre-defined regions of interest in the brain and candidate genes or single nucleotide polymorphisms (SNPs), based on a priori assumptions about the biology of disease (8). Similar "hypothesis-led" designs underpinned candidate gene and linkage studies that established causal relationships between rare genetic variants and rare conditions, such as those that first identified the role of myosin heavy-chain beta in hypertrophic cardiomyopathy (HCM) (9) and of titin in dilated cardiomyopathy (DCM) (10).

The increased affordability of DNA sequencing and genotyping resulted in genetic information becoming available in large numbers of subjects. This has helped shift the focus to genetic discovery and the study of common, complex disease traits. These traits are not characterized by a single gene mutation leading to a large change in the phenotype but are attributable to the cumulative effects of many loci. Although the effect sizes of individual loci are relatively modest, composite effects can significantly alter the probability of developing disease (11). The "common disease, common variant" hypothesis underpins genome-wide association studies (GWAS), where subjects are genotyped for hundreds of thousands of common variants. For example, a study into the genetic determinants of hypertension in over 1 million subjects identified 901 loci that were associated with systolic blood pressure (SBP), and these explained 5.7% of the variance observed (12). Even though these single nucleotide polymorphisms (SNPs) explain only a small proportion of phenotypic variance, they provide relevant, hypothesis-generating biological or therapeutic insights. The rapid development of complementary high-throughput technologies, able to characterize the transcriptome, epigenome, proteome, and metabolome, now enables us to search for molecular evidence of gene causality and to understand the mechanisms and pathways involved in health and disease (13). These large biological multi-omics data sets and their computational analysis are conceptually similar to the more established study of genomics, and examples of such work are included in this review.

### IMAGING-GENETICS: FROM ONE-DIMENSIONAL PHENOTYPING TO MULTIPARAMETRIC IMAGING

Several biological and technical reasons have been proposed to explain the "missing heritability" of complex cardiovascular traits. However, a common factor limiting many genotype-phenotype studies was that the ability to characterize phenotypes rapidly and accurately lagged significantly behind our ability to describe the human genotype (14). Phenotyping was characterized by imprecise quantification, sparsity of measurements, high intra- and inter-observer variability, low signal-to-noise ratios, reliance on geometric assumptions and adequate body habitus, poor standardization of measurement techniques, and the tendency to discretize continuous phenotypes (15). Commonly, the complexity of the cardiovascular system was distilled into a small number of continuous one-dimensional variables [e.g., volumetric assessment of the left ventricle (16)] or convenient dichotomies, such as responders vs. non-responders (17), leading to a loss of statistical power (18).

The imaging community responded to calls for more accurate and precise, high-dimensional phenotyping (19, 20) with the roll-out of developments in echocardiography (e.g., tissue Doppler, speckle-tracking, and 3D imaging), CMR (e.g., tissue characterization, 4D flow, 3D imaging, diffusion tensor imaging, spectroscopy, and real-time scanning), CT (e.g., improved spatial and temporal resolution, radiation dose reduction techniques, functional assessment of coronary artery flow using FFR-CT, and coronary plaque characterization), and nuclear cardiology (e.g., improvements in radiopharmaceuticals and hardware resulting in increased accuracy and reduced radiation exposure). In parallel, computational approaches have become increasingly integral to the clinical interpretation of these much larger datasets (21–23) and several have obtained FDA approval (24).

### IMAGING-GENETICS: A "BIG DATA" SQUARED PROBLEM

Leveraging these deeper phenotypes is an attractive proposition, but the joint analysis of high-dimensional imaging and genetic data poses major computational and theoretical challenges. An early example of a neuroimaging GWAS investigated the association between 448,293 SNPs and 31,622 brain MRI voxels in a cohort of 740 subjects (25). This study highlighted difficulties correcting for multiple testing (1.4 × 10<sup>10</sup> tests were performed) and the need for unprecedented computational power (300 parallel cores).

Simultaneously assessing the statistical significance of several hundred thousand tests vastly increases the number of anticipated type I errors. If the probability of incorrectly rejecting the null hypothesis in one test with a pre-set α of 0.05 is 5%, then under the same conditions, the probability of incorrectly rejecting the null hypothesis at least once if 100 independent tests are performed is 99.4%. Therefore, an adjustment for the number of tests being carried out is required. The simplest approach for adjustment for multiple testing is the Bonferroni correction, where the pre-set α is recalculated as α/m, where m represents the number of independent tests being performed. However, this method is overly conservative when m is large, leading instead to many false negatives. An alternative, extensively validated method is the Benjamini–Hochberg procedure (26). Using this approach, instead of controlling for the chance of any false positives, an acceptable maximum fixed percentage of false discoveries (the expected proportion of rejected hypotheses that are false positives) is set.
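The arithmetic behind these corrections can be illustrated with a minimal Python sketch. The function names and the ten example p-values are illustrative assumptions and are not taken from any of the cited studies; the Benjamini–Hochberg step-up rule is written in its standard form (reject the k smallest p-values, where k is the largest rank for which the k-th smallest p-value is at most kq/m).

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / m


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Family-wise error rate for 100 independent tests at alpha = 0.05.
fwer = familywise_error_rate(0.05, 100)

# Number of tests in the voxelwise GWAS example (SNPs x voxels).
n_tests = 448_293 * 31_622
gwas_threshold = bonferroni_threshold(0.05, n_tests)

# Made-up p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = [p <= bonferroni_threshold(0.05, len(p_values)) for p in p_values]
bh = benjamini_hochberg(p_values, q=0.05)

print(f"FWER for 100 tests: {fwer:.3f}")
print(f"Bonferroni rejections: {sum(bonf)}, Benjamini-Hochberg rejections: {sum(bh)}")
```

With these made-up values, the Bonferroni correction rejects one hypothesis and the Benjamini–Hochberg procedure rejects two, which illustrates the less conservative behavior of false discovery rate control.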

A further consideration in the statistical analysis of high-dimensional cardiac phenotypes is that a clinically significant signal will not originate from a single voxel but across many voxels in extended, anatomically coherent areas. Indeed, approaches such as threshold-free cluster enhancement (TFCE), which were developed in neuroimaging (27), have recently been applied in cardiovascular research (28). Using such methods, both signal size and contiguity with surrounding signal patterns contribute to inference statistics.
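The way TFCE combines signal height and spatial extent can be sketched in one dimension. This is a simplified illustration under stated assumptions and not the implementation used in the cited studies: the function name, the step size `dh`, and the exponents (extent power 0.5 and height power 2, values commonly quoted as defaults) are assumptions, and real applications operate on 3D images or surface meshes and derive p-values by permutation testing.

```python
def tfce_1d(stat, dh=0.1, extent_power=0.5, height_power=2.0):
    """Threshold-free cluster enhancement of a 1-D profile of positive statistics.

    For each threshold h, every point in a supra-threshold run (cluster) receives
    extent**extent_power * h**height_power * dh, approximating the TFCE integral
    by a Riemann sum over thresholds.
    """
    scores = [0.0] * len(stat)
    peak = max(stat, default=0.0)
    n_steps = int(peak / dh) if peak > 0 else 0
    for step in range(1, n_steps + 1):
        h = step * dh
        i = 0
        while i < len(stat):
            if stat[i] >= h:
                # Walk to the end of the contiguous run above the threshold.
                j = i
                while j < len(stat) and stat[j] >= h:
                    j += 1
                extent = j - i
                contribution = (extent ** extent_power) * (h ** height_power) * dh
                for k in range(i, j):
                    scores[k] += contribution
                i = j
            else:
                i += 1
    return scores


# Hypothetical profile: an isolated spike (index 2) and a plateau (indices 6-10)
# of the same height.
profile = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 0.0]
enhanced = tfce_1d(profile)
print(f"spike score: {enhanced[2]:.2f}, plateau score: {enhanced[8]:.2f}")
```

Although the spike and the plateau have identical height, the plateau receives the larger enhanced score because its spatial extent contributes at every threshold.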

### ARTIFICIAL INTELLIGENCE

Artificial intelligence, machine learning, and deep learning are terms that are interlinked, have some overlap but are often incorrectly used interchangeably. AI refers to the overarching field of computer science focused on simulating human cognitive processes. As a subset of AI, machine learning refers to the family of algorithms that share a capacity to perform tasks like classification, regression, or clustering based on patterns or rules iteratively learnt directly from the data without using explicit instructions. ML algorithms can be further subdivided into supervised, unsupervised, and reinforcement learning.

Supervised learning is the most common form of traditional ML and involves the training of models on pairs of input and expected outputs ("labeled" data) and then their deployment to make predictions in previously unseen data. It includes such approaches as nearest neighbor, support vector machines, random forests and naïve Bayes classifiers. Unsupervised learning algorithms are used to address clustering or dimensionality reduction problems by detecting patterns and structures within the data without any prior knowledge or constraints. In other words, the model organizes "unlabeled" data into groupings that share common, previously undefined characteristics. Examples include k-means clustering, t-distributed stochastic neighbor embedding (t-SNE), and association rule learning algorithms. The use of reinforcement learning algorithms (e.g., deep Q networks), common in robotics and gaming applications (29), has now also been trialed in the navigation of 3D datasets for anatomical landmark detection (30).
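As an illustration of unsupervised learning, the following minimal sketch implements k-means clustering (Lloyd's algorithm) using only the Python standard library. The function names, the random initialization from the data points, and the six two-dimensional example points are assumptions made for illustration; no labels are supplied to the algorithm.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical "unlabeled" data forming two well-separated groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster indices themselves are arbitrary; what the algorithm recovers is the grouping, here the first three points against the last three.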

Deep learning (DL) is a specific ML method inspired by the way that the human brain processes data and draws conclusions. To achieve this, DL applications use a layered structure of algorithms, called an artificial neural network, which imitates the biological neural network of the human brain. The word "deep" in "deep learning" refers to the number of layers through which the data is transformed. The most common DL models are convolutional neural networks (CNN), which are extremely efficient at extracting features and often superior to traditional ML in larger, more complex datasets such as medical imaging and genomics (31, 32). However, features and processes are more readily interpretable in classical ML, as even simple DL networks can operate as "black boxes." While the computational and time requirements of DL are much higher during training, subsequent inference is extremely fast and DL approaches can be used to accelerate supervised, unsupervised, and reinforcement learning. Indeed, while traditional ML is carried out using central processing units (CPUs), DL was only made possible thanks to the development of graphics processing units (GPUs), which have a massively parallel architecture consisting of thousands of cores and were designed to handle vast numbers of tasks simultaneously.

During the training stage of supervised learning algorithms, the labeled data is divided into training, validation, and testing subsets to reduce overfitting and estimate how well the models generalize. No standard methodologies exist to determine optimum proportions allocated to each set. The training set usually includes a large proportion of the available data and is used for the development of the model. The validation set is used to estimate overall model performance during development and fine-tune the algorithm's hyperparameters (settings such as the number of network layers, which cannot be learnt from the data). Dividing data into training and validation subsets can be done randomly at the onset of the process or by using a cross-validation approach. This involves dividing the entire dataset into folds of equal size and then training the algorithms on all the folds except one that is left out for validation. The process is repeated until all folds have been used as a validation set and the overall performance of the model is calculated as the average across all validation sets. Finally, an independent (ideally external) test set should be used to assess the model's generalizability.
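The cross-validation procedure described above can be sketched as follows. This is a minimal illustration using only the Python standard library: the nearest-centroid classifier is a toy stand-in for whichever model is being developed, and the function names and synthetic feature values are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of (near-)equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_nearest_centroid(X, y):
    """Toy model: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, validate on the held-out fold, and average the accuracy."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Synthetic, clearly separable two-class data (hypothetical feature values).
X = [(0.1 * i, 0.1 * (i % 3)) for i in range(10)]
X += [(3 + 0.1 * i, 3 + 0.1 * (i % 3)) for i in range(10)]
y = [0] * 10 + [1] * 10

folds = k_fold_indices(len(X), k=5)
mean_accuracy = cross_validate(X, y, k=5)
print(f"mean validation accuracy over 5 folds: {mean_accuracy:.2f}")
```

The averaged validation accuracy guides model development only; the independent test set remains outside this loop and is evaluated once, after the model and its hyperparameters have been fixed.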

Despite ML's vast potential and significant performance breakthroughs in fields such as speech recognition, natural language processing, and computer vision, these approaches are not without limitations and vulnerabilities. Some of these are shared with classical statistical approaches (33) while others are entirely novel (34). A significant potential pitfall of ML models derives from the presence of unrecognized confounders that can be present in both the training and testing sets, if they originated from the same dataset. This could result in overfitting of the model to the training data, achieving an artificially inflated performance with poor generalization to other data sets in subsequent studies. The gold-standard approach to address this issue is to obtain a validation dataset acquired by an independent group under real-world conditions. Another possible cause of unsatisfactory generalization of an AI system is if the training data is not an accurate representation of the wider population. For example, an AI model trained on a healthy cohort may not generalize well to a general population that includes extreme disease phenotypes, and a system trained on images from a specific CMR scanner might not perform well when labeling images acquired under different technical conditions. Domain adaptation or transfer learning are fields of AI research that aim to address these challenges.

AI algorithms can also be oversensitive to changes in the input data and therefore vulnerable to unintentional or harmful interference. This was clearly demonstrated in experiments involving "adversarial examples" or inputs that lead the model to make a classification error. For example, the introduction of an imperceptible perturbation in a picture of a benign skin mole resulted in the misclassification as a malignant mole, with 100% confidence (35). The general application of AI has also been hindered by the "black-box" nature of several methodologies. Indeed, full clinical acceptability is only likely if it is possible to explore and scrutinize the predictive features and if the outputs are clinically interpretable.

At a more fundamental level, "Big Data" studies are often no more than observational research. As in classical statistics, observational AI studies cannot test causality and should therefore be considered hypothesis-generating, with findings that require further testing. A recent systematic review and meta-analysis of 82 studies applying DL methods to medical imaging found that although the diagnostic performance of DL methods was often reported as equivalent to human experts, few studies tested human vs. DL performance on the same sample and then went on to externally validate their findings (36). Furthermore, apart from a handful of exceptions (37), the effect of AI in routine clinical practice has rarely been tested in the setting of randomized controlled trials. Indeed, it has not been systematically demonstrated that the roll-out of AI into clinical practice leads to an improvement in the quality of care, increased efficiency or improved patient outcomes (38). These studies will be required before this technology can be routinely used to help guide clinical care.

**Table 1** provides an introduction to some of the technical and methodological aspects that should be considered in AI research.

Nevertheless, the use of machine learning methods in cardiovascular research has grown exponentially over recent years, with an ever increasing set of uses and applications. Traditional supervised ML methods have been applied successfully to classification tasks in extremely diverse input data, ranging from discrimination between sequences underlying Cis-regulatory elements from random genome sequences (39), separation of human induced pluripotent stem cell-derived cardiomyocytes of distinct genetic cardiac diseases (CPVT, LQT, HCM) (40) to numerous applications in medical imaging analysis. Examples of this include automated quality control during CMR acquisition (41), high-resolution CMR study of cardiac remodeling in hypertension (42) and aortic stenosis (43), and echocardiographic differentiation of restrictive cardiomyopathy from constrictive pericarditis (44). Unsupervised ML analyses have provided new unbiased insights into cardiovascular pathologies, such as by establishing subsets of patients likely to benefit from cardiac resynchronization therapy (45) and by agnostic identification of echocardiography-derived patterns in patients with heart failure with preserved ejection fraction and controls (46). Traditional ML has also been used for prediction of outcomes such as hospital readmission due to heart failure (47), survival in pulmonary hypertension (48), and population-based cardiovascular risk prediction (49).

More recently, there has been a greater interest in DL approaches, which have been used with great promise in ever larger-scale classification tasks. Applications include the analysis of CMRs (50), echocardiograms (51), and electrocardiograms (52), identification of the manufacturer of a pacemaker from a chest radiograph (53), aortic pressure waveform analysis during coronary angiography (54), automated categorization of HCM and healthy CMRs (55), and detection of atrial fibrillation using smartwatches (56). DL has also been successfully used to address complex survival prediction tasks in pulmonary hypertension (57) and heart transplantation (58).

The analysis of ever larger and more complex genome-scale biological datasets is also particularly suited to ML approaches. One of the strengths of these approaches comes from the ability to discover unknown structures in the data and to derive predictive models without requiring a priori assumptions about underlying biological mechanisms, which are frequently poorly understood (59). The field is large, diverse and fast moving, with new opportunities for AI to synthesize data and optimize the prediction of key functional biological features appearing all the time. Applications of traditional ML have ranged from the prediction of quantitative (growth) phenotypes from genetic data (60), to the identification of proteomic biomarkers of disease (61), to the prediction of metabolomes from gene expression (62). As in cardiology research, there has been growing interest in applying DL to the field of functional genomics. Such approaches have been used to predict sequence specificities of DNA- and RNA-binding proteins (31, 63), transcriptional enhancers (64) and splicing patterns (65) and to identify the functional effects of non-coding variants (66, 67). A more in-depth discussion of the applications of ML and DL to genomics and other multi-omics data can be found elsewhere (68–71).

### ARTIFICIAL INTELLIGENCE IN CARDIOVASCULAR IMAGING-GENETICS

Despite the parallel successes of AI in the fields of genetics and imaging analysis, integrated imaging-genetics research is still an emerging field. However, several studies have already demonstrated the usefulness of AI tools in the analysis of large biological, imaging, and environmental data, in such tasks as dimensionality reduction and feature selection, speech recognition, clustering, image segmentation, natural language processing, variable classification, and outcome prediction (**Figure 1**).

To predict which dilated cardiomyopathy patients responded to immunoglobulin G substitution (IA/IgG) therapy, as assessed by echocardiography, two supervised ML approaches, a random forest analysis and a support vector machine algorithm, were used independently on gene expression data derived from 48 endomyocardial biopsies (72). The overlapping set of 4 genes that was identified by both ML approaches was superior to clinical parameters in discriminating between responders and non-responders to therapy. The prediction performance was further improved by adding data on the negative inotropic activity (NIA) of antibodies. A support vector machine classifier also proved to be extremely helpful in identifying specific proteomic signatures that accurately discriminated between patients with heart failure with reduced ejection fraction (HFrEF) and controls in the absence (73) or presence of chronic kidney disease (74). ML pipelines also often use feature selection to more efficiently process high-dimensional phenotypes, distinguishing the most informative features from those that are redundant. For example, an information gain method was used to identify speckle-tracking features able to differentiate athlete's heart from HCM. The combination of three different supervised machine learning algorithms (support-vector machine, random forest, and neural network) trained on this sparser data was then shown to be better at distinguishing the two types of remodeling (ML model sensitivity = 87%; specificity = 82%) than conventional echocardiographic parameters (best parameter was e': sensitivity = 84%; specificity = 74%) (75).
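Information gain, mentioned above as a feature selection criterion, measures how much the uncertainty (entropy) about the class label falls once a feature's value is known. The sketch below is a generic illustration and not the implementation of the cited study: the median split used to binarize each feature, the function names, and the toy feature values are all assumptions.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels)
    )


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the samples at the feature's median."""
    threshold = sorted(feature)[len(feature) // 2]  # median split (an assumption)
    left = [l for f, l in zip(feature, labels) if f < threshold]
    right = [l for f, l in zip(feature, labels) if f >= threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical data: 0 and 1 denote two diagnostic groups.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1]  # separates the groups
uninformative = [1.0, 2.0, 1.1, 2.1, 0.9, 1.9, 1.2, 2.2]  # unrelated to the groups

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
print(f"informative: {gain_informative:.2f} bits, uninformative: {gain_uninformative:.2f} bits")
```

Features would then be ranked by their gain and only the highest-ranked retained for model training.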

ML approaches have also been successfully used in the identification of new, useful structures in data. One such study, using a hypothesis-free unsupervised clustering approach, revealed four distinct proteomic signatures with differing clinical risk and survival in patients with pulmonary arterial hypertension (76). ML has similarly been able to identify new sub-phenotypes in heart failure with preserved ejection fraction, classifying subjects into three subgroups associated with distinct clinical, biomarker, hemodynamic, and structural groups with markedly different outcomes (77). Okser et al. used a naïve Bayes classifier in a longitudinal imaging-genetics study of 1,027 young adults to identify a predictive relationship between genotypic variation and early signs of atherosclerosis, as assessed by carotid artery intima-media thickness, which could not be explained by conventional cardiovascular risk factors (78).

Classification problems, such as pixel-wise classification of CMR images, are also particularly suited to supervised classical ML (79, 80) and deep learning approaches (81). These high-resolution representations of whole-heart shape and function can encode multiple phenotypes, such as wall thickness or strain, at each of thousands of points in the model (82). Such high-fidelity models were used in a study aiming to clarify the physiological role of titin-truncating variants (TTNtv), known to be a common cause of DCM but surprisingly also present in ∼1% of the general population (83). Mass univariate analyses, adjusted for multiple clinical variables and multiple testing, were carried out at over 40,000 points of a statistical parametric map of 1,409 healthy volunteers. This identified an association between TTNtv-positive status and eccentric remodeling, indicating a previously unproven physiological effect of these variants in subjects without DCM. A similar phenotyping approach was used by Attard et al. in 312 patients to elucidate the physiological mechanisms that underpinned the reported association between certain metabolites and survival in patients with pulmonary hypertension (84). Univariate regression models including clinical, hemodynamic, and metabolic data were fitted at each vertex of a 3D cardiac mesh. These showed coherent associations between 6 metabolites and right ventricular adaptation to pulmonary hypertension as well as showing that wall stress was an independent predictor of all-cause mortality.
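The mass univariate idea, fitting the same regression model independently at every point of the cardiac model, can be sketched as below. This is a deliberately simplified illustration: it uses a single predictor (a hypothetical binary carrier status) without the covariate adjustment used in the cited studies, and the function names and values are assumptions. The per-vertex statistics would subsequently require multiple-testing correction.

```python
import math


def slope_t_statistic(x, y):
    """Simple linear regression of y on x; returns (slope, t-statistic of the slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / standard_error


def mass_univariate(predictor, phenotype_by_vertex):
    """Fit the same regression independently at every vertex."""
    return [slope_t_statistic(predictor, values) for values in phenotype_by_vertex]


# Hypothetical carrier status for 8 subjects (0 = non-carrier, 1 = carrier).
carrier = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical phenotype (e.g., a wall measurement) at two vertices, one row per vertex.
phenotype_by_vertex = [
    [5.0, 5.2, 4.9, 5.1, 6.0, 6.3, 5.9, 6.2],  # vertex 0: differs between groups
    [5.0, 5.2, 4.9, 5.1, 5.1, 4.9, 5.2, 5.0],  # vertex 1: no group difference
]

results = mass_univariate(carrier, phenotype_by_vertex)
for vertex, (slope, t_stat) in enumerate(results):
    print(f"vertex {vertex}: slope = {slope:.2f}, t = {t_stat:.2f}")
```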

ML algorithms have also shown promise in predicting outcomes, such as imaging surrogates of disease or response to treatment, from complex sets of clinical and genetic variables. For example, to predict the presence or absence of coronary plaques on CT coronary angiography, a gradient boosting classifier was trained on a proteomic assay and identified two distinct protein signatures (85). A subset of these was found to outperform generally available clinical characteristics in the prediction of patients with high-risk plaques (AUC = 0.79 vs. AUC = 0.65), while a distinct set outperformed clinical variables in predicting absence of coronary disease (AUC = 0.85 vs. AUC = 0.70). In another study, a combination of random forest and neural network methods was used first to identify the most informative subset of clinical and genomic data and then to predict coronary artery calcium (86). Interestingly, the model trained on SNP data only was highly predictive (AUC = 0.85), and better than models trained on clinical data (AUC = 0.61) and on a combination of genomic and clinical data (AUC = 0.83). Further validation experiments in patients with less severe coronary artery calcium showed poor predictive accuracy, suggesting that the models' predictive value is limited to a range of (high) coronary calcium or that the models do not generalize well in the broader population. Schmitz et al. investigated the performance of 15 different supervised machine learning algorithms in predicting positive cardiac remodeling in patients who underwent cardiac resynchronization therapy (CRT) from clinical and genomic data (87). Several of the approaches demonstrated clear overfitting (accuracy ∼100%), while the algorithm that was identified as the most useful had a fair performance (accuracy = 83%) in addition to high transparency (predictive features easily identified).
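The area under the receiver operating characteristic curve (AUC) quoted in these studies can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of this rank-based calculation follows; the function name and the six example scores are illustrative assumptions, not data from the cited work.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up predicted probabilities for three cases (1) and three controls (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.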

Novel deep learning methods are also starting to make an impact in the imaging-genetics field by enabling unprecedented high-throughput image analysis. For example, DL methods have been able to achieve fully automated analysis of CMRs with a performance that is similar to human experts (88) and permitted the rapid segmentation of 17,000 CMRs that were then used in a GWAS (89). This identified multiple genetic loci and several candidate genes associated with LV remodeling, and enabled the computing of a polygenic risk score (PRS) that was predictive of heart failure in a validation sample of nearly 230,000 subjects (odds ratio 1.41, 95% CI 1.26 – 1.58, for the top quintile vs. the bottom quintile of the LV end-systolic volume).
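In its simplest form, a polygenic risk score is a weighted sum of an individual's risk-allele counts, with weights taken from GWAS effect sizes, and its association with disease can be summarized as an odds ratio between score quantiles. The sketch below illustrates both calculations; the function names, effect sizes, allele counts, and case numbers are made-up assumptions and do not reproduce the cited analysis.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


def odds_ratio(cases_a, controls_a, cases_b, controls_b):
    """Odds of disease in group A relative to group B."""
    return (cases_a / controls_a) / (cases_b / controls_b)


# Hypothetical per-variant effect sizes from a GWAS and one subject's genotype.
effect_sizes = [0.10, -0.05, 0.20]
allele_counts = [2, 1, 0]
score = polygenic_risk_score(allele_counts, effect_sizes)

# Hypothetical case/control counts in the top and bottom score quintiles.
top_vs_bottom = odds_ratio(cases_a=30, controls_a=970, cases_b=20, controls_b=980)

print(f"PRS = {score:.2f}, odds ratio (top vs. bottom quintile) = {top_vs_bottom:.2f}")
```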

While the use of AI in cardiovascular imaging-genetics has great potential, the limitations and challenges of AI in genetics (90) and imaging (91) are further amplified by combining these very large data. To date, no methodological approaches have been able to include whole-genome and high-resolution whole-heart phenotypes without requiring extensive dimensionality reduction, filtering and/or feature selection, possibly introducing errors or biases to the input data. Even when this challenge is dealt with, multiple testing correction will continue to be problematic, with the potential for false positive findings likely to only be reliably addressed with replication studies. In AI imaging-genetics, no single method is universally applicable, and the choice of whether and how to use ML or DL approaches will remain task, researcher and population specific, creating difficulties in the pooling of data and meta-analyses. It should not be forgotten that conventional analysis remains valid and has advantages when data are scarce or if the aim is to assess statistical significance, which is currently difficult using deep learning methods. Issues related to the lack of interpretability ("black box") of some ML algorithms are less of an issue in imaging analysis, where accuracy of analysis can be visually verified, but very relevant to integrated imaging-genetics analysis or risk prediction, where identifying and explaining the features driving the algorithm's output can be virtually impossible. The tendency to over-fit models to training datasets risks reduction in the performance of the model when applied to new populations. These problems are likely to be exacerbated if new test datasets include subjects with differing genetic or physiological backgrounds, if data were acquired using different technical conditions (e.g., different scanners or different genotyping batches) or if the quality of data acquired in the research setting significantly differs from real-world data sets. Finally, issues regarding privacy, ownership, and consent over vast amounts of genetic and imaging data and legal and ethical considerations for clinicians using integrated imaging-genetics algorithms will become an ever more relevant topic of debate.

Although the application of AI to imaging-genetics research is still new, these promising methods and findings warrant further extensive validation in independent populations. Fully integrated, end-to-end, imaging-genetics DL approaches are theoretically extremely attractive but as yet untested. To confidently implement AI methods in research and clinical practice, challenges regarding standardization of data acquisition and algorithm development and reporting still need to be overcome. Initiatives such as adapting the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) recommendations (92) to machine learning research [TRIPOD-ML (93)] are very much welcome. Ultimately, the additive value of AI-driven decision making may require robust multi-center studies and randomized controlled trials (94, 95).

### FUTURE PERSPECTIVES

The development of body imaging, the elucidation of inheritance and genetics and the application of statistics to medicine were some of the most important medical developments of the past millennium (96). AI now provides an unrivaled ability to integrate these three aspects in imaging-genetics studies of unprecedented scale and complexity. The increasing variety and capabilities of ML tools at the disposal of researchers provide a powerful platform to agnostically revisit classical definitions of disease, to more accurately predict outcomes and to vastly improve our understanding of the genetic and environmental underpinnings of cardiovascular health and pathology. ML approaches will play an increasing role in every field of cardiovascular research, from genomic discovery and deep phenotyping, to mechanistic studies and drug development. Concerted efforts to improve AI study design, reporting, and collaborative validation will greatly contribute to delivering on the great promise of AI and ultimately improving patient care.

### AUTHOR CONTRIBUTIONS

AM, TD, and DO'R contributed to the content and writing of this manuscript.

### FUNDING

AM, TD, and DO'R research was supported by the British Heart Foundation (RG/19/6/34387, NH/17/1/32725, and RE/13/4/30184); the National Institute for Health Research Biomedical Research Centre based at Imperial College Healthcare NHS Trust and Imperial College London; and the Medical Research Council, UK. AM acknowledges additional support from the Academy of Medical Sciences (SGL015/1006) and a Mason Medical Research Trust grant.

### ACKNOWLEDGMENTS

The authors would like to thank Drs. Wenjia Bai and Carlo Biffi (Department of Computing, Imperial College London, London, UK) for their critical review of this article.




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 de Marvao, Dawes and O'Regan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Image-Based Cardiac Diagnosis With Machine Learning: A Review

Carlos Martin-Isla<sup>1</sup>\*, Victor M. Campello<sup>1</sup>, Cristian Izquierdo<sup>1</sup>, Zahra Raisi-Estabragh<sup>2,3</sup>, Bettina Baeßler<sup>4</sup>, Steffen E. Petersen<sup>2,3</sup> and Karim Lekadir<sup>1</sup>

<sup>1</sup> Departament de Matemàtiques & Informàtica, Universitat de Barcelona, Barcelona, Spain, <sup>2</sup> Barts Heart Centre, Barts Health NHS Trust, London, United Kingdom, <sup>3</sup> William Harvey Research Institute, Queen Mary University of London, London, United Kingdom, <sup>4</sup> Department of Diagnostic & Interventional Radiology, University Hospital Zurich, Zurich, Switzerland

Cardiac imaging plays an important role in the diagnosis of cardiovascular disease (CVD). Until now, its role has been limited to visual and quantitative assessment of cardiac structure and function. However, with the advent of big data and machine learning, new opportunities are emerging to build artificial intelligence tools that will directly assist the clinician in the diagnosis of CVDs. This paper presents a thorough review of recent works in this field and provides the reader with a detailed presentation of the machine learning methods that can be further exploited to enable more automated, precise and early diagnosis of most CVDs.

#### Edited by:

Sebastian Kelle, Deutsches Herzzentrum Berlin, Germany

#### Reviewed by:

Joao Bicho Augusto, Barts Heart Centre, United Kingdom

John Hoe, MediRad Associates Ltd, Singapore

> \*Correspondence: Carlos Martin-Isla carlos.martinisla@ub.edu

#### Specialty section:

This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 01 November 2019 Accepted: 06 January 2020 Published: 24 January 2020

#### Citation:

Martin-Isla C, Campello VM, Izquierdo C, Raisi-Estabragh Z, Baeßler B, Petersen SE and Lekadir K (2020) Image-Based Cardiac Diagnosis With Machine Learning: A Review. Front. Cardiovasc. Med. 7:1. doi: 10.3389/fcvm.2020.00001

Keywords: cardiovascular disease, automated diagnosis, cardiac imaging, artificial intelligence, machine learning, deep learning, radiomics

## 1. INTRODUCTION

Despite significant advances in diagnosis and treatment, cardiovascular disease (CVD) remains the most common cause of morbidity and mortality worldwide, accounting for approximately one third of annual deaths (1, 2). Early and accurate diagnosis is key to improving CVD outcomes. Cardiovascular imaging has a pivotal role in diagnostic decision making. Current image analysis techniques are mostly reliant on qualitative visual assessment of images and crude quantitative measures of cardiac structure and function. In order to optimize the diagnostic value of cardiac imaging, there is a need for more advanced image analysis techniques that allow deeper quantification of imaging phenotypes. In recent years, the development of big data and availability of high computational power have driven exponential advancement of artificial intelligence (AI) technologies in medical imaging (**Figure 1**). Machine learning (ML) approaches to image-based diagnosis rely on algorithms/models that learn from past clinical examples through identification of hidden and complex imaging patterns. Existing work already demonstrates the incremental value of image-based cardiovascular diagnosis with ML for a number of important conditions such as coronary artery disease (CAD) and heart failure (HF). The superior diagnostic performance of AI image analysis has the potential to substantially alleviate the burden of cardiovascular disease through facilitation of faster and more accurate diagnostic decision making.

In this paper we describe the main ML techniques and the procedures required to successfully design, implement, and validate new ML tools for image-based diagnosis. We also present a comprehensive review of existing literature pertaining to applications of ML for image-based diagnosis of CVD.

### 2. OVERVIEW OF PIPELINE FOR IMAGE-BASED MACHINE LEARNING DIAGNOSIS

The overall pipeline to build ML tools for image-based cardiac diagnosis is schematically described in the following section, as well as in **Figure 2**. In short, it requires (1) input imaging datasets from which suitable imaging predictors can be extracted, (2) accurate output diagnosis labels, and (3) a suitable ML technique that is typically chosen and optimized depending on the application to predict the cardiac diagnosis (output) based on the imaging predictors (input). Additional non-imaging predictors (e.g., electrocardiogram data, genetic data, sex, or age) are often integrated into the ML model and typically improve model performance.

In this section, we will first discuss the input and output variables in more detail, before introducing commonly used ML techniques and their applications.

## 2.1. Data, Input and Output Variables

### 2.1.1. Sources of Cardiovascular Imaging Data

Robust ML models are reliant on the availability of sufficient and accurate data. Thus, data preparation is an important pre-requisite to derive models that perform well on internal and external validation. Within cardiac imaging, there is increasing availability of quality sources of organized big data through various biobanks, bioresources, and registries. Available cohorts can be classified into population-based and clinical cohorts. Population cohorts such as the UK Biobank follow the health status of a representative sample of individuals from the general population and thus are particularly useful for risk stratification. In contrast, clinical cohorts, such as the Barts BioResource or the European cardiovascular magnetic resonance (EuroCMR) registry, are composed of clinical imaging from patients and are therefore more suitable for building diagnostic tools. These datasets are an invaluable resource for the development and validation of ML diagnostic models (see **Table 1** for examples of additional cardiac imaging datasets).

### 2.1.2. Input Variables

Before an ML model can be built for image-based diagnosis estimation, it is necessary to suitably define the imaging inputs. Imaging inputs may be the raw imaging data (i.e., pixel intensities), conventional cardiac indices (and other transformed quantitative image parameters) or radiomics features extracted from the image. See **Figures 3** and **4** for additional information about input variables.

### **2.1.2.1. Conventional imaging indices**

Conventional imaging indices include measures commonly used in routine clinical image analysis such as ventricular volumes in end diastole/systole and ventricular ejection fractions.

Estimation of these clinical indices requires prior contouring of the endocardial and epicardial boundaries of the relevant cardiac chambers. Deep learning approaches have been used to develop automated/semi-automated contouring tools for more efficient and reproducible segmentation of cardiac chambers.

Since manual delineation of these boundaries is tedious and subject to errors, many automatic or semi-automatic tools have been developed (see **Table 2** for examples of existing tools). Note that many deep learning (DL) based approaches have recently been published for accurate and robust segmentation of the cardiac boundaries with promising results; however, this is beyond the scope of this review [more details on this, as well as a basic introduction to ML in cardiac magnetic resonance imaging (MRI), can be found in recent work by (3)].

**Abbreviations:** Machine learning abbreviations: AI, Artificial Intelligence; AUC, Area Under Curve; ANN, Artificial Neural Networks; BN, Bayesian Network; CNN, Convolutional Neural Network; CL, Clustering; DL, Deep Learning; DT, Decision Tree; GA, Genetic Algorithm; GAN, Generative Adversarial Network; GBRT, Gradient Boosting Trees; kNN, k-Nearest Neighbors; LDA, Linear Discriminant Analysis; LR, Logistic Regression; ML, Machine Learning; PCA, Principal Component Analysis; PLS, Partial Least Squares; RF, Random Forest; ROC, Receiver Operating Characteristic Curve; Se, Sensitivity; Sp, Specificity; SVM, Support Vector Machine; VAE, Variational Autoencoder.

Cardiac imaging and clinical abbreviations: ARV, Abnormal Right Ventricle; ASD, Atrial Septal Defect; CAC, Coronary Artery Calcium; CAD, Coronary Artery Disease; CMR, Cardiac Magnetic Resonance; CT, Computed Tomography; CTA, Computed Tomography Angiography; CVD, Cardiovascular Disease; DCM, Dilated Cardiomyopathy; ECG, Electrocardiography; echo, Echocardiography; HCM, Hypertrophic Cardiomyopathy; HF, Heart Failure; HFpEF, Heart Failure with preserved Ejection Fraction; HHD, Hypertensive Heart Disease; ICA, Invasive Coronary Angiography; IR, Iterative Reconstruction; LV, Left Ventricle; MACE, Major Adverse Cardiovascular Event; MI, Myocardial Infarction; MR, Mitral Regurgitation; MRI, Magnetic Resonance Imaging; MYO, Myocarditis; NRS, Napkin Ring Sign; PET, Positron Emission Tomography; ROI, Region Of Interest; RV, Right Ventricle; SPECT, Single Photon Emission Computed Tomography.

#### TABLE 1 | Selection of cardiac imaging datasets available.


[Figure caption fragment] ... the pie chart, conventional indices are the predominant features for training ML models, followed by radiomics and deep learning techniques.

Some recent works illustrate the use of conventional imaging indices as inputs for ML-based diagnosis models. In Khened et al. (4), an artificial neural network (ANN) was built to automatically diagnose several cardiac diseases such as hypertrophic cardiomyopathy (HCM), myocardial infarction (MI) and abnormal RV (ARV), by using as input LV and RV ejection fraction, RV and LV volume at end-systole and end-diastole, myocardial mass, as well as the patient's height and weight. In Chen et al. (5), the authors integrated a set of 32 variables from clinical data, including ejection fraction, blood pressure, sex, age, as well as other conventional risk factors, to diagnose dilated cardiomyopathy (DCM). Juarez-Orozco et al. (6) merged ejection fractions at rest and stress with a pool of clinical parameters to predict ischemia and adverse cardiovascular events using ML.

#### TABLE 2 | Selection of cardiac structural and functional analysis software.

Regarding motion, strain and single intensity analysis, in Mantilla et al. (7), global spatio-temporal image features are extracted to feed a support vector machine (SVM) classifier for LV wall motion assessment. In Bagher-Ebadian et al. (8), pairwise single intensity and variance regional differences in SPECT perfusion studies mimic the clinical procedure of qualitatively comparing stress and rest images. Contractility differences and multiscale wall motion assessment are performed by means of apparent flow in Moreno et al. (9) and Zheng et al. (10), where each feature describes an oriented velocity at a given position along the cardiac ROI.

#### **2.1.2.2. Radiomics features**

Radiomics analysis is the process of converting digital images to minable data. Analysis of the data through application of various statistical and mathematical processes allows quantification of various shape and textural characteristics of the image, referred to as radiomics features (**Table 3**). Radiomics analysis quantifies more advanced and complex characteristics of the cardiac chambers than is visually perceptible. Similarly to clinical imaging indices, radiomics requires the delineation of the cardiac structures before the features can be extracted.

Introduced in 2012 (11, 12), the radiomics paradigm was, for a long time, mostly exploited in oncology (13). Recently, a number of works have shown the promise of radiomics combined with ML for image-aided diagnosis of CVD. For instance, Cetin et al. (14) demonstrated that about 10 radiomics features integrated within an ML model are sufficient to discriminate between several major CVDs. More recently, Neisius et al. (15), researchers at Harvard University, built an ML model with 6 radiomic features calculated from T1 mapping sequences to differentiate between hypertensive heart disease (HHD) and HCM.

#### TABLE 3 | Radiomics features overview.

#### **2.1.2.3. Raw imaging data**

Whole raw images may also be used as the input for the ML model, without any pre-processing or calculation of hand-crafted input imaging features. About 10% of published reports rely on this type of modeling. In this case, the optimal features for predicting the cardiac diagnoses are self-learned automatically by the ML techniques based on the training sample, as opposed to a priori definition by the AI scientist.

For illustration, it is worth mentioning the work by Betancur et al. (16), an end-to-end DL model, estimating per-vessel CAD probability without any assumed subdivision of the input coronary territories from imaging data. The authors in Wolterink et al. (17) built a coronary artery calcification (CAC) detector, also based on DL trained on raw CT images. A similar DL model directly built from raw echo images was demonstrated in Lu et al. (18) to identify dilated cardiomyopathy cases. Also from raw echo images, the authors in Kusunose et al. (19) built a DL model for automatic detection of regional wall motion abnormalities.

#### 2.1.3. Output

ML algorithms may be developed using supervised or unsupervised learning methods. Supervised learning requires accurately labeled training examples. In the simplest form, the output is a binary variable which takes a value of 1 for a diseased individual and 0 for a control healthy subject. To obtain a robust ML model, it is recommended to use a balanced training sample, comprising a similar number of healthy and diseased subjects. Note that the binary classification can be easily extended to the multi-class case if several diseases or stages of disease are to be included in the ML model. Thus, supervised learning algorithms link the input variables to labeled outputs. Unsupervised learning is the training of algorithms without definition of the output. Through this technique, the ML algorithm groups the sample through recognition of inherent patterns within the data. In general, supervised learning outperforms unsupervised learning and so is the preferred method in situations where the ground truth is known. However, unsupervised learning has unique value for discovery of novel disease sub-types and patient stratification e.g., different pheno-groups of hypertensive heart disease or CAD.

### 2.2. Machine Learning Techniques

ML refers to the use of computer algorithms that have the capacity to learn to perform given tasks (image-based cardiac diagnosis in our case) from example data without the need for explicitly programmed instructions. This field of AI uses advanced statistical techniques to extract predictive or discriminatory patterns from the training data in order to perform the most accurate predictions on new data. We present the most commonly used ML techniques in the field of cardiac imaging and diagnosis for a non-expert audience and discuss their benefits and drawbacks (see **Table 4** and **Figure 5** for additional information). A list of diagnostic applications for each method will be provided as examples.

#### 2.2.1. Logistic Regression

A Logistic Regression (LR) model is used to estimate the probability of a given output based on input variables in a continuous fashion, in contrast with a binary classifier. Final probabilities add up to one, so one obtains a stratification into all possible outcomes and the odds for each one. One property of this model is that a slight change in the input value may disproportionately impact the final probability prediction, as can be seen in **Figure 6A**. Additionally, the input vector dimension (number of predictor variables) must be kept low, as a high-dimensional input can lead to costly model training and risks overfitting the model to the training dataset, with resultant poor generalisability. Thus, when dealing with a large number of input variables, dimensionality reduction algorithms, such as principal component analysis (PCA) or linear discriminant analysis (LDA), are applied to reduce the number of predictors to those that are most informative. LR is a valuable model to be selected when different sources of data must be integrated in a binary classification task and low complexity is required.
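As a minimal sketch of how an LR model turns input variables into a probability, the snippet below applies the logistic (sigmoid) function to a weighted sum of predictors. The weights, bias and the two standardized predictors are hypothetical values chosen for illustration and are not taken from any of the reviewed studies.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive (diseased) class for one subject."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for two standardized predictors (illustration only)
weights = [-1.8, 0.9]
bias = 0.0

p_disease = predict_proba([0.5, 1.0], weights, bias)
p_healthy = 1.0 - p_disease  # the two class probabilities add up to one

# The same input step changes the output far more near z = 0 than in the tails
step_near_zero = sigmoid(0.5) - sigmoid(-0.5)
step_in_tail = sigmoid(5.5) - sigmoid(4.5)
```

The last two lines illustrate the property mentioned above: near the decision boundary a slight change in the input shifts the predicted probability much more than the same change far from it.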

In the literature, several works have applied LRs for their particular application. For example, Zheng et al. (10) applied a sequence of four LRs to classify patients according to cardiac pathologies by using shape features extracted from cine MRI per segment. Thus they obtained a simple and easily interpretable model with only three input features per classifier. In another example, Arsanjani et al. (20) used a combination of classifiers improved with a LR to diagnose obstructive CAD using SPECT images. Finally, a LR was also applied by Baeßler et al. (21) to diagnose acute or chronic heart failure-like myocarditis.

#### 2.2.2. Support Vector Machine (SVM)

Support vector machines (SVMs) are supervised ML models whereby the optimal linear or non-linear boundary segregating the data into two or more classes is identified, as can be seen in **Figure 6B**. Prior to application of SVMs, the function used for segregating the data, the so-called kernel function, should be selected. The most used kernels are the linear function and the Gaussian function. The remaining parameters of the SVM model are chosen empirically by training a set of models and keeping the settings of the model with the lowest error. Since this model is insensitive to non-discriminating dimensions, a dimension reduction could be applied to the input variables to ease the training and obtain a better generalization, as for LR. One major drawback of SVM is that it becomes memory-intensive when large amounts of data are processed. SVM is a good choice to identify non-linearity and sparsity in the input data: different kernels can be used to fit different distributions.
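The role of the kernel function can be sketched as follows: a trained SVM scores a new subject by comparing it, through the kernel, with a set of stored training examples (the support vectors). The support vectors, coefficients and `gamma` value below are hypothetical and hand-picked for illustration; in practice they would result from training.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, coefs, bias, kernel):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias; the sign gives the class."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + bias

# Hypothetical trained model: one support vector per class (illustration only)
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coefs = [-1.0, 1.0]  # negative: control class, positive: diseased class
bias = 0.0

score = decision_value((1.8, 2.1), support_vectors, coefs, bias, gaussian_kernel)
predicted_class = 1 if score > 0 else 0
```

Swapping `gaussian_kernel` for `linear_kernel` changes the shape of the boundary without changing the rest of the procedure, which is why the kernel is treated as a separate design choice.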

Amongst all ML methods presented in this review, SVM is one of the most frequently used techniques and some works find this model to obtain the best performance. For example, Conforti and Guido (22) compared SVM models with different kernels (polynomial, Gaussian and Laplacian functions), using either the original 105 features or a selection of 25 features as input, for the early diagnosis of myocardial infarction. Similarly, Arsanjani et al. (23) and Ciecholewski (24) found that an SVM model outperformed previous algorithms used in the task of CAD identification by using data extracted from SPECT images. In the first example, a second degree polynomial was used as kernel, while in the second, a Gaussian function showed better performance. An SVM was also the best model when predicting acute coronary syndrome for 228 patients using histological, ECG and echo qualitative features, as shown by Berikol et al. (25). As a final example, Borkar and Annadate (26) obtained a very good accuracy for discrimination of DCM and atrial septal defect (ASD) patients using radiomics features and an SVM with a Gaussian kernel function.

#### 2.2.3. Random Forest (RF)

This popular technique consists of a combination of decision trees (DTs) trained on different random samples of the training set, as can be seen in **Figure 6C**. Each DT is a set of rules based on the input feature values, optimized for accurately classifying all elements of the training set. DTs are nonlinear models and tend to have high variance. If the DT is grown very deep it can pick up irregularities in the training dataset and consequently problems with overfitting may be encountered. This problem is counteracted in a RF through training on different samples of the training dataset. In this way the variance is reduced as the number of DTs increases, thereby lowering the generalization error and making RF a powerful technique. The final prediction is obtained by selecting the mode (for classification problems) or the mean (for regression problems) of all predictions. Two parameters must be selected for these models: the number of DTs and the depth level for each DT (i.e., the number of decisions). However, one must bear in mind that whilst discriminatory power on the training dataset increases as DTs increase in depth, this is often at the expense of generalization power. RFs are chosen in order to transform the problem into a set of hierarchical queries represented as DTs. However, RFs are not very resistant to noise.
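The two ingredients described above, random resampling of the training set and aggregation of the individual tree outputs, can be sketched as follows. The three depth-1 trees and their thresholds on (ejection fraction in %, LV mass in g) are hypothetical and hand-written for illustration; a real RF would learn each tree from its own bootstrap sample.

```python
import random
from statistics import mean, mode

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one random sample per tree."""
    return [rng.choice(rows) for _ in rows]

def forest_classify(trees, x):
    """Classification: the most frequent prediction (mode) over all trees."""
    return mode([tree(x) for tree in trees])

def forest_regress(trees, x):
    """Regression: the average prediction (mean) over all trees."""
    return mean([tree(x) for tree in trees])

# Hypothetical depth-1 trees (illustrative thresholds, not clinical cut-offs)
trees = [
    lambda x: 1 if x[0] < 40 else 0,   # rule on ejection fraction
    lambda x: 1 if x[1] > 200 else 0,  # rule on LV mass
    lambda x: 1 if x[0] < 45 else 0,   # a second rule on ejection fraction
]

label = forest_classify(trees, (42, 150))  # the trees vote 0, 0 and 1

rows = [(55, 120), (35, 250), (42, 150), (60, 110)]
sample = bootstrap_sample(rows, random.Random(0))
```

An odd number of trees is used here so that a binary vote cannot end in a tie.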

In the literature, RF or DT have been used frequently and were selected as the best performing model in some works. For example, Moreno et al. (9) compared SVM and RF models in binary classification tasks with 2,964 input features for different cardiac pathologies, such as HF or HCM, using optical flow features in cardiac MRI, where the latter model obtained the best performance in most cases. In this case, each DT in the RF model had two depth levels for fast predictions in clinical practice. In another example, Wong et al. (27), a RF outperformed a SVM for infarction detection by means of regional intensity analysis and motion modeling. As a final example, a RF was also used by Baeßler et al. (28) to find the most discriminative features in texture analysis of T1-weighted cardiac MRI for the classification of HCM and normal patients.

### 2.2.4. Cluster Analysis

Cluster analysis relates to the set of techniques that group together subjects in the form of data points according to similarity or proximity in the parametric space given by quantitative data extracted from input variables (image parameters and/or clinical information), as can be seen in **Figure 6D**. This technique is very useful for patient stratification, since patients with apparently similar pathology, according to existing image analysis techniques, may fall into previously unrecognized subsets, which may improve understanding of disease pathophysiology and inform more effective targeted therapies. Some clustering techniques do not require definition of outcomes, which places them in the unsupervised learning group of ML techniques. However, in classification tasks a very common supervised clustering strategy is k-nearest neighbors (kNN) clustering, where k is the number of neighbor subjects to look at when finding subgroups. In this case, surrounding diagnosed subjects will determine the outcome for a new patient. Most of the reviewed literature in clustering uses kNN (29, 30).
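The kNN strategy described above can be sketched in a few lines: a new subject receives the most frequent diagnosis among its k closest, already diagnosed neighbors. The two-dimensional feature vectors and labels below are made-up values for illustration, and Euclidean distance is assumed as the proximity measure.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a new subject by majority vote among its k nearest neighbors."""
    # Sort all diagnosed subjects by Euclidean distance to the query
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already normalized feature vectors (illustration only)
train_points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),   # diagnosed as healthy
    (1.0, 1.0), (0.9, 1.2), (1.1, 0.8),   # diagnosed as HCM
]
train_labels = ["healthy"] * 3 + ["HCM"] * 3

diagnosis = knn_predict(train_points, train_labels, (0.95, 1.0), k=3)
```

Because the vote depends on distances, the features are usually brought to a comparable scale before kNN is applied.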

Additional studies report the use of different cluster analysis for classification and/or discovery of cardiac pheno-groups. For example, Bruse et al. (31) used hierarchical clustering techniques to subdivide 60 patients into three groups, a healthy cohort and two associated with congenital heart disease by using shape features from cardiac MRI. Wojnarski et al. (32) also used a cluster analysis technique to group bicuspid aortic valve patients using CT data to find three phenotypes, and a RF was applied later to identify biomarker differences for these phenotypes using echo and clinical data.

### 2.2.5. Artificial Neural Network (ANN)

ANNs are motivated by the structure and interactions of biological neural networks. These models propagate input data in a hierarchical fashion through internal nodes in different layers. Each input connection has a corresponding weight that must be estimated and iteratively adjusted during the training process. The ANN adapts until the weights giving optimal model performance are identified (**Figure 6E**). In each node, a nonlinear function is applied to the contribution from incoming connections (net input function) to obtain its value/activation. Weight optimization provides the model with great adaptability to complex boundaries separating classes because of the highly non-linear combinations of features involved in such models. Moreover, the connections between layers in an ANN can be used to design different networks depending on the application. Some caveats are the lack of an underlying theory for deciding the number of layers or nodes in each layer, which depends on each problem and the amount of training data, as well as the tendency of these models to overfit the training set due to the large difference between the number of parameters/weights of the model and the number of training samples. ANNs are the best choice when a large amount of data is available.
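The forward propagation described above can be sketched as follows: each node computes a weighted sum of its incoming connections and passes it through a nonlinear activation. The sigmoid activation, the layer sizes and all weights are assumptions made for illustration; the iterative weight adjustment performed during training is not shown.

```python
import math

def sigmoid(z):
    """Nonlinear activation applied at each node."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer; weights[j][i] links input i to node j."""
    return [
        sigmoid(bias + sum(w * x for w, x in zip(row, inputs)))
        for row, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Propagate the inputs through the layers in order."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node
layers = [
    ([[0.5, -0.3], [0.8, 0.1]], [0.0, -0.2]),  # hidden layer weights, biases
    ([[1.2, -0.7]], [0.1]),                    # output layer weights, biases
]
output = network_forward([1.0, 2.0], layers)
```

The single output value lies between 0 and 1 and can be read as a probability-like score for the positive class.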

In the literature, these techniques have been applied frequently. For example, Tsai et al. (33) used ANNs for detection of HCM and DCM patients using features extracted from echo. More recently, two works by Nakajima et al. (34, 35), using the same SPECT dataset of 1,001 cases, applied ANNs to assess CAD using features extracted from stress and rest images with good accuracy.

### 2.2.6. Convolutional Neural Network (CNN)

CNNs are an extension of ANNs in which the value of a node in a given layer is affected by the spatial surrounding of a node in the previous layer through an operation called convolutional product. These models are specifically designed for image processing, where spatial information for the nodes (pixels) is essential for the final prediction. The advantages and disadvantages are shared with ANNs. The main difference that makes these models very popular nowadays is that images are provided as input without any feature extraction. These models are able to extract their own meaningful features for the final prediction, as illustrated in **Figure 6F**. Additional models exist for compressing images to a lower dimensional representation space, such as the Variational Autoencoder (VAE) and Generative Adversarial Networks (GANs), where additional analysis can be carried out more easily (e.g., clustering or classification with a SVM model).
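The convolutional product mentioned above can be sketched on a tiny image: each output value is a weighted sum over a small spatial neighborhood of the input. The sketch assumes stride 1 and no padding and, like most DL frameworks, does not flip the kernel (strictly speaking a cross-correlation). The image and the hand-written edge kernel are made up for illustration; in a CNN the kernel weights are learned.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# Toy 3 x 4 image with a vertical intensity edge between columns 1 and 2
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds to a left-to-right increase in intensity

feature_map = conv2d_valid(image, edge_kernel)
```

The resulting feature map is non-zero only where the intensity changes, which is a simple example of a spatial feature extracted directly from pixel values.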



A balanced approach should be taken to defining the layers of a CNN; whilst a deeper network loses information from the original image with each new layer, a network with few layers could have problems extracting meaningful features for the final prediction. CNNs are widely used for analysis of images and their application to cardiac imaging is reported in numerous studies. Wolterink et al. (17) presented a framework where two cascading CNNs were able to detect CAC using cardiac computed tomography angiography (CTA) images. Their models had 8–13 convolutional layers that reduced 200 × 200 features (pixel intensities) to only 32. Zhang et al. (36) used a 13-layer CNN to diagnose HCM, cardiac amyloidosis and pulmonary artery hypertension from echo images of size 224 × 224, which were reduced to 4,096 features. Madani et al. (37) used a CNN model to predict left ventricular hypertrophy from echo images of size 120 × 160.

#### 2.2.7. Additional Steps

### **2.2.7.1. Normalization**

Due to the diverse nature of different information sources in cardiac medicine, a normalization step is often required prior to model building. In general, learning algorithms benefit from standardization of the data set, e.g., some algorithms such as SVM will improve cardiovascular predictions if all numerical features are zero centered and have variances of the same order of magnitude. Furthermore, some non-linear transformations can prepare the selected features to create a model more robust to outliers. Some of the most common techniques are mentioned in **Table 5**.

For illustration, Wong et al. (27) show that feature normalization has a positive impact on ML model performance. Moreover, categorical variables should be encoded using Integer encoding, which consists of referencing each possible categorical value with an integer, or One-Hot encoding, which considers each possible categorical value as a new binary variable.
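The normalization and encoding steps above can be sketched as follows. Z-score standardization is used here as one common way to obtain zero-centered features with unit variance; the ejection fraction values and the sex variable are made-up examples, and the function names are illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score: subtract the mean, divide by the standard deviation.

    Assumes the feature is not constant (non-zero standard deviation).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def integer_encode(categories):
    """Reference each categorical value with an integer."""
    mapping = {level: i for i, level in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping

def one_hot_encode(categories):
    """Turn each categorical value into its own binary variable."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return rows, levels

ejection_fraction = [50.0, 60.0, 70.0]  # made-up values, in percent
z_scores = standardize(ejection_fraction)

sex = ["F", "M", "F"]
sex_codes, sex_mapping = integer_encode(sex)
sex_one_hot, sex_levels = one_hot_encode(sex)
```

In practice the mean and standard deviation would be computed on the training set only and then reused for the validation and testing sets, so that no information leaks from the held-out data.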

#### **2.2.7.2. Dimensionality reduction and feature selection**

Frequently, after extracting features from different sources such as demographic and clinical data, conventional indices and imaging parameters, one ends up with thousands of values defining a single patient. This information is later utilized during the training process of ML models, but the combination of a large number of input parameters with a limited number of samples (as usually happens in the medical field) can make the optimization problem expensive and may limit the generalization ability of the model. Thus, a dimensionality reduction algorithm is usually applied to the input data, such as principal component analysis (PCA) or linear discriminant analysis (LDA). Another approach is feature selection. Such a method sequentially adds the most discriminative features for the particular model instance being trained and dismisses redundant and non-informative ones.
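Sequential forward feature selection, as described above, can be sketched as a greedy loop that keeps adding the feature that most improves a performance score and stops when no candidate helps. The scoring function `toy_score`, the feature names and their contributions are hypothetical stand-ins; in a real study the score would be, for instance, the cross-validated performance of the model trained on the candidate subset.

```python
def forward_selection(feature_names, score_fn, max_features):
    """Greedily add the feature that gives the best score at each step."""
    selected = []
    best_score = float("-inf")
    remaining = list(feature_names)
    while remaining and len(selected) < max_features:
        # Evaluate every remaining feature added to the current subset
        score, feature = max((score_fn(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break  # no candidate improves the model: stop
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Hypothetical contributions of each feature to model performance
usefulness = {"EF": 0.15, "texture_entropy": 0.10, "LV_mass": 0.08, "noise": 0.0}

def toy_score(subset):
    """Stand-in for a validation score, with a small cost per added feature."""
    return 0.5 + sum(usefulness[f] for f in subset) - 0.02 * len(subset)

selected, final_score = forward_selection(list(usefulness), toy_score, max_features=4)
```

In this toy setting the non-informative feature is never added, because including it lowers the score.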

For example, Tabassian et al. (29) aimed to analyze deformation curves of the LV in echocardiographic records of 120 patients. The strain curves obtained were reduced by means of PCA and the result was used to train a kNN model. The resultant accuracy was 0.87, significantly higher than the clinicians' result of 0.7. Cetin et al. (38) discriminated HHD from healthy controls in 200 subjects with SVM and sequential forward feature selection. The predictive power of selected radiomics (AUC = 0.76) was substantially improved compared to conventional indices (AUC = 0.62).

#### 2.2.8. Validation

In order to prove the validity of ML applied to cardiac imaging, results must be analyzed from two perspectives: statistical validity, considering the reproducibility with different cohorts and correctness of statistical values obtained (i.e., metrics), and intra-validity, regarding the clinical and real implications of the algorithms on a daily basis (i.e., clinical effectiveness). This is a pairwise co-existence; none of the ML cardiac imaging algorithms will be applied in clinical routine if there is no agreement from both sides. The following sub-sections will describe how the metrics and the clinical effectiveness are considered.

A cohort is sorted in a very specific manner for ML purposes. For the validity of the algorithms, a whole cardiac imaging data set should be split into 3 different subgroups, called training set, validation set, and testing set, respectively. These groups are often selected in such way that subgroups share demographic distributions such as age or sex, in order to represent a real world scenario. Of course, a balanced distribution of control and pathologic subjects is also required. Once the ML model is trained and tested, different metrics are obtained to evaluate its performance.
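A split into training, validation and testing sets that preserves the proportion of control and pathologic subjects can be sketched as follows. The 60/20/20 proportions and the fixed random seed are assumptions made for illustration, since no specific proportions are prescribed here, and stratification is done on the diagnosis label only (matching age or sex distributions would require additional strata).

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return index lists (train, val, test) with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to testing
    return train, val, test

# Made-up balanced cohort: 50 controls (0) and 50 pathologic subjects (1)
labels = [0] * 50 + [1] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
```

Each subject ends up in exactly one of the three sets, so the testing set remains unseen during training and model tuning.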

Accuracy measures the percentage of input data that the algorithm classifies correctly. It is a simple measure used in multiple scientific scenarios if there is no class imbalance (i.e., one class represented by a higher number of individuals compared with the rest). One of the drawbacks of using accuracy as the metric is that the distinction between False Positive and False Negative observations is lost. Therefore, Specificity (Sp) and Sensitivity (Se) are widely used for measuring the performance of the algorithm, as they take a possible class imbalance into consideration. In order to assess the performance of an algorithm and to understand where there might be a misclassification issue, a table report called Confusion Matrix is used. This specific table layout is typically used to describe the performance of a supervised learning model. Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). This way, a computer scientist can have a wider overview of the parameters that may be changed or of which classes are degrading the algorithm's performance. From sensitivity, specificity and the confusion matrix we can extract a performance plot called the receiver operating characteristic (ROC) curve. It is created by plotting the true positive rate (TP rate) against the false positive rate (FP rate) at various threshold settings. In ML, the true-positive rate is also known as sensitivity, recall or probability of detection. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making. The area under the ROC curve (AUC) is another metric used to measure algorithms' performance.
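The relation between the confusion matrix and the metrics above can be sketched for the binary case, where 1 denotes a diseased subject and 0 a control. The labels and predictions are made-up values chosen to show how accuracy can look favorable on an imbalanced sample while sensitivity reveals the missed cases.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 = diseased."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def summary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (Se) and specificity (Sp) from the counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),  # fraction of diseased subjects detected
        "specificity": tn / (tn + fp),  # fraction of controls correctly cleared
    }

# Imbalanced toy sample: 2 diseased subjects and 8 controls
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # one diseased subject is missed

metrics = summary_metrics(y_true, y_pred)
```

Here the accuracy is 0.9 although only half of the diseased subjects are detected, which is why Se and Sp are reported alongside accuracy.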

It is worth noting that the AUC can be derived from the decision boundaries obtained by ML models even though the models are trained with discrete outputs. When a trained model is asked to make a prediction, a probability can be computed and used to generate a ROC analysis.
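How predicted probabilities lead to a ROC curve and its AUC can be sketched as follows: every distinct probability is used in turn as a decision threshold, the resulting TP and FP rates are collected, and the area is computed with the trapezoidal rule. The labels and predicted probabilities are made-up values for illustration.

```python
def roc_points(y_true, scores):
    """(FP rate, TP rate) pairs obtained by sweeping the decision threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Made-up labels (1 = diseased) and predicted probabilities
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

curve = roc_points(y_true, scores)
auc = auc_trapezoid(curve)
```

For this toy example the AUC equals 8/9: of the nine possible diseased/control pairs, eight are ranked correctly by the predicted probabilities.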

### 3. DIAGNOSTIC APPLICATIONS—A REVIEW OF LITERATURE

We conducted an organized, pre-defined literature search of two electronic databases (Google Scholar, Scopus). We included studies using a well-defined ML technique for cardiac image analysis using echocardiography, cardiac magnetic resonance, cardiac computed tomography, or single photon emission computed tomography (SPECT). Our search strategy comprised a series of title and whole text searches with search terms combined using Boolean operators. Search results were filtered by subject area, limiting to entries from Cardiology, Computer Science and Engineering fields. We review in detail various achievements in the diagnosis of a wide range of cardiac diseases using image-based ML methods. Statistics about the conducted literature review can be seen in **Figure 7**.

### 3.1. Myocardial Infarction

Accurate and timely identification of MI helps in guidance of treatment strategies and reduction in the time taken for further tests. While MI diagnostic assessment using imaging is prone to inter- and intra-observer variability and requires a significant amount of expert time, ML methods offer opportunities to simplify, speed up and quantify the diagnostic process in combination with conventional assessment. For example, Nakada et al. (39) demonstrated that MI diagnosis can be achieved in echo using quantitative motion features, avoiding inter-observer variability, as input for an ANN reaching an accuracy of 0.95. Later, Ungru et al. (40) validated these results in mouse models by inducing MI in healthy specimens, with a prediction accuracy of 0.91, comparing several ML techniques. The same level of accuracy was obtained in the first texture analysis work, by Agani et al. (41), with only 17 subjects and a clustering approach. This echocardiographic research was later extended with a full pool of texture features and 160 subjects by Sudarshan et al. (42). In this work, DT, ANN and SVM models were benchmarked, with the best accuracy obtained using ANN: 0.94 (Se = 0.91, Sp = 0.97). Vidya et al. (43) also performed an intensive texture analysis for 800 subjects, achieving an accuracy of 0.99 using a SVM. In their study, different pre-processing techniques were used to enhance the cardiac images.

Cardiac MRI has particular value in identification of MI. Since 2017, 13 studies were found integrating input variables from this imaging modality. Baeßler et al. (44) used late gadolinium enhancement MRI as a standard reference for non-enhanced MRI discrimination between chronic and subacute MI. Radiomic features in combination with a LR gave an AUC of 0.92 in a cohort of 180 patients. Similarly, segment viability can be detected on cine MRI, also using radiomics, as suggested by Larroza et al. (45). This classification between viable, nonviable and remote segments yielded an AUC of 0.84. However, we believe that these encouraging results should be validated with a bigger cohort and a well-balanced segment viability distribution. Recently, Zhang et al. (46) tried to detect MI from non-enhanced MRI images. A total of 212 patients with chronic MI and 87 healthy control patients were used to train a three-stage DL pipeline. The per-segment AUC for detecting chronic MI was 0.94 (Sp = 0.99, Se = 0.9).

Two consecutive state-of-the-art texture analysis studies were conducted in cardiac CT: Mannil et al. (47) and Mannil et al. (48). The former underlines the ability of ML to detect MI on non-contrast low radiation dose CT images on the basis of features invisible to the radiologist's eye, obtaining an AUC of 0.78. The latter study evaluates the impact of automatic classification methods using different iterative reconstruction (IR) strengths for contrast-enhanced images, reporting an accuracy of 0.94 (IR 3) and 0.97 (IR 5) for the ML model, while three independent readers achieved 0.73 (IR 5) on average. A summary of MI studies can be found in **Table 6**.

#### TABLE 6 | Selected studies using image-based ML analysis for the diagnosis of Myocardial Infarction.

### 3.2. Cardiomyopathies

Cardiomyopathy is a broad term describing various heart muscle disorders; a first level of subclassification is into ischaemic and non-ischaemic cardiomyopathies. This heterogeneous group of disorders has many causes, signs and symptoms, and requires different treatments. The challenge of distinguishing different cardiomyopathies is illustrated by the fact that many of them can be associated with diverse manifestations. Each disease entity is associated with a typical imaging phenotype. Whilst it is not always possible to discriminate individual cardiomyopathies in routine image analysis, this may be improved with the more granular and quantitative approach to image analysis in ML models. These premises make ML-based imaging diagnosis a well-suited tool for computer-aided analysis of heterogeneous cardiomyopathies. For example, Gopalakrishnan et al. (49) used a set of conventional indices from a pediatric cardiac MRI cohort of 83 subjects to characterize five different cardiomyopathies. In this study, a DT (AUC = 0.79) was compared with other ML methods (AUC = 0.73–0.77). Physiological vs. pathological patterns of HCM remodeling were characterized by Narula et al. (50) using an ensemble of models with conventional indices from 2D echo as input (Se = 0.96, Sp = 0.77).

In 2017, a relevant challenge was organized by Bernard et al. (51). The Automated Cardiac Diagnosis Challenge (ACDC) aimed to evaluate the performance of different automatic methods for the classification of 150 subjects into 5 categories (healthy, HCM, DCM, ARV and MI) as provided by clinical experts. Several approaches were proposed for this problem. Khened et al. (4) and Wolterink et al. (52) used a set of conventional indices extracted from their own automatic delineations as input for an RF to obtain an accuracy of 0.96 and 0.86 on the test set, respectively. Isensee et al. (53) also used an RF and their own segmentation scheme to classify cardiac cycle dynamic features, with an accuracy of 0.92. This study shows a remarkable benefit from adding temporal analysis, which provides a strong argument for exploiting it further in future cine MRI studies. Cetin et al. (14) used SVM to classify a complete pool of radiomic features from manual segmentation, also obtaining an accuracy of 0.92. Additional research was later carried out using the same dataset. Snaauw et al. (54) proposed a novel approach, using CNN bottleneck representations to discriminate between the five categories, obtaining a modest accuracy of 0.78. Another interesting approach was taken by Biffi et al. (55). Their VAE architecture was trained with two multi-center cohorts of 537 and 200 patients and tested on their own dataset and on the ACDC dataset, obtaining an accuracy of 1.0 and 0.9, respectively.

Later, Puyol-Antón et al. (56) combined MRI and echo data and per-segment motion analysis to diagnose DCM by means of LDA, achieving an accuracy of 0.94 (Sp = 0.96, Se = 0.93). Recently, Neisius et al. (15, 57) presented two complementary works approaching HCM and HHD diagnosis from two different perspectives. In the first work, a complete strain analysis and an LR achieved an accuracy of 0.67 (Sp = 0.64, Se = 0.68). The second one applied an exhaustive texture analysis to T1 mapping. A selection of 6 radiomic texture features and a linear SVM model showed an improved accuracy of 0.86 (Sp = 0.91, Se = 0.77). A summary of cardiomyopathy studies can be found in **Table 7**.


### 3.3. Coronary Artery Disease

Non-invasive imaging assessment for detection of CAD has a great potential impact on clinical practice. If ischemia can be ruled out with a high probability, invasive coronary angiography (ICA) may be avoided. Advanced ML image analysis techniques can improve the diagnostic accuracy for myocardial ischemia and thereby improve CAD management and reduce unnecessary downstream testing.

A very first approach dating from 1999 showed promising results. Considering ICA as the reference standard, Kukar et al. (58) used scintigraphy, ECG and data on symptoms from 327 patients to detect CAD. Different ML models and feature selections were tested, and in some cases the ML model outperformed clinicians in accuracy (0.92 vs. 0.91, respectively), but not in sensitivity. An exhaustive approach by Kurgan et al. (59) laid the groundwork for a semi-automated diagnosis pipeline in perfusion SPECT. In their work, a pseudo-DT was crafted from intensity-based features for 267 subjects, achieving an overall accuracy of 0.8. Another similar work was conducted in perfusion SPECT (n = 115) and Equilibrium Radionuclide Angiocardiography (n = 58) by Bagher-Ebadian et al. (8). Using ICA as ground truth for both studies, CAD was assessed using mean and variance intensity features extracted from stress and rest studies in anterior, left anterior oblique and left lateral projections, obtaining accuracies of 0.77 and 0.73 with an ANN. A similar methodology was covered in detail by Guner et al. (60). A cohort of 308 patients with clinical coronary CTA assessment was utilized to train an ensemble of ANNs for CAD discrimination. A combination of demographic information and frequency, phase and brightness features provided as input variables resulted in a model accuracy of 0.74, outperforming some of the non-expert clinicians. The results revealed that single-vessel CAD was more difficult to identify. Recently, complementary work by Shibutani et al. (61), including per-segment analysis, was performed on 21 patients who underwent perfusion SPECT. A total of 109 abnormal regions were examined and an ANN achieved better results than two independent observers for stress defect and ischemia detection, with ICA as the gold standard.

Alternatively, resting CT can be used for CAD diagnosis without additional contrast injection for stress imaging. Han et al. (62) used 3 quantitative features and the 17-segment model to obtain 51 input variables for training a gradient boosting algorithm, an ML technique that builds an ensemble of classifiers to improve the final accuracy. Invasive angiography and FFR were used as the gold standard. This study, based on a cohort of 252 patients from 5 countries and 17 centers, obtained an AUC of 0.75. Another state-of-the-art approach using cardiac CT, by Coenen et al. (63), showed that improved reclassification of nonsignificant stenosis is possible with ML-based image analysis. Three hundred and fifty-one patients, with 525 vessels compared against invasive FFR, were included in this study. A set of 28 anatomical features was computed from semi-automatic 3D CT reconstructions. On a per-vessel basis, diagnostic accuracy improved from 0.58 (CTA) to 0.78 (ML model). The per-patient accuracy improved from 0.71 to 0.85. A summary of CAD studies can be found in **Table 8**.

### 3.4. Atherosclerosis

Atherosclerosis is a strong and independent predictor of cardiovascular events. Plaque is often scored manually by experts, which increases workload and is prone to false positives and to inter-observer variability regarding CAC detection. Hence, the ability to quickly and reliably quantify calcification using ML models provides additive value to clinical risk scoring tools and will enable superior prognostication of individuals. To overcome these issues and bring robustness to such procedures, intensive cardiac imaging feature extraction may be utilized.

TABLE 8 | Selected studies using image-based ML analysis for diagnosis of coronary artery disease.

Išgum et al. (30) designed an automated method for detection of aortic calcification, an indicator of established atherosclerotic disease, based on shape and intensity features. Forty abdominal scans contained a total of 249 CAC determined by a human observer. The method detected 209 CAC (Se = 0.84) at the expense of 1.0 false-positive object per scan on average, while the presence of contrast increased the number of incorrect classifications. This work was complemented by Išgum et al. (64), analysing cardiac CT with a more sophisticated feature set to obtain a final accuracy of 0.74 for CAC detection. Feature selection showed that no shape features were included in the classification stage, highlighting the discriminating power of texture analysis in CT.
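The detection figures quoted above for Išgum et al. (30) can be checked with a few lines of arithmetic. The sketch below only re-derives the reported sensitivity from the counts given in the text; the function name `sensitivity` and the variable names are illustrative choices, not part of the original study.

```python
def sensitivity(true_positives: int, total_positives: int) -> float:
    """Fraction of reference-standard calcifications that the method detected."""
    return true_positives / total_positives


# Counts quoted in the text for Išgum et al. (30).
detected, annotated, scans = 209, 249, 40
fp_per_scan = 1.0  # average false-positive objects per scan, as reported

se = sensitivity(detected, annotated)
missed = annotated - detected
expected_false_positives = fp_per_scan * scans

print(f"Se = {se:.2f}, missed = {missed}, false positives ~ {expected_false_positives:.0f}")
# prints "Se = 0.84, missed = 40, false positives ~ 40"
```

Read this way, the method missed 40 of the 249 annotated calcifications while producing about 40 false-positive objects across the 40 scans.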

Wolterink et al. (65) used cardiac CT scans thresholded at 130 Hounsfield units and a connected-component analysis to obtain candidate regions in the coronary arteries for 164 subjects with expert annotations. Their texture analysis was similar to that of Išgum et al. (64), and the resulting accuracy with DTs was 0.86 for risk stratification. This work also introduced a guided review in which the most uncertain CAC were manually inspected again, increasing the overall accuracy to 0.92. Later, Kolossváry et al. (66) extracted a large radiomic pool of 4,440 features from a group of 60 subjects with Napkin Ring Sign (NRS) and non-NRS plaques with a similar degree of manually segmented CAC. This research unveils the value of radiomics for finding discriminative features: almost half of them reached an AUC of 0.8, and short- and long-run low gray-level emphasis and surface ratio of high attenuation voxels had the highest AUC values (0.92 and 0.89, respectively). Finally, in a recent work, Zreik et al. (67) used recurrent CNNs on multiplanar reformatted coronary CTA images previously annotated by an expert, achieving accuracies of 0.77 and 0.8 for plaque and stenosis characterization, respectively. A summary of ATH studies can be found in **Table 9**.
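The candidate-extraction step described above for Wolterink et al. (65), thresholding at 130 Hounsfield units followed by connected-component analysis, can be illustrated with a toy sketch. This is not the authors' implementation: it runs on a small synthetic 2D array rather than a 3D CT volume, and the function name `candidate_regions`, the 4-connectivity, the `>=` comparison and all intensity values are assumptions made for illustration.

```python
from collections import deque

import numpy as np


def candidate_regions(hu_image, threshold=130):
    """Label 4-connected regions of pixels at or above `threshold` HU."""
    mask = hu_image >= threshold
    labels = np.zeros(hu_image.shape, dtype=int)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # pixel already belongs to a labelled region
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed pixel
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                if inside and mask[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = current
                    queue.append((rr, cc))
    return labels, current


# Synthetic example (hypothetical HU values, not patient data).
image = np.full((8, 8), 40)   # soft-tissue background
image[1:3, 1:3] = 400         # first dense blob, 4 pixels
image[5:7, 4:7] = 200         # second dense blob, 6 pixels
image[0, 7] = 129             # just below the threshold, ignored

labels, n_regions = candidate_regions(image)
region_sizes = np.bincount(labels.ravel())[1:]
```

Each labelled region would then be passed to feature extraction and classification; in the study the final decision was made by the trained model, optionally followed by the guided expert review.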

### 3.5. Valvular Heart Disease

Heart valve disease is an increasingly common pathology of the cardiovascular system and an increasing number of patients are expected to require heart valve replacement. Such a diverse group of disorders can benefit from the integration of ML into cardiac imaging through early diagnosis, treatment or surgery planning. For instance, Elalfi et al. (68) used image preprocessing techniques (Gaussian and Gabor filtering) and intensity and texture features to generate an ANN model with 120 echo images. These images were organized into 8 types of valvular disease. The obtained accuracy was high at 0.93. This is encouraging, particularly considering the diversity of outcomes.

A similar approach was used for mitral regurgitation (MR) severity estimation using echo videos. Moghaddasi et al. (69) took advantage of binary patterns as image descriptors, which include details from different viewpoints of the heart. kNN and SVM models were trained with 102 patients divided in four groups: mild MR (n = 34), moderate MR (n = 32), severe MR (n = 36), and control (n = 37). SVM obtained the best accuracy, 0.99. Another interesting work mentioned in previous sections was conducted by Wojnarski et al. (32). A summary of HVD studies can be found in **Table 10**.

### 3.6. Heart Failure

Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous group of disorders with variable treatment response and poor outcomes. There has been increasing interest in improved phenotyping of HFpEF to aid understanding of underlying disease mechanisms and to guide treatments toward subtypes that may derive benefit. Given the heterogeneous nature of HFpEF, ML techniques are well suited to diagnosis and image phenotype stratification. Some of the studies reviewed in previous sections were also related to the characterization of heart failure (9, 70). Additional work in this field was presented by Shah et al. (71), who prospectively studied 397 HFpEF patients and performed detailed clinical, laboratory, electrocardiographic and echocardiographic phenotyping of the study participants. Clustering techniques were applied to divide the cohort into 3 pheno-groups. Phenomapping was helpful for improved classification and categorization of HFpEF patients and for risk stratification by means of SVM, obtaining an AUC of 0.76. ML applied to HF phenogrouping was also used for prognostic tasks by Cikes et al. (72). A summary of HF studies can be found in **Table 11**.

TABLE 9 | Selected studies using image-based ML analysis for diagnosis of aortic and coronary atherosclerosis.

TABLE 10 | Selected studies using image-based ML analysis for diagnosis of valvular heart disease.

TABLE 11 | Selected studies using image-based ML analysis for diagnosis of heart failure.

### 3.7. Abnormal Wall Motion

Most of the existing quantitative techniques for wall motion characterization involve laborious post-processing and image analysis. For this reason, ML approaches that require minimal user input and correlate with segmental cardiac function can improve clinical routine and triage.

For instance, Mantilla et al. (7) detected wall motion abnormalities in the left ventricle by means of spatiotemporal profiles obtained with pseudo delineations of 20 MRI patients. Wavelet and Fourier transforms were applied and the subsequent spaces were used to generate two models: SVM and dictionary learning (DICTL). Dictionary Learning at mid-cavity level obtained the best accuracy, 0.96 (Sp = Se = 0.96). Afshin et al. (73) exploited intensity distributions per segment. In their work, a reference frame automatically propagated to each cardiac phase generated the 16 segments for the whole cardiac cycle. LDA reduced feature dimensionality and linear SVM obtained an accuracy of 0.86 in a cohort of 58 MRI subjects.

Kusunose et al. (19) used a total of 300 patients with a history of myocardial infarction and 100 age-matched control patients. Each case contained echo from short-axis views at end-diastolic, mid-systolic, and end-systolic phases. An ensemble of 10 CNN models was trained. The AUC obtained by the ML ensemble was similar to that produced by the cardiologist and sonographer readers (0.99 vs. 0.98, respectively), and the same occurred for territory detection (0.97 vs. 0.95, respectively). A summary of AWM studies can be found in **Table 12**.

### 4. DISCUSSION AND FUTURE PERSPECTIVES

As reflected by the large amount of published data reviewed above, AI in general and ML in particular have shown huge potential to significantly influence diagnostic decision making in cardiology. In contrast to "traditional" statistical methods, techniques from the field of AI are able to deal with large amounts of data ("big data") and to integrate information from all fields of clinical care, including, e.g., clinical parameters ("clinomics"), genetic information ("genomics"), protein metabolism ("proteomics"), and imaging data ("radiomics") within one large all-encompassing analysis framework. The steadily increasing computational power and the increasing availability of data through mobile applications and the digital transformation of the global healthcare systems further contribute to the advancement of the field. Consequently, future studies will continue the use of these techniques in order to allow translation into routine clinical practice and thus pave the way toward improved diagnostic decision making tailored to individual patient-specific needs (subsumed under the heading "precision medicine").

TABLE 12 | Selected studies using image-based ML analysis for diagnosis of wall motion abnormalities.

Yet, in today's clinical routine, diagnostic decisions are still drawn from stand-alone parameters [e.g., LV ejection fraction (74)], despite many encouraging research studies from the field of AI. On a per-patient basis, the diagnostic and prognostic value of such independent functional parameters was found to be low by Park and Kim (75). Given the diversity of cardiovascular imaging modalities, their potential additive value for more accurate diagnostics and risk stratification remains unclear. Besides, continued reliance on subjective visual interpretation has resulted in considerable observer dependence and a lack of standardization. The application of AI and precision medicine to CVD, however, is still in its infancy and faces huge challenges which have to be overcome by future research. To establish novel imaging biomarkers and AI techniques, the robustness and reproducibility of quantitative imaging features must be ensured (76). Up to now, trained models and algorithms have had limited generalizability due to the multiplicity of potential influencing factors (including differing scanners, vendors, CT radiation doses, MRI field strengths, sequences, sequence parameters, spatial and temporal resolutions, reconstruction algorithms, reconstruction parameters, and so forth; **Figure 8**).

For CT and positron emission tomography (PET) imaging, a variety of studies have highlighted difficulties in producing reliably reproducible radiomic features when using different vendors, scanners, and acquisition or reconstruction settings (48, 77–84). While the "image biomarker standardization initiative" (IBSI) has established certain standards for radiomic studies (76), the specific needs of cardiac imaging have not yet been met. For cardiac CT, Hinzpeter et al. (84) and Mannil et al. (48) have investigated the influence of slice thickness and of iterative reconstruction algorithms, respectively, on the robustness and comparability of radiomic features, observing considerable feature variations for differing technical settings. In contrast to this evolving body of literature on CT imaging, little evidence exists concerning the robustness of radiomic features in MRI (75, 85–87). Given the qualitative nature of most MRI sequences and the absence of absolute signal intensities (in contrast to CT imaging, for instance), the robustness of radiomic features seems to depend heavily on acquisition sequences as well as acquisition and reconstruction parameters. In a recent phantom study, Baeßler et al. (88) sought to evaluate the influence of different acquisition sequences, spatial resolution, and postprocessing settings, revealing that the robustness of radiomic features was heavily influenced by the acquisition sequence and image resolution as well as image processing settings. Future work not only needs to add to the understanding of such influencing factors but should also feed into extensive standardization efforts to ensure the reliability of all imaging measures.

Several attempts to improve radiomic feature robustness through image normalization have been made. For more reliable quantification of emphysema, Gallardo-Estrella et al. (89) proposed normalization for chest CT images reconstructed with different kernels. The proposed method decomposed each scan into multiple frequency bands, the energy of which was then normalized to the average energies observed in a set of scans reconstructed with a reference kernel. Building on these results, Jin et al. (90) used a deep learning-based strategy for CT image normalization by means of a U-Net. For harmonization of MRI images, similar deep learning algorithms were proposed by Samala et al. (91) for dynamic contrast-enhanced (DCE) breast images and by Dewey et al. (92) for brain MRI. Although these approaches yield promising results, their applicability to cardiovascular imaging remains unclear because of inherent particularities of cardiac imaging. First, unlike the breast and brain, the heart moves continuously because of breathing and myocardial contraction. Second, the contrast bolus inside the ventricular lumen may influence the myocardial features. Aside from these specific characteristics, the impact of image normalization on extracted radiomic features has not yet been fully investigated.

Besides the lack of standardization of technical factors, the recent trend of training ML classifiers on relatively small datasets is a major issue of current methodology and hampers translation of the novel techniques into routine clinical practice. The small sample sizes in most cardiovascular imaging studies (usually N < 100 with > 1,000 variables in the models) lead to a considerable risk of overfitting, which in turn leads to poor generalisability of the classification models when deployed to different datasets.

Besides the current lack of imaging feature standardization and the problem of model overfitting, other challenges should be acknowledged when it comes to translation of AI to daily patient care. While big data aims to integrate data from various sources, the current lack of interoperability of many systems used in clinical care poses huge obstacles for data pooling approaches. Several national and international attempts are currently under way to solve interoperability issues for medical care and to allow a seamless integration of different databases and informatics systems used in healthcare.
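The overfitting risk raised above for small cohorts with many variables (N < 100, > 1,000 features) can be made concrete with a synthetic experiment. The sketch below is an illustration under stated assumptions and is not drawn from any of the reviewed studies: the features are pure Gaussian noise, the labels are random, and the classifier is a plain minimum-norm least-squares fit chosen for brevity. All sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 80, 80, 1000  # fewer subjects than features

# Noise-only "radiomic" features and random binary labels: nothing to learn.
X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))
y_train = rng.choice([-1.0, 1.0], size=n_train)
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares linear classifier. With more features than
# subjects it can reproduce the training labels exactly.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_acc = np.mean(np.sign(X_train @ w) == y_train)
test_acc = np.mean(np.sign(X_test @ w) == y_test)

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The model fits the training labels perfectly although they carry no signal, while accuracy on held-out data stays near chance. This is why feature reduction, regularization and external validation matter when the number of variables exceeds the number of subjects.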

The ability to understand the rationale behind ML-generated diagnostic grouping may be crucial in order to achieve widespread clinical use of this novel technology. However, ML models, and DL techniques in particular, are usually considered "black boxes," which do not deliver any insights or explanations on how they reached their conclusions or on which features (e.g., imaging features) they based their decision. Although several attempts and ongoing research exist on delivering insights into an algorithm's decision making (such as heatmaps), these attempts are not yet sufficiently mature to convince most cardiology practitioners to use a diagnostic black box in daily clinical patient management. Thus, interpretability of DL models, including the psychological aspects of digital transformation itself, should represent one major aim of future research. Radiomics might represent a valid alternative in the meantime, since radiomic models, in cases where an appropriate and stepwise feature reduction is performed before training the ML algorithm, deliver more insights into the specific imaging features which were important for the model's classification performance.

In summary, solutions achieving better standardization or normalization, resulting in better generalisability, are an important condition for bringing radiomics and AI into cardiac precision medicine, with concomitant improved diagnostic approaches to CVDs. In addition, better interoperability of healthcare informatics systems should be achieved. Finally, the steady progression of AI approaches to clinical decision making represents an abrupt change from conventional medical reasoning; as such, it is essential to engage with the psychological impact of the ongoing digital transformation in order to facilitate the transition of medical practice in line with advancing technologies.

The extensive and encouraging work reviewed in this article pursues one common goal for the future of cardiovascular medicine: to pave the way toward better diagnosis and precision medicine in cardiology. The application of AI to cardiology holds the promise to revolutionize individual disease monitoring and treatment (93), thus overcoming the currently used "one size fits all" approach derived from large clinical studies.

### AUTHOR CONTRIBUTIONS

CM-I performed the literature search and its categorization, developed statistics, wrote the diagnostics literature section, and coordinated the remaining sections. VC wrote the machine learning section and contributed to the remaining sections. CI wrote the data section and designed the figures. KL supervised the manuscript development, wrote the abstract and the introduction, and reviewed all sections. ZR-E, BB, and SP contributed to drafting and critical appraisal of the manuscript. All authors reviewed the manuscript.

### FUNDING

This work was partly funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 825903 (euCanSHare project). SP acts as a paid consultant to Circle Cardiovascular Imaging Inc., Calgary, Canada and Servier. SP acknowledges support from the National Institute for Health Research (NIHR) Cardiovascular Biomedical Research Centre at Barts, from the SmartHeart EPSRC programme grant (EP/P001009/1) and the London Medical Imaging and AI Centre for Value-Based Healthcare. SP and KL acknowledge support from the CAP-AI programme, London's first AI enabling programme focused on stimulating growth in the capital's AI sector. ZR-E was supported by a British Heart Foundation Clinical Research Training Fellowship (FS/17/81/33318).

### REFERENCES


angiography using paired convolutional neural networks. Med Image Anal. (2016) 34:123–36. doi: 10.1016/j.media.2016.04.004


plaque and stenosis in coronary CT angiography. IEEE Trans Med Imaging. (2018) 38:1588–1598. doi: 10.1109/TMI.2018.2883807


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Martin-Isla, Campello, Izquierdo, Raisi-Estabragh, Baeßler, Petersen and Lekadir. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# From Compressed-Sensing to Artificial Intelligence-Based Cardiac MRI Reconstruction

#### Aurélien Bustin<sup>1†</sup>, Niccolo Fuin<sup>1\*†</sup>, René M. Botnar<sup>1,2</sup> and Claudia Prieto<sup>1,2</sup>

<sup>1</sup> Department of Biomedical Engineering, School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom, <sup>2</sup> Escuela de Ingeniería, Pontificia Universidad Católica de Chile, Santiago, Chile

#### Edited by:

Steffen Erhard Petersen, Queen Mary University of London, United Kingdom

#### Reviewed by:

Reza Nezafat, Harvard University, United States; Michael Jerosch-Herold, Harvard Medical School, United States; Daniel K. Sodickson, New York University, United States

\*Correspondence: Niccolo Fuin, niccolo.fuin@kcl.ac.uk

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine

Received: 30 September 2019; Accepted: 31 January 2020; Published: 25 February 2020

#### Citation:

Bustin A, Fuin N, Botnar RM and Prieto C (2020) From Compressed-Sensing to Artificial Intelligence-Based Cardiac MRI Reconstruction. Front. Cardiovasc. Med. 7:17. doi: 10.3389/fcvm.2020.00017

Cardiac magnetic resonance (CMR) imaging is an important tool for the non-invasive assessment of cardiovascular disease. However, CMR suffers from long acquisition times due to the need to obtain images with high temporal and spatial resolution, different contrasts, and/or whole-heart coverage. In addition, both cardiac and respiratory-induced motion of the heart during the acquisition need to be accounted for, further increasing the scan time. Several undersampling reconstruction techniques have been proposed during the last decades to speed up CMR acquisition. These techniques rely on acquiring less data than needed and estimating the non-acquired data by exploiting some sort of prior information. Parallel imaging and compressed sensing undersampling reconstruction techniques have revolutionized the field, enabling 2- to 3-fold scan time accelerations to become standard in clinical practice. Recent scientific advances in CMR reconstruction hinge on the thriving field of artificial intelligence. Machine learning reconstruction approaches have recently been proposed to learn the non-linear optimization process employed in CMR reconstruction. Unlike analytical methods, for which the reconstruction problem is explicitly defined in the optimization process, machine learning techniques make use of large data sets to learn the key reconstruction parameters and priors. In particular, deep learning techniques promise to use deep neural networks (DNN) to learn the reconstruction process from existing datasets in advance, providing a fast and efficient reconstruction that can be applied to all newly acquired data. However, before machine learning and DNN can realize their full potential and enter widespread clinical routine for CMR image reconstruction, there are several technical hurdles that need to be addressed. In this article, we provide an overview of the recent developments in the area of artificial intelligence for CMR image reconstruction. The underlying assumptions of established techniques such as compressed sensing and low-rank reconstruction are briefly summarized, while a greater focus is given to recent advances in dictionary learning and deep learning based CMR reconstruction. In particular, approaches that exploit neural networks as implicit or explicit priors are discussed for 2D dynamic cardiac imaging and 3D whole-heart CMR imaging. Current limitations, challenges, and potential future directions of these techniques are also discussed.

Keywords: cardiac MRI, AI, reconstruction, dictionary learning, deep learning, undersampling

### INTRODUCTION

Magnetic resonance imaging (MRI) is a valuable tool for the non-invasive assessment of cardiovascular disease. Cardiac MR (CMR) imaging has been established as a clinically important technique for the assessment of cardiac morphology, function, perfusion, viability, and more recently quantitative myocardial tissue characterization (1–3). CMR is currently used to diagnose congenital heart disease (CHD), ischemic heart disease, valvular heart disease, pericardial lesions, cardiac tumors and cardiomyopathies, among others (4, 5). However, CMR suffers from long acquisition times due to the need to obtain images with high temporal and spatial resolution, different contrasts, and/or whole-heart coverage. In addition, both cardiac and respiratory-induced motion of the heart during the acquisition need to be accounted for, further increasing the scan time.

Several technical advances have been proposed during the last decades to improve CMR, including the development of efficient pulse sequences to speed up the scan and improve the contrast of the images, the development of motion compensation techniques to account for the respiratory and cardiac induced movement of the heart, the use of multiple radio-frequency receiver coils for parallel imaging (PI), and the development of undersampled reconstruction techniques to acquire less data than needed (in the Nyquist sense) and thus accelerate the acquisition. PI decreases the scan time by reducing the number of phase increment steps (undersampling) and exploiting the sensitivity encoding of the multiple receiver coils to recover the non-acquired data. PI has been widely integrated into commercial MR systems and is routinely used in clinical practice. Undersampled reconstruction techniques such as compressed sensing (CS) have also been employed to accelerate CMR imaging. CS works under the assumption that the k-space data is randomly undersampled, the image has a sparse representation in some pre-defined basis or dictionary, and a non-linear reconstruction is performed to enforce the sparsity of the image and consistency with the acquired MR data. In practice, CS-based reconstruction techniques employ pseudo-random trajectories (usually with variable density) along with one or several (e.g., spatial and temporal dimensions) sparse transforms such as finite differences (e.g., total variation) or wavelet operators. In early 2017, the U.S. Food and Drug Administration (FDA) cleared the CS technology to enable the fast acquisition of CMR images, thus officially opening the door to the broader clinical use of this technique (6–8).
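The three CS ingredients named above (random undersampling, a sparse representation, and a non-linear reconstruction enforcing sparsity and data consistency) can be illustrated with a toy one-dimensional example. The sketch below is a simplified illustration, not a clinical CS reconstruction: it assumes a single coil, a signal that is sparse in the image domain itself (no wavelet or finite-difference transform), uniform random sampling, and the iterative soft-thresholding algorithm (ISTA) as one possible solver. The function names and all parameter values are hypothetical choices.

```python
import numpy as np


def soft_threshold(z, lam):
    """Complex soft-thresholding: shrink magnitudes by lam, keep the phase."""
    mag = np.abs(z)
    return z * np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0)


def ista(s, mask, lam=0.02, n_iter=300):
    """Minimise 0.5 * ||M F x - s||^2 + lam * ||x||_1 with unit step size."""
    x = np.zeros(mask.size, dtype=complex)
    for _ in range(n_iter):
        # Data-consistency gradient: F^H M (M F x - s), with F the unitary DFT.
        residual = mask * np.fft.fft(x, norm="ortho") - s
        gradient = np.fft.ifft(mask * residual, norm="ortho")
        # Sparsity-enforcing proximal step.
        x = soft_threshold(x - gradient, lam)
    return x


rng = np.random.default_rng(0)
n, k, m = 128, 5, 48                       # signal length, spikes, samples

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = 1.0 + rng.random(k)

mask = np.zeros(n)
mask[rng.choice(n, m, replace=False)] = 1.0    # random k-space sampling
s = mask * np.fft.fft(x_true, norm="ortho")    # undersampled measurements

x_zf = np.fft.ifft(s, norm="ortho")            # zero-filled reconstruction
x_cs = ista(s, mask)                           # sparsity-regularised recon

err_zf = np.linalg.norm(x_zf - x_true) / np.linalg.norm(x_true)
err_cs = np.linalg.norm(x_cs - x_true) / np.linalg.norm(x_true)
print(f"relative error, zero-filled: {err_zf:.3f}")
print(f"relative error, ISTA:        {err_cs:.3f}")
```

Because the sampling is random, the aliasing in the zero-filled estimate is noise-like, which is what allows the thresholding step to separate it from the few true non-zero coefficients.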

Recent efforts have been made to further improve CS-based reconstruction quality by learning dictionary-based representations of the sparse domain from the acquired data itself (or jointly during reconstruction) instead of exploiting known analytical transform domains. However, CS-based reconstruction techniques usually suffer from long computational times, and their performance depends on the choice of the sparsity representation and the tuning of the corresponding reconstruction parameters. More recently, deep neural networks (DNN) have been proposed to overcome these challenges by learning optimal reconstruction parameters and/or transforms from the data itself and enabling extremely fast computational times (after training), promising to further advance the field of CMR reconstruction.

In this review paper, we first briefly discuss the CS and dictionary learning models, which offer a framework for sparse signal recovery and low-dimensional signal models and serve as a background for the following section. Recent representative advances in deep learning (DL) for CMR reconstruction are next discussed, highlighting theoretical developments and cardiac applications.

### TRANSFORM AND DATA-DRIVEN CMR RECONSTRUCTION

This section briefly introduces the key concepts that underlie MR image reconstruction as an inverse problem, that will serve as background material to the rest of the review. CS-based and dictionary learning models for CMR reconstruction are also discussed. We refer the reader to Ye (9) and Jaspan et al. (10) for further discussion on the application of CS to MR image reconstruction.

### MR Reconstruction as an Inverse Problem

The general (discretized) principles of MR signal generation and image formation can be expressed as a system of linear equations (11):

$$s = E\rho \tag{1}$$

where the MR encoding operator E includes the coil sensitivity profiles, the Fourier transform and the sampling mask, ρ is the image to be recovered and s is the acquired k-space data (**Figure 1**). The image ρ is thus reconstructed by solving an inverse problem that aims to recover an estimate of ρ from the known encoding operator E and the acquired signal s. This inverse problem is ill-posed, i.e., not all of the following well-posedness conditions are satisfied: (i) existence of the solution, (ii) uniqueness of the solution, and (iii) stability of the solution (i.e., small disturbances in s do not lead to large perturbations in ρ). The main factors that make MR reconstruction an ill-posed problem include the large scale of the optimization, the system imperfections (e.g., coil sensitivities, signal model simplifications), the limited number of phase increment steps (undersampling) and the acquisition noise which corrupts the signal.
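To make the structure of the encoding operator concrete, the following minimal NumPy sketch applies E as coil weighting, a unitary 2D Fourier transform and a sampling mask, together with its adjoint. The function names, the random coil maps and the regular 2-fold line-skipping mask are illustrative assumptions, not part of any cited method.

```python
import numpy as np

def encode(rho, coil_maps, mask):
    """Apply E: coil weighting, unitary 2D FFT, then k-space sampling."""
    coil_images = coil_maps * rho[None, :, :]          # (coils, ny, nx)
    kspace = np.fft.fft2(coil_images, norm="ortho")    # FFT over the last two axes
    return mask[None, :, :] * kspace                   # zero out non-acquired samples

def encode_adjoint(s, coil_maps, mask):
    """Apply E^H: re-mask, inverse FFT, then conjugate coil combination."""
    coil_images = np.fft.ifft2(mask[None, :, :] * s, norm="ortho")
    return np.sum(np.conj(coil_maps) * coil_images, axis=0)

# Synthetic example (assumed sizes; random data stand in for real measurements).
rng = np.random.default_rng(0)
n_coils, ny, nx = 4, 32, 32
rho = rng.standard_normal((ny, nx)) + 1j * rng.standard_normal((ny, nx))
coil_maps = (rng.standard_normal((n_coils, ny, nx))
             + 1j * rng.standard_normal((n_coils, ny, nx)))
mask = np.zeros((ny, nx))
mask[::2, :] = 1.0  # keep every other line along the first axis

s = encode(rho, coil_maps, mask)  # s = E rho, as in Equation (1)

# Dot-product test: <y, E rho> must equal <E^H y, rho> if the adjoint is correct.
y = rng.standard_normal(s.shape) + 1j * rng.standard_normal(s.shape)
lhs = np.vdot(y, encode(rho, coil_maps, mask))
rhs = np.vdot(encode_adjoint(y, coil_maps, mask), rho)
print("adjoint test passed:", np.allclose(lhs, rhs))
```

Iterative reconstruction algorithms typically only need these two routines, which is why the adjoint is checked explicitly here.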

To overcome the ill-posed nature of the MR image reconstruction problem, this is typically reformulated as a regularized optimization:

$$
\hat{\rho} = \operatorname{argmin}_{\rho} \left\| E\rho - s \right\|_2^2 + \lambda R\left(\rho\right) \tag{2}
$$

where the image ρ̂ is recovered by balancing a regularization term R(ρ), which is added as an additional constraint to stabilize the solution, against a data consistency term ‖Eρ − s‖<sub>2</sub><sup>2</sup> < ε, where ε is the noise level. The weighting parameter λ controls the degree of regularization and needs to be chosen according to the noise level of the acquired data.

In particular, regularizing the reconstruction problem with sparsity priors and statistical properties of the MR images has shown great promise. The application of these techniques to speed up CMR imaging is the topic of the following subsections.

### CS for CMR Imaging

CS MRI reconstruction assumes that the k-space data is pseudo-randomly undersampled, the image admits a sparse representation in some transform domain Φ, and a non-linear reconstruction is performed to enforce data consistency and sparsity of the MR image in the transform domain. A natural approach to enforce sparsity is by replacing the regularization term in Equation (2) by the l<sub>0</sub> (pseudonorm) of the sparse coefficients (12), which counts the number of non-zero entries. However, since the l<sub>0</sub> "norm" does not satisfy the convexity property of a norm and leads to an NP-hard combinatorial problem, approximate solutions are considered instead by replacing the l<sub>0</sub> term by the convex l<sub>1</sub>-norm (13):

$$
\hat{\rho} = \operatorname{argmin}_{\rho} \left\| E\rho - s \right\|_2^2 + \lambda \left\| \Phi \rho \right\|_{1} \tag{3}
$$

The problem in Equation (3) is convex and can be solved with a variety of regularization and convex optimization techniques. In cardiac MRI, Φ can be chosen e.g., as the temporal Fourier transform, spatio-temporal total variation, or spatio-temporal wavelets (**Figure 2**). CS has been extensively used in numerous cardiac applications, such as cardiac cine imaging (14, 15), first-pass cardiac perfusion (16), 3D late gadolinium enhancement (LGE) imaging (17), 3D whole-heart coronary MR angiography (CMRA), and more recently for 4D and 5D free-running CMRA (18–21), among many others. We briefly review some of those techniques in the next paragraphs.
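As an illustration of how Equation (3) can be solved in practice, the sketch below uses the iterative soft-thresholding algorithm (ISTA), one of many possible convex solvers. It makes several simplifying assumptions that are not taken from the cited studies: a single coil, Φ equal to the identity (the image itself is sparse), random sampling of individual k-space points, and noise-free synthetic data.

```python
import numpy as np

def soft_threshold(x, tau):
    """Complex soft-thresholding: proximal operator of tau * ||x||_1."""
    mag = np.abs(x)
    return x * np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-12)

def ista(s, mask, lam, n_iter=200):
    """Minimize ||E rho - s||_2^2 + lam * ||rho||_1 with E = mask * FFT."""
    E = lambda rho: mask * np.fft.fft2(rho, norm="ortho")
    EH = lambda k: np.fft.ifft2(mask * k, norm="ortho")
    step = 0.5  # 1/L, where L = 2 bounds the gradient's Lipschitz constant (||E|| <= 1)
    rho = EH(s)  # zero-filled starting point
    for _ in range(n_iter):
        grad = 2.0 * EH(E(rho) - s)                     # gradient of the data term
        rho = soft_threshold(rho - step * grad, step * lam)
    return rho

# Synthetic sparse "image" (assumption: 40 non-zero pixels out of 64 x 64).
rng = np.random.default_rng(0)
n = 64
truth = np.zeros((n, n), dtype=complex)
support = rng.choice(n * n, size=40, replace=False)
truth.flat[support] = 2.0 + rng.standard_normal(40)

mask = (rng.random((n, n)) < 0.4).astype(float)         # keep about 40% of k-space
s = mask * np.fft.fft2(truth, norm="ortho")

zero_filled = np.fft.ifft2(s, norm="ortho")
recon = ista(s, mask, lam=0.05)

err_zero_filled = np.linalg.norm(zero_filled - truth)
err_cs = np.linalg.norm(recon - truth)
print("zero-filled error:", err_zero_filled, "ISTA error:", err_cs)
```

For CMR data, Φ would instead be a temporal Fourier, total variation or wavelet operator, in which case the thresholding step is applied to the transform coefficients or replaced by a suitable proximal solver.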

#### Cardiac Cine Imaging

Cardiac cine MRI with CS reconstruction has demonstrated accurate estimation of cardiac function in a single breath-hold (22). The study enrolled 81 patients with different cardiac conditions who were imaged using 2D cine acquisition, under three heart beats per slice, with high spatial (1.7 × 1.7 mm<sup>2</sup>) and temporal resolution (41 ms). A non-linear iterative SENSE-type reconstruction was performed with spatiotemporal regularization using redundant Haar wavelets. The reconstruction was performed inline in ∼3 min for a stack of eight continuous short-axis images. CS reconstruction led to slightly worse image quality compared to conventional PI cardiac cine. A similar acquisition/reconstruction framework was applied to 100 patients referred for CMR in Vermersch et al. (23). Free-breathing 2D motion-corrected cine CMR has also been studied in Usman et al. (14). Acquisition was performed on five healthy subjects using golden radial pseudorandom sampling, and a non-rigid respiratory motion-corrected reconstruction with CS temporal regularization was performed offline (reconstruction time ∼2–2.5 h).

A 3D cardiac cine acquisition with CS reconstruction has been proposed to image the left ventricle in a single breath-hold (15). Ten healthy subjects and three patients were imaged at 1.9 × 1.9 × 2.5 mm<sup>3</sup> spatial and 42–48 ms temporal resolution in ∼19 s using a Cartesian spiral phyllotaxis sampling (24). Reconstruction times were ∼4 min employing a soft-gated iterative SENSE reconstruction with spatial and temporal redundant Haar wavelet transforms. Free-breathing 3D cardiac cine has also been proposed to alleviate the requirement of breath-holding in Usman et al. (25). Whole-heart cardiac cine images were acquired in eight healthy subjects and three patients in ∼4–5 min using an accelerated 3D free-running sequence with 2 mm<sup>3</sup> isotropic resolution and ∼31–70 ms temporal resolution. A CS-SENSE reconstruction with total variation regularization and translational respiratory motion correction was performed offline in ∼2.5 h.

#### 3D Late Gadolinium Enhancement Imaging

CS has been employed to increase the spatial resolution and accelerate scan time of LGE imaging for myocardial scar and fibrosis visualization. Kamesh Iyer et al. (17) proposed a CS technique for rapid 3D LGE imaging for visualization of ablation-induced scar in the left atrium wall in patients with a history of atrial fibrillation and ablation therapy. 3D LGE data was acquired fully sampled on 8 patients and retrospectively undersampled using a variable density sampling with a 3.5-fold acceleration at a resolution of 1.25 × 1.25 × 2.5 mm<sup>3</sup> (acquisition time of ∼10–15 min). CS reconstruction was performed offline after coil compression (four virtual channels reconstructed) using an efficient Split Bregman optimization (26) for fast reconstruction (∼8 s for 44 slices) with 3D total variation regularization. The Split Bregman method has been shown to be an efficient solver for many regularized inverse problems with good convergence properties and fast minimization (26).

Basha et al. (27) proposed a patch-based CS technique ("LOST," see next section) to acquire and reconstruct 3D LGE data with an isotropic spatial resolution of 1.4 × 1.4 × 1.4 mm<sup>3</sup> in 270 patients referred for myocardial viability assessment, using a pseudo-random k-space undersampling pattern (28) with up to 5-fold accelerated acquisition (∼4 min total acquisition time). LOST reconstruction was performed inline (via CPU cluster) in ∼1 h.

#### Whole-Heart CMRA

Forman et al. proposed a free-breathing (29) and multi-breath-hold (28) Cartesian spiral phyllotaxis (6.5-fold) acquisition combined with an inline multi-coil SENSE reconstruction and 3D total variation regularization to reconstruct high-resolution (∼1 mm<sup>3</sup> isotropic) CMRA images in ∼52 s. Accelerated non-rigid motion-compensated isotropic (1.2 mm<sup>3</sup>, 3-fold acceleration) 3D CMRA was also performed in ∼5 min using 3D total variation regularization (reconstruction time ∼44 min) and variable density Cartesian acquisition (30). Haar wavelets combined with an efficient FISTA optimization were used for whole-heart navigator-gated CMRA imaging at 3T, employing a Cartesian spiral phyllotaxis sampling at 9-fold acceleration (effective scan time of ∼3 min 45 s at a resolution of 1.3 × 1.3 × 1.2 mm<sup>3</sup>) (31). A similar optimization was employed at 1.5T to reconstruct CMRA images with an isotropic resolution of 0.8 mm<sup>3</sup> (32). CS techniques based on the discrete wavelet transform were also implemented on GPU to bring whole-heart CMRA image reconstruction to <4 s (33).

XD-GRASP (34) and its extensions have been proposed to enable free-breathing whole-heart motion-resolved 5D [(x−y−z) spatial dimensions + respiratory and cardiac phases] CMRA in a single continuous acquisition by exploiting temporal total variation along the cardiac and respiratory dimensions (35–37). In Feng et al. (35), image acquisition was performed with a continuous 3D golden-angle pattern at isotropic 1.15 mm<sup>3</sup> resolution and ∼40–50 ms temporal resolution (acquisition time ∼14 min). A conjugate gradient optimization was used to reach offline reconstruction times of ∼6 h 48 min. Similar approaches were also proposed for time-resolved, cardiac-resolved, high-resolution flow imaging [XD flow (38)].

#### Drawbacks of CS for CMR

Although CS has shown noticeable success in CMR, as reflected by the many applications and recent integration into routine clinical scanners, there remain major drawbacks which may impede its full potential. Firstly, the non-linear nature of the optimization presents a barrier to fast reconstruction, although notable progress has been made through the maturation of the algorithms and the move toward GPU implementations, which greatly reduce computational times. Another relevant weakness of CS-based reconstruction is the need for tuning regularization parameters that heavily depend on the type of image, sampling trajectories, sparsifying transform, acceleration factor, etc. Finally, while choosing the appropriate transformation basis Φ can contribute to an efficient sparse representation, the robustness of the reconstruction will heavily depend on this specific operator.

### Low-Rank-Based Approaches for CMR Imaging

Another model closely related to sparsity is the notion of low-rank matrices. Low-rank image reconstruction takes advantage of the fact that MR images inherently have a high degree of correlation (e.g., dynamically or locally on a patch scale) and thus can be represented by a union of low-dimensional subspaces. We provide below an overview of some reconstruction techniques incorporating low-rank models employed for CMR imaging.

Globally low-rank (GLR) reconstructions, exploiting low-rankness on the entire image series, have been used in many cardiac applications such as dynamic cine MRI (39–41), real-time CMR (42), cardiac perfusion (43), or simultaneous multislice CMR fingerprinting (44). GLR reconstruction techniques are particularly suited for image series that exhibit strong correlation over time. A Casorati matrix is usually formed from the undersampled image sequence, and the missing k-t samples are then estimated using low-rank matrix completion (41, 45, 46). Low-rank reconstruction has been combined with CS-based techniques to further improve image quality, particularly for high acceleration factors. Low-rank plus sparse (L + S) matrix decomposition, which separates the temporally correlated background (L) from the dynamic information (S), has been proposed for dynamic imaging (cardiac cine, cardiac perfusion, and time-resolved angiography) (43, 47). The recently proposed multitasking framework has extended global low-rank reconstruction to deal with multiple overlapping dynamics such as T1/T2 recovery and cardiac and respiratory motions, through tensor decomposition (48, 49).
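The Casorati construction and the low-rank prior can be sketched in a few lines. The example below is a minimal illustration on synthetic data, assuming a series made of a static background plus one temporal mode; singular value thresholding is used here as a generic low-rank step and is not claimed to be the solver of any cited work.

```python
import numpy as np

def casorati(series):
    """Reshape a (nt, ny, nx) series into a (ny*nx, nt) matrix, one column per frame."""
    nt = series.shape[0]
    return series.reshape(nt, -1).T

def singular_value_threshold(matrix, tau):
    """Shrink singular values by tau; returns the low-rank matrix and its rank."""
    u, sv, vh = np.linalg.svd(matrix, full_matrices=False)
    sv_shrunk = np.maximum(sv - tau, 0.0)
    return (u * sv_shrunk) @ vh, int(np.count_nonzero(sv_shrunk))

# Synthetic dynamic series (assumption): static background + one oscillating component.
rng = np.random.default_rng(0)
nt, ny, nx = 20, 32, 32
background = rng.standard_normal((ny, nx))
dynamic = rng.standard_normal((ny, nx))
phase = np.cos(2 * np.pi * np.arange(nt) / nt)
clean = background[None] + phase[:, None, None] * dynamic[None]
noisy = clean + 0.01 * rng.standard_normal(clean.shape)

C_clean = casorati(clean)
print("Casorati shape:", C_clean.shape)
print("rank of noise-free series:", np.linalg.matrix_rank(C_clean))

C_lowrank, rank_kept = singular_value_threshold(casorati(noisy), tau=1.0)
print("rank kept after thresholding:", rank_kept)
```

In a reconstruction setting, a step of this kind is alternated with a data consistency step; locally low-rank variants apply the same operation to Casorati matrices built from small spatial patches.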

Locally low-rank (LLR) regularization techniques have also been proposed for CMR reconstruction to further reduce the spatial blurring often associated with GLR techniques (50). In essence, LLR reconstruction techniques exploit the low-rank structure of an image series on local regions (i.e., patches), and have been efficiently used for dynamic CMR imaging (51, 52), high-resolution dynamic myocardial T1 mapping (53) and 5D flow (18).

More recently, patch-based image reconstructions exploiting local (i.e., within a patch) and non-local (i.e., between similar patches) similarities and low-rank matrix representations have been employed for CMR image reconstruction, leading to even sparser representations. In those techniques (a.k.a. LOST and PROST, **Figure 2**) the similarity of 2D/3D image patches has been exploited through block-matching and low-rank decomposition. These techniques have been shown to reconstruct highly undersampled LGE (27, 54) and CMRA images with improved image quality compared to CS-based techniques (55, 56) (**Figure 3**). Accelerated free-breathing CMRA in concert with 3D-PROST reconstruction enables isotropic sub-millimeter (0.9 mm<sup>3</sup>) whole-heart visualization of the coronary vasculature, including small distal segments, in ∼5–7 min acquisition time and ∼3 min reconstruction time (**Figure 3**). Based on a similar idea, patch-based reconstruction has been used for the reconstruction of undersampled 2D cine MR images by extending the patch search to the cardiac temporal dimension (58). The technique has also been extended to multi-contrast CMR reconstruction through high-order tensor decomposition (59) and demonstrated for highly accelerated simultaneous 3D myocardial T1/T2 mapping and cine imaging (60), and 3D whole-heart myocardial T2 mapping (61).

### Dictionary Learning-Based Approaches for CMR Imaging

Dictionary learning-based CS techniques (also referred to as data-driven techniques) have also been proposed for CMR reconstruction. As opposed to conventional CS techniques, where sparse transforms or fixed dictionaries are known a priori, blind compressed sensing (BCS) techniques adaptively learn the sparse representation and dictionaries from the acquired undersampled data itself. These reconstruction techniques have the advantage of being highly adaptive to the image content at hand, since they learn dictionaries specific to the acquired data without the need for training data. BCS has been shown to outperform conventional CS approaches in several CMR applications such as cardiac cine MRI (62, 63) and contrast-enhanced dynamic MRI (64).

Both dictionary learning and CS models can be leveraged to further increase acceleration factors. In Caballero et al. (62), a dictionary learning technique was combined with CS to speed up dynamic CMR imaging (∼8- to 16-fold acceleration). An optimal dictionary is learnt directly from undersampled data online, through processing of spatio-temporal 3D patches, and is used to fill the missing k-space lines. The algorithm was tested on 10 healthy subjects by retrospectively undersampling fully sampled dynamic CMR data. Enforcing temporal gradients with an additional constraint makes it possible to reach higher undersampling factors and accelerates convergence, while consistently showing improvement over non-dictionary-based CS techniques.

Those approaches, however, come at the cost of highly non-convex optimizations, which make theoretical analyses and convergence guarantees very hard, while being often associated with high computational burden and long reconstruction times.

### DEEP LEARNING FOR CMR RECONSTRUCTION

Despite the high promise of CS approaches, the robustness of the reconstruction heavily depends on the choice of the sparsifying transform, which may be incapable of capturing the complex structure of CMR images. This may lead to images that look overly smooth or unnatural when too high acceleration factors are considered. A further major drawback is the long computational time usually required with iterative reconstruction algorithms and the need for parameter tuning. An inaccurate choice of reconstruction parameters leads either to over-smoothing or to images with remaining undersampling artifacts. Taking encouragement from early success in the use of DL in image classification and computer vision, several DL-based MRI reconstruction approaches have been recently proposed to learn models that better describe the reconstruction process and to shift the required optimization effort to an offline training stage, performed beforehand. In other words, rather than performing a reconstruction procedure to compute an appropriate transform between raw data and images for each new data set, DL reconstruction techniques propose to learn the parameters of that reconstruction procedure in advance, so that it can be applied to all new undersampled data as a simple operation.

When using an analytical approach to solve Equation (3) for MR image reconstruction, the applied regularization operator is explicitly described, and the optimization approach is carefully chosen. Generally, the more sophisticated the modeling adopted in reconstruction, the more demanding the optimization process. The aim in DL-based MRI reconstruction is to replace this optimization with a convenient function f<sub>φ</sub>(·) which is expressed as a DNN with parameters φ. Thus, a computationally efficient direct mapping from the acquired data s to the reconstructed image ρ can be obtained as a result of the neural network's training procedure. Training a neural network implies changing its weights to optimize the network's output. This is performed by applying an optimization algorithm to a function measuring the difference between the outputs and a target dataset, referred to as the loss function. Once these weights are learned, the network can be used to reconstruct new, previously unseen data, i.e., it is expected to generalize beyond the training set. We will further discuss the training procedure in the section Training Procedure for DL-Based MRI Reconstruction.

The main advantage of DL-based reconstruction techniques, with respect to conventional analytical reconstruction techniques, lies in the capability of a DNN to utilize the prior information learnt from the great number of routinely performed MRI exams to help the reconstruction process. However, due to the problem's high dimensionality, a large dataset of raw k-space data s and target MRI images ρ needs to be available to avoid over-fitting in the learning process. Collection of large MRI datasets can be challenging, and proposed techniques for MRI reconstruction usually depend on the use of data-augmentation techniques, which is discussed in the section Data Availability for DL-Based CMR Reconstruction.

Given these preliminary remarks, a fundamental question may arise: Under which conditions would we expect DL approaches to outperform CS approaches in terms of reconstruction accuracy in CMR imaging (computational considerations aside)? In this section, we do not aim to provide a definitive answer to this question. Our objective is to provide the reader with a critical approach in reviewing the literature, to be used as guidance in solving their DL-based CMR reconstruction problems. DNN architectures and neural network training procedures will be described first for generic MRI reconstruction, followed by a review of the approaches that have been designed for cardiac applications.

### Neural Networks Architectures for DL-Based MRI Reconstruction

Careful selection and design of the neural network architecture is fundamental to solve the MRI reconstruction problem at hand, since the architecture's design controls the set of available functions f<sub>φ</sub>(·) that are investigated during the learning process. A neural network is composed of an input layer, followed by hidden layers that transform the data into a new representation, and it ends with an output layer that generates the neural network's prediction. Each layer is composed of multiple neuron units. The output of the neurons in each layer is given by the weighted sum of the input neurons, followed by a non-linear function termed the activation function. A series of fully connected layers and activation functions is referred to as a fully connected neural network. The major advantage of fully connected networks is that they are "structure agnostic," which means that no special assumptions need to be made about the network's input. In the following subsections we briefly discuss neural network architectures that have been proposed to enable MR image reconstruction.
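The layer-by-layer computation described above (weighted sum followed by an activation function) can be written compactly. The following sketch is purely illustrative: the layer sizes, the random weights and the choice of ReLU are assumptions made for the example.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, a common non-learnable activation function."""
    return np.maximum(x, 0.0)

def fully_connected_forward(x, weights, biases):
    """Forward pass: weighted sum in each layer, ReLU on all but the output layer."""
    h = x
    for layer, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if layer < len(weights) - 1:
            h = relu(h)
    return h

# Hypothetical network: 16 inputs, one hidden layer of 8 neurons, 4 outputs.
rng = np.random.default_rng(0)
sizes = [16, 8, 4]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.standard_normal(sizes[0])
prediction = fully_connected_forward(x, weights, biases)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print("output shape:", prediction.shape, "trainable parameters:", n_params)
```

Because every input is connected to every neuron, the parameter count grows with the product of consecutive layer sizes, which quickly becomes prohibitive for image-sized inputs.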

### Convolutional Neural Networks

Convolutional neural networks (CNN) (65) differ from fully connected neural networks by the application of convolutions in each layer. As multiple convolution kernels are applied, several feature maps are produced, which together define a new representation of the image. CNNs usually have fewer parameters than fully connected neural networks, since the same kernel weights are shared across all positions of the input image. The reduction in the number of parameters simplifies the network's optimization problem. CNNs have been shown to learn interesting features from medical images and to be particularly appropriate to capture their multiscale structure. The use of residual blocks (66) also plays a fundamental role in training DNNs. Instead of learning a complete mapping function between consecutive layers, adding skip connections between two or more layers makes it possible to learn the residual from the input to the output of a residual block or to the output of the whole neural network. The use of skip connections has been shown to be particularly well-suited to learn image features, such as edges or noise-like artifacts (66).
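Weight sharing and residual learning can be illustrated with a small NumPy sketch. The single-channel setting, the 3 × 3 kernels and the random (untrained) weights are assumptions made for the example; as in most DL frameworks, the "convolution" is implemented as a cross-correlation.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # The same weight kernel[i, j] is applied at every image position.
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def residual_block(image, kernel_a, kernel_b):
    """conv -> ReLU -> conv, with a skip connection adding the input back."""
    hidden = np.maximum(conv2d_same(image, kernel_a), 0.0)
    return image + conv2d_same(hidden, kernel_b)

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))
kernel_a = 0.1 * rng.standard_normal((3, 3))
kernel_b = 0.1 * rng.standard_normal((3, 3))
output = residual_block(image, kernel_a, kernel_b)

# Parameter count for mapping a 64 x 64 image to a 64 x 64 output:
dense_weights = image.size * image.size   # one weight per input/output pixel pair
conv_weights = kernel_a.size              # one shared 3 x 3 kernel
print("dense layer weights:", dense_weights, "convolution weights:", conv_weights)
```

With the skip connection, the two convolutions only have to model the difference between input and output, for instance the aliasing pattern to be removed from an undersampled image.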

### Encoder-Decoder CNN

While for conventional CNNs feature map dimensions are fixed, for encoder-decoder CNNs the feature maps are gradually downsampled at each layer down to a convolution with a kernel of size 1 × 1, and then upsampled to the output's size. The first half of the network, the encoder part, learns a representation of the input image in a smaller manifold, which is then given as input to the decoder part of the network to obtain an image with the most meaningful features. Since the encoder part of the network compresses the feature maps' spatial information, a loss of details in the output can be encountered using an encoding-decoding network (67). This issue can be overcome by inserting symmetric skip connections, therefore preserving the important details that are present in the input image. An encoder-decoder network with skip connections is commonly referred to as a U-Net (67).
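The loss of detail in a plain encoder-decoder, and its recovery through a skip connection, can be demonstrated with a toy example. The sketch below replaces all learned convolutions with fixed operations (average pooling, nearest-neighbour upsampling and a hand-picked weighted sum standing in for the learned combination of decoder and skip features), so it only illustrates the information flow and is not a U-Net implementation.

```python
import numpy as np

def avg_pool2(x):
    """2 x 2 average pooling (encoder downsampling stage)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """2 x nearest-neighbour upsampling (decoder stage)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def encoder_decoder(x, use_skip, w_decoded=0.5, w_skip=0.5):
    """Two pooling stages, two upsampling stages, optional skip from the input."""
    bottleneck = avg_pool2(avg_pool2(x))
    decoded = upsample2(upsample2(bottleneck))
    if not use_skip:
        return decoded
    # Hypothetical fixed weights; a real U-Net learns how to merge both paths.
    return w_decoded * decoded + w_skip * x

def fine_detail_energy(x):
    """Energy of the content that is lost by a single 2 x 2 pooling stage."""
    return float(np.sum((x - upsample2(avg_pool2(x))) ** 2))

# Test image: constant offset plus a pixel-wise checkerboard (pure fine detail).
checker = np.indices((16, 16)).sum(axis=0) % 2
image = 1.0 + 0.5 * checker

without_skip = encoder_decoder(image, use_skip=False)
with_skip = encoder_decoder(image, use_skip=True)
print("detail energy, input:       ", fine_detail_energy(image))
print("detail energy, without skip:", fine_detail_energy(without_skip))
print("detail energy, with skip:   ", fine_detail_energy(with_skip))
```

The checkerboard is averaged away in the bottleneck, so only the path that bypasses the encoder can carry it to the output.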

### Variational Neural Network

In the conventional CNN architectures described above, the input data is convolved with a set of filter kernels which are usually followed by a simple, non-learnable, activation function, e.g., rectified linear unit (ReLU). In a variational neural network (VNN), the regularization term R in Equation (2) is defined as a field of experts model (68):

$$R\left(\rho\right) = \sum_{k=1}^{FK} \left\langle \Psi_k\left(\chi_k \rho\right), 1 \right\rangle \tag{4}$$

where the linear operators χ<sub>k</sub> ∈ ℝ<sup>v×v</sup> model convolutions of the image ρ with FK filter kernels of size v, and Ψ<sub>k</sub> are learnable non-linear activation functions. In the fields of experts model (68), the convolutional kernels and the parameters of the non-linear activation functions are learned from the data. In contrast to other techniques that make use of ReLU, the parametrizable activation functions Ψ<sub>k</sub> used in Equation (4) are defined as a weighted combination of AF Gaussian radial basis functions. In a VNN architecture, the learning power is therefore shifted from the sole learning of the filter kernels to the learning of both kernels and non-linear activation functions.
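Equation (4) can be evaluated directly once the kernels and activation parameters are given. The sketch below is a minimal illustration for a real-valued image, assuming two hand-picked finite-difference kernels and hand-picked radial basis function weights; in a VNN these quantities would be learned, and the helper names are hypothetical.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def rbf_activation(z, weights, centers, sigma):
    """Weighted combination of Gaussian radial basis functions, applied pixel-wise."""
    diff = z[..., None] - centers
    return np.sum(weights * np.exp(-diff ** 2 / (2.0 * sigma ** 2)), axis=-1)

def foe_regularizer(rho, kernels, rbf_weights, centers, sigma):
    """R(rho) = sum_k <Psi_k(chi_k rho), 1>: sum of activated filter responses."""
    total = 0.0
    for kernel, weights in zip(kernels, rbf_weights):
        response = conv2d_same(rho, kernel)
        total += float(np.sum(rbf_activation(response, weights, centers, sigma)))
    return total

# Hand-picked (not learned) parameters: horizontal and vertical finite differences.
kernels = [np.array([[0, 0, 0], [-1, 1, 0], [0, 0, 0]], dtype=float),
           np.array([[0, -1, 0], [0, 1, 0], [0, 0, 0]], dtype=float)]
centers = np.linspace(-1.0, 1.0, 5)
# A negative weight on the basis function centred at 0 rewards small filter responses.
rbf_weights = [np.array([0.0, 0.0, -1.0, 0.0, 0.0])] * 2

rng = np.random.default_rng(0)
flat = np.ones((32, 32))
noisy = flat + 0.5 * rng.standard_normal(flat.shape)
r_flat = foe_regularizer(flat, kernels, rbf_weights, centers, sigma=0.1)
r_noisy = foe_regularizer(noisy, kernels, rbf_weights, centers, sigma=0.1)
print("R(flat image):", r_flat, "R(noisy image):", r_noisy)
```

With these particular parameters the regularizer assigns a lower value to the piecewise-constant image than to its noisy version, which is the behaviour a learned prior is expected to acquire from the training data.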

### Training Procedure for DL-Based MRI Reconstruction

In the previous section, generic DNN architecture blocks have been described for solving MRI reconstruction problems. The choice of the architecture structure and of its constitutive elements determines a set of learnable functions, but it is during the training phase that the set of optimal functions for the given reconstruction task is determined. In general, the training procedure can be designed in a supervised or unsupervised fashion. Supervised methods are mostly used for MRI reconstruction, while unsupervised methods are an active topic of ongoing investigation. Therefore, for the rest of this section, we will focus on supervised approaches. In order to learn the network's parameters for the reconstruction procedure at hand, an optimization problem that minimizes a cost function needs to be defined. The training loss function can be defined as:

$$C(\phi) = \frac{1}{2B} \sum_{b=1}^{B} \left\| \rho_b^{\Upsilon}(\phi) - \rho_b^{\text{target}} \right\|_2^2 \tag{5}$$

where φ denotes all the trainable parameters of the reconstruction network, ϒ is the total number of layers in the network, corresponding to the network's gradient steps υ = 1, ... , ϒ, b indexes the training images in the current batch, and B is the number of images in the data batch, a randomly selected subset of the complete set of training data. To solve the non-convex optimization problem in Equation (5), a variant of gradient descent, e.g., stochastic gradient descent or the ADAM optimizer, is often used (69). The gradient with respect to the network parameters φ can be computed via backpropagation (70):

$$\frac{\partial C(\phi)}{\partial \phi^{\upsilon}} = \frac{\partial \rho^{\upsilon+1}}{\partial \phi^{\upsilon}} \cdot \frac{\partial \rho^{\upsilon+2}}{\partial \rho^{\upsilon+1}} \cdots \frac{\partial \rho^{\Upsilon}}{\partial \rho^{\Upsilon-1}} \cdot \frac{\partial C(\phi)}{\partial \rho^{\Upsilon}} \tag{6}$$

These optimization algorithms require the tuning of hyperparameters, such as the strength of regularization or the learning rate decay. The choice of the loss function is also crucial for a successful outcome of the training procedure. Because the reconstruction problem is usually formulated as a regression problem, the mean squared error is conventionally utilized as a cost function. Other popular choices are the l<sub>1</sub> norm of the difference and the structural similarity index. Research on generative adversarial networks (71, 72) and learned content loss functions is currently in progress. Once the optimal parameters φ are learned, the reconstructed image ρ can then be estimated from the observed k-space data s by simply computing ρ = f<sub>φ</sub>(s) using the trained network. This efficient functional relationship is a major advantage of neural networks over conventional CS techniques that may require complex inference procedures (73).
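The training loop implied by Equations (5) and (6) can be sketched on a toy problem. The example below assumes a hypothetical two-layer network with a tanh activation, random input vectors in place of undersampled data, and plain full-batch gradient descent instead of ADAM; the analytic gradient obtained with the chain rule is checked against a finite difference.

```python
import numpy as np

def forward(params, x):
    """Two-layer network: hidden = tanh(x W1^T), output = hidden W2^T."""
    W1, W2 = params
    hidden = np.tanh(x @ W1.T)
    return hidden @ W2.T, hidden

def loss(params, x, target):
    """Batch loss of Equation (5): squared error divided by 2B."""
    out, _ = forward(params, x)
    return np.sum((out - target) ** 2) / (2 * x.shape[0])

def gradients(params, x, target):
    """Backpropagation: chain rule applied from the output layer backwards."""
    W1, W2 = params
    out, hidden = forward(params, x)
    d_out = (out - target) / x.shape[0]        # dC/d(output)
    grad_W2 = d_out.T @ hidden                 # dC/dW2
    d_hidden = d_out @ W2                      # propagate through the output layer
    d_pre = d_hidden * (1.0 - hidden ** 2)     # propagate through tanh
    grad_W1 = d_pre.T @ x                      # dC/dW1
    return grad_W1, grad_W2

rng = np.random.default_rng(0)
B, n_in, n_hidden, n_out = 16, 8, 6, 8
x = rng.standard_normal((B, n_in))             # stand-in for network inputs
target = 0.5 * x                               # stand-in for reference images
params = [0.5 * rng.standard_normal((n_hidden, n_in)),
          0.5 * rng.standard_normal((n_out, n_hidden))]

# Finite-difference check of one gradient entry.
analytic_entry = gradients(params, x, target)[0][0, 0]
eps = 1e-6
plus = [params[0].copy(), params[1]]
plus[0][0, 0] += eps
minus = [params[0].copy(), params[1]]
minus[0][0, 0] -= eps
numeric_entry = (loss(plus, x, target) - loss(minus, x, target)) / (2 * eps)

# Plain gradient descent on the batch loss.
initial_loss = loss(params, x, target)
learning_rate = 0.05
for _ in range(200):
    grad_W1, grad_W2 = gradients(params, x, target)
    params = [params[0] - learning_rate * grad_W1, params[1] - learning_rate * grad_W2]
final_loss = loss(params, x, target)
print("gradient check:", analytic_entry, numeric_entry)
print("loss before/after training:", initial_loss, final_loss)
```

After training, inference reduces to a single call of `forward`, which mirrors the computation ρ = f<sub>φ</sub>(s) with fixed parameters.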

### Data Availability for DL-Based CMR Reconstruction

The inference step between input and output of the reconstruction model is highly dependent on the set of input k-space data and of reference images seen during training. This requires the availability of a large set of fully sampled multi-coil k-space data. Undersampled data can be obtained by retrospectively removing k-space data entries according to a sampling trajectory in the forward operator E. This data can be used as input for the reconstruction network during training. The lack of freely accessible databases of fully sampled multi-channel raw k-space data is an open issue for DL-based CMR reconstruction. In addition, since the dataset used to train a certain model becomes an essential component that defines its performance, it is difficult to compare different approaches if the training data is not publicly available. Even though initiatives for the release of annotated CMR images are growing (e.g., UK Biobank), very limited public or institutional raw k-space CMR data have been provided to the research community. Moreover, large databases of annotated CMR images, such as UK Biobank, are limited to specific types of exams. The DL reconstruction techniques presented in the following section are therefore mostly applied to retrospectively simulated k-space data and are restricted to specific MRI sequences (e.g., cardiac cine MRI).
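Retrospective undersampling, as described above, amounts to multiplying fully sampled k-space by a sampling mask. The sketch below generates a variable-density Cartesian mask and a zero-filled input image for training. It assumes a single-coil, real-valued disk phantom; the density law, the number of fully sampled central lines and the function names are illustrative choices rather than those of any cited study.

```python
import numpy as np

def variable_density_mask(ny, nx, acceleration, n_center, rng):
    """Cartesian line mask: fully sampled centre, random outer lines favouring low |k|."""
    n_lines = ny // acceleration
    center_start = ny // 2 - n_center // 2
    center = np.arange(center_start, center_start + n_center)
    outer = np.setdiff1d(np.arange(ny), center)
    density = 1.0 / (1.0 + np.abs(outer - ny // 2))     # assumed density law
    picked = rng.choice(outer, size=n_lines - n_center, replace=False,
                        p=density / density.sum())
    mask = np.zeros((ny, nx))
    mask[center, :] = 1.0
    mask[picked, :] = 1.0
    return mask

def retrospective_undersample(image, mask):
    """Return masked, centred k-space and the corresponding zero-filled image."""
    kspace_full = np.fft.fftshift(np.fft.fft2(image, norm="ortho"))
    kspace_us = mask * kspace_full
    zero_filled = np.fft.ifft2(np.fft.ifftshift(kspace_us), norm="ortho")
    return kspace_us, zero_filled

rng = np.random.default_rng(0)
ny, nx = 128, 128
yy, xx = np.mgrid[:ny, :nx]
reference = ((yy - 64) ** 2 + (xx - 64) ** 2 < 40 ** 2).astype(float)  # disk phantom

mask = variable_density_mask(ny, nx, acceleration=4, n_center=8, rng=rng)
kspace_us, zero_filled = retrospective_undersample(reference, mask)
print("sampled lines:", int(mask[:, 0].sum()), "of", ny)
```

Each pair (`zero_filled` or `kspace_us`, `reference`) then forms one training example; drawing a new mask per example is a simple form of data augmentation.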

### Neural Networks Architectures for DL-Based CMR Reconstruction

In this section, we review representative approaches proposed in the literature for MRI image reconstruction with a focus on CMR applications. The different approaches are summarized in **Table 1**.

#### Encoder-Decoder CNN for Image Dealiasing

U-net-type networks that perform an end-to-end mapping in image space have been successfully employed in many MRI post-processing applications (e.g., image segmentation), showing promising results. In the field of image recovery from undersampled k-space data, U-net architectures have been used by several groups to reduce noise-like image artifacts in post-processing (see **Figure 4A**).

In Hauptmann et al. (74), a 3D residual U-net has been employed to reduce undersampling artifacts for 2D golden-angle radial cardiac cine MRI. This residual U-net contains a contracting multi-scale decomposition path and a symmetric expanding path with skip connections at each scale (see **Figure 5**). The 3D convolutions are trained on entire image sequences (x − y − t) to enforce temporal consistency between cardiac frames. This technique demonstrated robustness with respect to the flickering artifacts that would be present if 2D convolutions were separately applied to each frame. The proposed U-net architecture was trained from 13-fold retrospectively undersampled images using a simulated tiny golden angle radial trajectory. These images were obtained from Cartesian breath-hold (BH) bSSFP cine acquisitions of 250 patients with congenital heart disease (CHD). The trained 3D U-net was then applied to real-time 13-fold accelerated tiny golden angle 2D radial bSSFP data acquired under free-breathing in 10 previously unseen patients with CHD. The radial bSSFP data were recovered with the proposed 3D U-net and reconstructed with CS for image quality and computational time comparisons. Ventricular volume measurements for 10–15 contiguous slices, obtained using both the CS reconstructed images and the 3D U-net, were compared to reference Cartesian fully sampled BH bSSFP cardiac cine data. The overall reconstruction time with the residual 3D U-net implemented on a graphics processing unit (GPU) was five times faster than conventional CS techniques implemented on CPU (74). Moreover, the overall image quality of the ventricular volume measurements from the 3D U-net recovered images was superior to that of the CS reconstructions (**Figure 6**). In this study, the validation data was acquired during free-breathing, while the training data was obtained during a breath-hold; the effects of cardiac and respiratory motions were therefore not taken into consideration.

TABLE 1 | Summary of methods that, to the best of our knowledge, have used a deep-learning-based approach for CMR reconstruction and which have been referred to in this article.

The work presented in Hauptmann et al. (74) demonstrates that 3D CNNs can be employed to map entire undersampled 2D sequences to the corresponding fully-sampled 2D cardiac cine sequences. However, employing 3D convolutional layers requires a higher number of parameters and thus increases the amount of data needed to efficiently train a network and prevent overfitting. In Kofler et al. (75), the authors proposed a technique to recover undersampled 2D golden-angle radial cine CMR by training a modified 2D U-net on the 2D spatio-temporal domain (x−t) extracted from the image sequences (**Figure 7**). This study suggests that the learning process can be improved by training the network on 2D x − t images extracted from the spatio-temporal domain of the cardiac cine sequence. This technique obtained similar results with respect to the 3D U-Net (74) by training the network on a substantially smaller training data set and also proved to be robust with respect to rotations in image space.
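The change of training domain described above can be illustrated with a minimal sketch. It assumes a cine sequence stored as a NumPy array of shape (nx, ny, nt); the function names and array layout are hypothetical and are not taken from Kofler et al. (75). Each position along y yields one 2D x − t image, so a single sequence provides many training samples for a 2D network.

```python
import numpy as np

def extract_xt_slices(cine):
    """Split a cine sequence (nx, ny, nt) into ny spatio-temporal images (nx, nt)."""
    # Moving the y-axis to the front makes each entry one x-t image.
    return np.transpose(cine, (1, 0, 2))

def assemble_from_xt(xt_slices):
    """Inverse of extract_xt_slices: rebuild the (nx, ny, nt) sequence."""
    return np.transpose(xt_slices, (1, 0, 2))

# Hypothetical sequence: 64 x 48 pixels, 20 cardiac phases.
cine = np.random.default_rng(0).standard_normal((64, 48, 20))
xt = extract_xt_slices(cine)   # 48 training samples of shape (64, 20)
```

After the network has processed every x − t image, `assemble_from_xt` restores the original sequence layout.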

The main limitation of the approaches presented in this section, as for all DL techniques applied in post-processing, is that the actual validation data consist of coil-combined magnitude images instead of multi-coil complex k-space data. Therefore, these approaches neither learn a full reconstruction procedure that accounts for consistency with respect to the acquired k-space data (see **Figure 4**) nor take advantage of the full benefits of the coil sensitivity encoding underlying parallel imaging.

#### Unrolled Convolutional Neural Networks

In this section, we describe how a DNN can be guided to learn operations that are similar to those performed in conventional iterative CS reconstruction, therefore bridging the gap with conventional iterative techniques. Incorporating domain expertise in a DNN framework can in fact facilitate the learning procedure of the model and result in better estimates of the MR images. For CS-based variable splitting techniques, the optimization problem in Equation (3) is usually solved using an alternating algorithm, iterating between a regularization stage and a data consistency stage. Instead of explicitly defining the regularization term, several DL techniques have been proposed to directly learn the regularization term by using CNNs. These techniques, such as Deep-ADMM net (83), VNN (84), or CascadeNet (76), represent a DL framework of an unrolled version of the iterative constrained reconstruction where the network parameters are trained in order to reconstruct the MR images directly from the undersampled k-space data as an input (see **Figure 4B**).

In particular, Schlemper et al. (76) proposed a framework for the reconstruction of 2D cardiac cine MR images from highly undersampled data using a cascade of CNNs, termed CascadeNet. Since a simple CNN is not efficient in learning the regularization operator iteratively, the authors proposed to concatenate a new CNN on the output of the previous CNN to create a DNN that iterates between CNN regularization operators and data consistency operators. The resulting network consists of convolutional layers, followed by ReLU, residual connections, and data consistency layers. The authors employed a hard-projection solution to enforce data consistency: for each stage of the unrolled model, if the k-space samples are initially unknown (non-acquired), then k-space values obtained from the FT of the previous layer's output are used. For the k-space entries that have been acquired, a linear combination between the estimated values from the previous layer and the original measurements is applied. Since the data consistency step has a simple expression, it is possible to treat it as a layer of a network and to specify the rules for forward and backward propagation for training. By defining the forward and backpropagation rules for the data consistency layer, all stages of the network can be trained in an end-to-end fashion, therefore building one deep network. The authors also demonstrated that spatio-temporal correlations can be efficiently learned by CNNs, combining 3D (x − y − t) convolutions and data sharing approaches. Assuming that for adjacent cardiac frames the difference in data content is relatively small, the neighboring k-space frames along the temporal axis share similar information. The missing k-space samples for each time frame can then be approximated using the samples from the adjacent cardiac frames. The authors therefore extended the proposed network architecture by adding "data sharing layers that take an input image and generate multiple data-shared images" (76). The obtained images are then concatenated along the channel axis of the network and fed into the proposed cascading network. For separate reconstruction of 2D cardiac single frames, this technique was compared to Dictionary Learning MRI (85), for retrospective undersampling factors of 3- and 9-fold. For reconstruction of cardiac cine MRI, the technique was compared to state-of-the-art CS and low-rank approaches, such as dictionary learning with temporal gradient (62), k-t sparse and low-rank (kt-SLR) (46), and L+S matrix decomposition (43). The presented results demonstrated that CascadeNet outperforms CS and low-rank approaches in terms of reconstruction error and perceptual quality, particularly for high undersampling rates (**Figure 8**). In addition, for 2D reconstruction, each image could be reconstructed in 23 ms, therefore enabling real-time applications, while for the reconstruction of cine MRI, an entire sequence was reconstructed within 10 s.
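The data consistency step described for CascadeNet can be sketched in a few lines for the single-coil Cartesian case. This is a minimal illustration rather than the authors' implementation: the function name, the weighting parameter `lam`, and the specific form of the linear combination, (k_cnn + lam · k0) / (1 + lam), are assumptions made for the example. Setting `lam=None` gives the limit in which acquired samples simply replace the CNN estimates.

```python
import numpy as np

def data_consistency(x_cnn, k0, mask, lam=None):
    """Enforce consistency between a CNN output and the acquired k-space.

    x_cnn : complex image (ny, nx) produced by the preceding CNN stage
    k0    : acquired k-space (ny, nx), zero where nothing was sampled
    mask  : boolean array (ny, nx), True where k-space was acquired
    lam   : assumed weighting of the measurements; None means hard replacement
    """
    k_cnn = np.fft.fft2(x_cnn, norm="ortho")
    if lam is None:
        # Acquired entries are taken directly from the measurements.
        k_dc = np.where(mask, k0, k_cnn)
    else:
        # Acquired entries: linear combination of estimate and measurement.
        k_dc = np.where(mask, (k_cnn + lam * k0) / (1.0 + lam), k_cnn)
    # Non-acquired entries keep the values predicted by the CNN.
    return np.fft.ifft2(k_dc, norm="ortho")
```

Because the operation is composed of Fourier transforms and element-wise arithmetic, its gradients are straightforward to define, which is what allows the layer to sit inside an end-to-end trained network.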

FIGURE 6 | Cine MRI images for one representative patient with congenital heart disease, acquired with prospective undersampling of 13-fold. Reconstructed images are presented in peak systole and peak diastole for a reference breath-held balanced steady-state free precession sequence (BH-bSSFP, first column), the real-time radial sequence reconstructed with GRASP (82) (second column) and the residual 3D U-net (third column), as proposed in Ronneberger et al. (67). Images reconstructed with GRASP and the proposed residual 3D U-Net show spatial and temporal blurring, that could be a result of undersampling and incomplete motion correction.

FIGURE 8 | Comparison of reconstructed 2D cardiac cine MR image sequences employing Dictionary Learning with Temporal Gradient (DLTG) (62) and CascadeNet (CNN-S) (76), from one representative healthy subject with retrospective undersampling. (A) Ground truth fully-sampled cine MR image, (B) 9x retrospectively undersampled acquisition, (C,D) CascadeNet reconstruction with data sharing and its error map, (E,F) CascadeNet reconstruction without data sharing (CNN) and its error map, (G,H) DLTG reconstruction and its error map. Red ellipses highlight the anatomy that was reconstructed better by CNN than DLTG.

It is worth noting that in the experiments shown in Schlemper et al. (76), training and validation data were obtained by retrospectively undersampling single-coil data, thus further validations are required to understand the full potential of this technique for multi-coil prospective acquisitions. Other techniques have applied an unrolled end-to-end framework in the more realistic scenario of multi-channel coil complex MR data. For example, Hammernik et al. proposed a trainable formulation for undersampled MRI reconstruction (84), which embedded a PI and a CS reconstruction within a DL unrolled end-to-end framework. Undersampled k-space data and coil sensitivity maps are provided as input to this unrolled model for DL reconstruction, and high-quality MR images are obtained as an output in an end-to-end fashion. The regularization term of this network was implemented as a VNN, and the data consistency term was implemented as the l<sup>2</sup> norm with respect to the acquired k-space data, as in Equation (3). The use of a VNN was first introduced for multi-coil complex-valued MRI reconstruction of 2D static images of the knee.
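To make the role of the coil sensitivity maps concrete, the following sketch implements a multi-coil Cartesian encoding operator, its adjoint, and one gradient step on the squared l2 data consistency term. It is an illustration under stated assumptions, not the network of Hammernik et al. (84): the operator is taken to be A = M F S (sensitivity weighting, Fourier transform, sampling mask), the learned regularization is omitted, and all function names and the step size are hypothetical.

```python
import numpy as np

def forward(x, smaps, mask):
    """A x = M F (S_c x): image (ny, nx) -> undersampled k-space (nc, ny, nx)."""
    return mask * np.fft.fft2(smaps * x, norm="ortho", axes=(-2, -1))

def adjoint(k, smaps, mask):
    """A^H k = sum_c conj(S_c) F^H (M k_c): k-space (nc, ny, nx) -> image (ny, nx)."""
    coil_images = np.fft.ifft2(mask * k, norm="ortho", axes=(-2, -1))
    return np.sum(np.conj(smaps) * coil_images, axis=0)

def dc_gradient_step(x, k0, smaps, mask, step=1.0):
    """One gradient descent step on 0.5 * ||A x - k0||_2^2 (step size assumed)."""
    residual = forward(x, smaps, mask) - k0
    return x - step * adjoint(residual, smaps, mask)
```

In an unrolled network, a step of this kind would be interleaved with a learned regularization update at every stage.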

Building on this work, Fuin et al. (77) extended the previously introduced VNN approach to enable fast reconstruction of undersampled motion-compensated free-breathing whole-heart 3D CMRA. A multi-scale VNN (MS-VNN) architecture was introduced in order to better capture the small caliber of the coronary arteries, as well as whole-heart structural features (x − y − z) in a 3D CMRA image. In order to increase the representation potential of the network, a wider network was implemented, using a multi-scale approach that can capture complementary and richer information at different resolutions. In addition, a training scheme suited for reconstruction of respiratory motion corrupted data was applied. The MS-VNN was trained on retrospectively undersampled (5- and 9-fold) translational motion corrected complex k-space data in an end-to-end fashion, in order to ensure that the effect of bulk, respiratory, and cardiac motion was identical in both output and target images during the training process. The MS-VNN reconstruction was then applied to newly acquired prospectively 5- and 9-fold undersampled data and compared to wavelet-based CS (12) reconstructions, as presented in **Figure 9**. MS-VNN outperformed the conventional CS in terms of quantitative right coronary artery sharpness and visible vessel length, with results comparable to the fully sampled scan. MS-VNN, combined with 100% respiratory scan efficiency and variable density spiral-like Cartesian undersampling, allowed the acquisition of high-quality 1.2 mm<sup>3</sup> isotropic CMRA images in a short and predictable scan time of ∼2–4 min and their reconstruction in ∼14 s.

Aggarwal et al. (86) introduced a similar network design, termed MoDL, where conventional CNNs are used for the implementation of the regularization term, but where all network stages share the same set of parameters. This unrolled technique with shared parameters also applies a conjugate-gradient data consistency step instead of the simple gradient-based approach utilized in Hammernik et al. (84). The use of a conjugate-gradient step within the network translates into improved results for a given number of iterations at the expense of a slightly longer run time. Another work from the same team combines DL MoDL reconstruction with complementary analytical image regularization constraints to recover free-breathing cardiac cine MR images from highly undersampled multi-coil measurements (78). This framework alternates between a learned regularization of the image using a CNN, an analytically defined SmooThness regularization on manifolds (SToRM) prior (87), and a conjugate-gradient data consistency step. The method was tested on only two simulated datasets, but it promises to combine the benefits of CNNs with analytical image regularization priors, such as SToRM, which exploits subject-specific information including cardiac and respiratory patterns.
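A conjugate-gradient data consistency step of this kind can be sketched as follows. The example assumes the step solves the normal equations (A^H A + λI) x = A^H b + λz, where z is the output of the regularization network and b the acquired k-space; it is restricted to single-coil Cartesian sampling (A = M F) for brevity. Function names, the default iteration count, and the tolerance are hypothetical choices for illustration and are not taken from the MoDL implementation.

```python
import numpy as np

def normal_op(x, mask, lam):
    """Apply (A^H A + lam * I) for single-coil Cartesian sampling, A = M F."""
    return np.fft.ifft2(mask * np.fft.fft2(x, norm="ortho"), norm="ortho") + lam * x

def cg_data_consistency(z, b, mask, lam, n_iter=10, tol=1e-10):
    """Solve (A^H A + lam I) x = A^H b + lam z with conjugate gradients.

    z : regularized image from the CNN, b : acquired k-space (zero-filled),
    mask : sampling mask, lam : assumed regularization weight.
    """
    rhs = np.fft.ifft2(mask * b, norm="ortho") + lam * z
    x = np.zeros_like(rhs)
    r = rhs.copy()                      # residual for the initial guess x = 0
    p = r.copy()
    rs_old = np.vdot(r, r).real
    for _ in range(n_iter):
        if rs_old < tol:                # converged
            break
        Ap = normal_op(p, mask, lam)
        alpha = rs_old / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

For this simplified operator the solution is also available in closed form in k-space, which makes the sketch easy to check; with multi-coil or non-Cartesian encoding no such shortcut exists and the iterative solve becomes necessary.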

#### Unrolled Convolutional Recurrent Neural Networks

A recurrent neural network can be thought of as multiple copies of the same network stage, each passing a message to a successor stage. Each stage of the recurrent network has a memory that stores its state at each time step, and therefore allows information to be carried over to the next time step without overloading the system. Qin et al. (79) proposed a novel unrolled convolutional recurrent neural network architecture, termed CRNN-MRI, which reconstructs cine CMR images from highly undersampled k-space data. The proposed CRNN-MRI architecture utilizes recurrent connections over each layer of an unrolled network with data consistency layers to reproduce the recurrence existing in the sequential steps of a reconstruction algorithm. Compared to the independently learned CNNs at each stage of an unrolled network (76), this design offers several advantages. First, the iteration connections of the CRNN layers allow spatial information learned at a given iteration to be passed to the following iteration. Each stage of the network is therefore optimized depending on the resulting output but also on features memorized from previous iterations and propagated to the next stage. Secondly, at every stage of the network, the receptive field of a CRNN layer in the spatial domain increases, whereas for a conventional CNN it resets at each stage. Finally, since the network parameters are shared over iterations, the total number of parameters is greatly reduced in comparison to CNNs, potentially offering improved generalization properties. An additional limitation of CNNs is that they accept fixed-sized images as input and produce a fixed-sized image as output. Conversely, recurrent nets can operate over sequences of images: sequences in the input, the output, or in the most general case in both input and output. Exploiting this property of recurrent networks, the network architecture presented in Qin et al. (79) incorporates bidirectional recurrent convolutional layers that evolve over time to utilize the temporal correlations of the cardiac cine MRI. Consequently, the model architecture evolves in a recurrent manner over time and over steps/iterations. The CRNN-MRI network therefore comprises bidirectional convolutional recurrent layers, residual connections, and hard-projection data consistency layers [as in (76)]. The residual connections were added to address the potential problem of vanishing gradients during back-propagation. Training and validation data were produced by retrospectively undersampling complex images obtained from single-coil data, as in Schlemper et al. (76). The experimental results demonstrated that CRNN-MRI outperformed state-of-the-art CS-based dynamic MRI and low-rank reconstruction algorithms, such as k-t FOCUSS (88) and k-t SLR (46), for 9- and 16-fold retrospectively undersampled data. Additionally, CRNN-MRI was shown to outperform CascadeNet (76), which employs conventional CNNs in the regularization term.
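The effect of sharing parameters over iterations can be illustrated with simple arithmetic. The sketch below counts the weights of a plain convolutional stage and compares an unrolled cascade with independent stages to one in which a single set of weights is reused. The layer count, feature width, kernel size, and number of stages are hypothetical values chosen for illustration, and the additional hidden-to-hidden weights of a true recurrent layer are deliberately ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of one k x k 2D convolution."""
    return c_in * c_out * k * k + c_out

def stage_params(n_layers=5, n_feat=64, k=3, c_io=2):
    """One stage: c_io -> n_feat, (n_layers - 2) hidden layers, n_feat -> c_io.

    c_io = 2 assumes real and imaginary parts stored as two channels.
    """
    return (conv_params(c_io, n_feat, k)
            + (n_layers - 2) * conv_params(n_feat, n_feat, k)
            + conv_params(n_feat, c_io, k))

n_stages = 10                                # assumed number of unrolled iterations
cascade_total = n_stages * stage_params()    # independent weights at every stage
shared_total = stage_params()                # one set of weights reused at every stage
print(cascade_total, shared_total)
```

Under these assumptions the shared design needs a factor of `n_stages` fewer parameters, which is the reduction referred to in the text.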

### DL Techniques for K-Space Based CMR Reconstruction

One of the most frequently used techniques for PI undersampled reconstruction in k-space is GRAPPA (89), which employs shift-invariant convolutions to recover/interpolate non-acquired k-space entries. The convolutional kernels are estimated for each subject from autocalibrating signal (ACS) data, obtained either from a fully sampled region at the k-space center or from a separate reference scan. A CNN-based technique has recently been proposed to improve non-linear k-space interpolation for undersampled PI MRI reconstruction (80). Similar to existing approaches, such as non-linear GRAPPA (90), robust artificial-neural-networks for k-space interpolation (RAKI) (80) trains CNNs on ACS data with an l<sup>2</sup> norm loss and uses these for interpolating missing k-space samples from acquired ones. The RAKI network architecture was applied for the reconstruction of myocardial 2D T1 mapping data. Eleven images with different T1 weights were acquired in a single breath-hold using a Cartesian fully sampled bSSFP sequence. Experiments were then performed on 4- and 5-fold retrospectively undersampled data, and RAKI showed improved noise resilience with respect to non-regularized GRAPPA reconstruction. As RAKI is a scan-specific technique and does not require a training database, it could in theory be applied for the reconstruction of CMR data for which a fully sampled reference acquisition scan cannot be performed, as for example in perfusion or real-time CMR. However, being scan-specific, this approach also comes with downsides, such as the high computational burden of training a neural network for each scan and the requirement for additional calibration data.
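The scan-specific calibration idea can be sketched with a linear, GRAPPA-like interpolation; RAKI replaces the linear kernel below with a small CNN trained on the same ACS data. The example is deliberately simplified and rests on several assumptions: 2-fold undersampling along ky with even lines acquired, a kernel that uses only the two neighboring ky lines at the same kx position, and hypothetical function names and array layout (coils, ky, kx).

```python
import numpy as np

def acs_to_matrices(acs):
    """Build calibration pairs from a fully sampled ACS block (nc, n_acs, nx).

    Returns sources (n_samples, 2*nc): all coils at lines ky-1 and ky+1,
    and targets (n_samples, nc): all coils at line ky.
    """
    nc = acs.shape[0]
    above, centre, below = acs[:, :-2, :], acs[:, 1:-1, :], acs[:, 2:, :]
    src = np.concatenate([above, below], axis=0).reshape(2 * nc, -1).T
    tgt = centre.reshape(nc, -1).T
    return src, tgt

def fit_kernel(src, tgt):
    """Least-squares interpolation weights of shape (2*nc, nc)."""
    w, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return w

def fill_missing_lines(kspace, w):
    """Interpolate the odd ky lines of kspace (nc, ny, nx) from their neighbors."""
    out = kspace.copy()
    ny = kspace.shape[1]
    for y in range(1, ny - 1, 2):
        # Same column ordering as in acs_to_matrices: line y-1 first, then y+1.
        src = np.concatenate([kspace[:, y - 1, :], kspace[:, y + 1, :]], axis=0).T
        out[:, y, :] = (src @ w).T
    return out
```

The weights are estimated anew for every scan from its own ACS lines, which is what removes the need for a training database and also what makes the calibration data a requirement.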

Recently, a technique that combines DL for k-space interpolation and image dealiasing for retrospectively undersampled 2D cardiac cine MRI has been proposed (81). This approach consists of a first frequency domain network architecture for k-space data interpolation followed by a concatenated image domain network architecture for image dealiasing. Both networks consist of concatenated CNN and ReLU layers, followed by a data consistency layer. The first and second networks are connected by a Fourier inversion and only one pass through the network is performed. Additionally, the authors propose a multi-supervised network training technique to constrain the frequency domain information and spatial domain information at different levels.

### DISCUSSION

During the last decades, several undersampled MR reconstruction techniques have been developed to speed up CMR acquisition. These techniques rely on acquiring less data than needed (in the Nyquist sense) and estimating the non-acquired data by exploiting some sort of prior information about the images. PI and CS undersampling reconstruction techniques have revolutionized the field, enabling high scan time accelerations to become standard in clinical practice. Despite the maturity of CS reconstruction and its recent FDA approval for clinical use, some major technical issues associated with its use for CMR remain, including high complexity of the algorithms and long reconstruction times, image degradation at high accelerations, and the need for parameter tuning. Therefore, recent AI-based scientific advances have emerged as solutions to transfer the complexity of the CMR reconstruction from the inline side to the offline training side. Unlike analytical techniques, for which the reconstruction problem is explicitly defined in the optimization process, DL-based techniques employ large data sets to learn the key reconstruction parameters and priors during an up-front training procedure, providing a fast and efficient reconstruction that can be applied to all newly-acquired cardiac data.

### Strengths and Recent Advances in AI for CMR Reconstruction

The sudden resurgence and popularity of DL approaches for medical image reconstruction can be attributed to their ability to analyze high-dimensional datasets, the availability of computing power, algorithms, and web-based data storage, and their potential for real-time reconstruction. Although the application of DL to CMR reconstruction is still at an early stage, promising cardiac applications (e.g., dynamic cine MRI or CMRA) have been proposed.

In particular, end-to-end unrolled neural network models have shown great potential to obtain CMR images that are comparable, in terms of anatomical structure and features, to images obtained with conventional iterative techniques. For example, MS-VNN (77) has been shown to obtain high quality static images for prospectively undersampled whole-heart 3D CMRA imaging. CascadeNet (76) and CRNN-MRI (79) were specifically designed for dynamic imaging and have been shown to outperform conventional CS techniques for retrospectively undersampled 2D cardiac cine MRI. Fewer techniques exist that use DNNs for k-space estimation. This may be due to the non-uniform features of the k-space data (especially for non-Cartesian trajectories), which make it difficult to translate some of the DL techniques that have been developed for image processing of natural images to CMR reconstruction. However, techniques such as RAKI (80) are scan-specific and do not require a training database, and thus could in theory be applied to cases for which a reference fully-sampled acquisition cannot be performed.

### Limitations and Pitfalls

Although DL-based reconstruction techniques for CMR are showing promising results, there are several remaining challenges that need to be addressed before enabling widespread clinical use.

### Simulation and Lack of Clinical Validation

Most of the existing early DL-based techniques for CMR reconstruction are purely based on simulated data, using retrospective undersampling experiments on fully sampled datasets, and are limited to a single-coil MR acquisition model. Therefore, it remains to be seen how those techniques will work in a multi-coil setting with prospective undersampling, where additional factors can drastically disrupt the reconstruction and degrade the image quality (e.g., eddy current related effects due to gradient jumps, blurring due to off-resonant spins with spiral trajectories, more complex noise models, unknown coil sensitivity profiles, cardiac and respiratory motion) and intrinsically result in a reduction of the achievable acceleration factor. Furthermore, those studies have so far been limited to healthy or small selected patient cohorts, which unfortunately limits their current clinical applicability and clinical impact in more complex scenarios. Further clinical validations are thus warranted to demonstrate the robustness of those techniques.
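A retrospective undersampling experiment of the kind referred to above can be sketched as follows. The example simulates a single-coil Cartesian acquisition from a fully sampled image, which is precisely the simplified setting whose limitations are discussed here. The mask design (random ky lines plus a fully sampled center), the function names, and the default parameters are assumptions made for illustration.

```python
import numpy as np

def cartesian_mask(ny, nx, accel=4, n_center=8, seed=0):
    """Random ky-line mask with a fully sampled center (DC at the center row)."""
    rng = np.random.default_rng(seed)
    lines = np.zeros(ny, dtype=bool)
    c = ny // 2
    lines[c - n_center // 2 : c + n_center // 2] = True
    # Draw the remaining lines at random so that ny // accel lines are kept.
    n_extra = max(ny // accel - n_center, 0)
    candidates = np.flatnonzero(~lines)
    lines[rng.choice(candidates, size=n_extra, replace=False)] = True
    return np.repeat(lines[:, None], nx, axis=1)

def retrospective_undersample(image, mask):
    """Return the masked k-space and the zero-filled reconstruction."""
    k_full = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image), norm="ortho"))
    k_us = k_full * mask
    zero_filled = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k_us), norm="ortho"))
    return k_us, zero_filled
```

Pairs of `zero_filled` and fully sampled images produced this way are typical training data in simulation studies; none of the prospective effects listed in the text (eddy currents, off-resonance, motion between samples, unknown coil sensitivities) are represented in such a simulation.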

### Generalization and Reconstruction Quality

A key strength of CMR is the ability to provide images with different contrast for a comprehensive assessment of the disease. Therefore, one open question regarding the applicability of DL-based reconstruction techniques in practice is generalization. The generalization potential and effectiveness of these reconstruction techniques should be further investigated for, for example, different imaging resolutions, pulse sequences, acquisition trajectories, magnetic field strengths, MR vendors, or clinical sites. While it would be feasible to pretrain separate neural networks for different exams, the poor generalization performance of a DL model to different sequence settings, anatomy, physiology, or to unique pathologies will limit its translation into clinical practice. On this account, there is still an open question that needs to be investigated: can we design a reconstruction network which accurately and precisely extracts unique information from limited samples, while generalizing to different acquisition settings and pathologies?

### Data Availability

Another major drawback of DL reconstruction approaches lies in the availability of a specific training data set. The approaches presented in the previous sections have been trained on small samples of hundreds of cases rather than millions, as is often the case in DL for classification or computer vision. However, the training of a reconstruction network still requires the availability of organized and specific data sets that will allow the model to generalize toward new, unseen test data. Moreover, most of the models presented are developed for a few specific cardiac sequences, such as cardiac cine MRI, for which large image datasets are available to researchers (e.g., UK Biobank).

#### Quality of the Training Set

In addition to its size, the quality and composition of the training set are of utmost importance. Several sequences in CMR, e.g., sub-millimeter CMRA or real-time CMR, cannot be acquired with fully-sampled data due to resolution and time constraints. This hinders the application of supervised training approaches for such datasets, justifying the necessity for future research in scan-specific strategies or unsupervised training. We anticipate that future research could focus on the development of neural network architectures designed to learn features from different cardiac modalities or different MR acquisitions from other organs, in an unsupervised manner, and the incorporation of more conventional regularizations into the networks. The selection of the cost function also has an influence on the network training and optimization, and it is therefore the topic of ongoing research. Research on generative adversarial networks and learned content loss functions is also in progress.

#### Motion Compensated Reconstruction

Additionally, the considerable respiratory- and cardiac-induced motion of the heart during the MR acquisition can significantly impair image quality by introducing blurring and/or ghosting-like artifacts. Multiple accelerated motion corrected reconstruction frameworks have been developed to simultaneously accelerate scan time and correct for motion during reconstruction. In conventional iterative reconstruction approaches, it is more straightforward to account for motion correction in the reconstruction, as a non-rigid motion model can be directly included in the encoding operator E. Some preliminary simulation work in DL reconstruction has tackled the problem of correcting motion-related artifacts in 2D cardiac cine images during reconstruction by adding an adversarial element to the network architecture (91). However, no DL reconstruction technique has yet explicitly modeled non-rigid motion directly in the reconstruction process. The efficient implementation of 3D non-rigid transformations in a DNN architecture could in fact prove to be challenging and research on the topic is currently in progress.

#### Workflow Integration

Finally, most of the DL techniques proposed for CMR reconstruction are implemented offline. Whilst this may be suitable for initial testing, the inline integration of those techniques will be key for their full adoption in clinical practice. Several frameworks, such as Gadgetron (92) or Yarra (https://yarra.rocks), have already been proposed for the easy integration of in-house reconstruction techniques into MR scanners; we expect them to play a key role in supporting DL-based reconstruction as well. Many clinical cardiac applications, such as real-time MR-guided cardiac interventions (93), will greatly benefit from such inline real-time reconstruction.

### AUTHOR CONTRIBUTIONS

AB, NF, RB, and CP devised and wrote the manuscript.

### FUNDING

The authors acknowledge financial support from EPSRC EP/P001009/, EP/P032311/1, EPSRC EP/P007619, Wellcome EPSRC Centre for Medical Engineering (NS/A000049/1), and the Department of Health via the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre award to Guy's and St. Thomas' NHS Foundation Trust. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.

### REFERENCES

Resonance Section of the EACVI. Eur Heart J Cardiovasc Imaging. (2015) 16:281–97. doi: 10.1093/ehjci/jeu129


angiography at 3T: a comparison with conventional imaging. Eur J Radiol. (2018) 104:43–8. doi: 10.1016/j.ejrad.2018.04.025


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Bustin, Fuin, Botnar and Prieto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Deep Learning for Cardiac Image Segmentation: A Review

Chen Chen<sup>1</sup>\*, Chen Qin<sup>1</sup>, Huaqi Qiu<sup>1</sup>, Giacomo Tarroni<sup>1,2</sup>, Jinming Duan<sup>3</sup>, Wenjia Bai<sup>4,5</sup> and Daniel Rueckert<sup>1</sup>

<sup>1</sup> Biomedical Image Analysis Group, Department of Computing, Imperial College London, London, United Kingdom, <sup>2</sup> CitAI Research Centre, Department of Computer Science, City University of London, London, United Kingdom, <sup>3</sup> School of Computer Science, University of Birmingham, Birmingham, United Kingdom, <sup>4</sup> Data Science Institute, Imperial College London, London, United Kingdom, <sup>5</sup> Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom

Deep learning has become the most widely used approach for cardiac image segmentation in recent years. In this paper, we provide a review of over 100 cardiac image segmentation papers using deep learning, which covers common imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound, and major anatomical structures of interest (ventricles, atria, and vessels). In addition, a summary of publicly available cardiac image datasets and code repositories is included to provide a base for encouraging reproducible research. Finally, we discuss the challenges and limitations of current deep learning-based approaches (scarcity of labels, model generalizability across different domains, interpretability) and suggest potential directions for future research.

#### Edited by:

Karim Lekadir, University of Barcelona, Spain

#### Reviewed by:

Jichao Zhao, The University of Auckland, New Zealand Marta Nuñez-Garcia, Institut de Rythmologie et Modélisation Cardiaque (IHU-Liryc), France

\*Correspondence: Chen Chen chen.chen15@imperial.ac.uk

#### Specialty section:

This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine

Received: 30 October 2019 Accepted: 17 February 2020 Published: 05 March 2020

#### Citation:

Chen C, Qin C, Qiu H, Tarroni G, Duan J, Bai W and Rueckert D (2020) Deep Learning for Cardiac Image Segmentation: A Review. Front. Cardiovasc. Med. 7:25. doi: 10.3389/fcvm.2020.00025

Keywords: artificial intelligence, deep learning, neural networks, cardiac image segmentation, cardiac image analysis, MRI, CT, ultrasound

### 1. INTRODUCTION

Cardiovascular diseases (CVDs) are the leading cause of death globally according to the World Health Organization (WHO). About 17.9 million people died from CVDs in 2016, mainly from heart disease and stroke<sup>1</sup>. The number is still increasing annually. In recent decades, major advances have been made in cardiovascular research and practice aiming to improve diagnosis and treatment of cardiac diseases as well as to reduce the mortality of CVD. Modern medical imaging techniques, such as magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound, are now widely used; they enable non-invasive qualitative and quantitative assessment of cardiac anatomical structures and functions and provide support for diagnosis, disease monitoring, treatment planning, and prognosis.

Of particular interest, cardiac image segmentation is an important first step in numerous applications. It partitions the image into a number of semantically (i.e., anatomically) meaningful regions, based on which quantitative measures can be extracted, such as the myocardial mass, wall thickness, left ventricle (LV) and right ventricle (RV) volume as well as ejection fraction (EF) etc. Typically, the anatomical structures of interest for cardiac image segmentation include the LV, RV, left atrium (LA), right atrium (RA), and coronary arteries. An overview of typical tasks related to cardiac image segmentation is presented in **Figure 1**, where applications for the three most commonly used modalities, i.e., MRI, CT, and ultrasound, are shown.
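As a simple illustration of how such quantitative measures follow from a segmentation, the sketch below computes a cavity volume by voxel counting and the ejection fraction from end-diastolic (EDV) and end-systolic (ESV) volumes, using the standard definition EF = (EDV − ESV) / EDV. The function names, the binary-mask input format, and the (dz, dy, dx) spacing convention are assumptions made for the example.

```python
import numpy as np

def cavity_volume_ml(mask, spacing_mm):
    """Volume of a binary 3D segmentation mask in millilitres.

    mask       : boolean array, True for voxels inside the cavity
    spacing_mm : voxel size (dz, dy, dx) in millimetres
    """
    voxel_ml = float(np.prod(spacing_mm)) / 1000.0   # 1 mL = 1000 mm^3
    return float(mask.sum()) * voxel_ml

def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml
```

For example, applying `cavity_volume_ml` to the LV blood-pool masks at end-diastole and end-systole and passing the two volumes to `ejection_fraction` yields the LV EF.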

<sup>1</sup>https://www.who.int/cardiovascular\_diseases/about\_cvd/en/

Before the rise of deep learning, traditional machine learning techniques, such as model-based methods (e.g., active shape and appearance models) and atlas-based methods, had been shown to achieve good performance in cardiac image segmentation (1–4). However, they often require significant feature engineering or prior knowledge to achieve satisfactory accuracy. In contrast, deep learning (DL)-based algorithms are good at automatically discovering intricate features from data for object detection and segmentation. These features are directly learned from data using a general-purpose learning procedure and in an end-to-end fashion. This makes DL-based algorithms easy to apply to other image analysis applications. Benefiting from advanced computer hardware [e.g., graphical processing units (GPUs) and tensor processing units (TPUs)] as well as increased available data for training, DL-based segmentation algorithms have gradually outperformed previous state-of-the-art traditional methods, gaining more popularity in research. This trend can be observed in **Figure 2A**, which shows how the number of DL-based papers for cardiac image segmentation has increased strongly in recent years. In particular, the number of publications for MR image segmentation is significantly higher than the numbers for the other two domains, especially in 2017. One reason, which can be observed in **Figure 2B**, is that the publicly available data for MR segmentation has increased remarkably since 2016.

In this paper, we provide an overview of state-of-the-art deep learning techniques for cardiac image segmentation in the three most commonly used modalities in clinical practice (i.e., MRI, CT, ultrasound) and discuss the advantages and remaining limitations of current deep learning-based segmentation methods that hinder widespread clinical deployment. To our knowledge, there have been several review papers that presented overviews of applications of DL-based methods for general medical image analysis (5–7), as well as some surveys dedicated to applications designed for cardiovascular image analysis (8, 9). However, none of them has provided a systematic overview focused on cardiac segmentation applications. This review paper aims to provide a comprehensive overview, from the debut to the state-of-the-art, of deep learning algorithms, focusing on a variety of cardiac image segmentation tasks (e.g., the LV, RV, and vessel segmentation) (section 3). In particular, we aim to cover the most influential DL-related works in this field published until 1st August 2019 and categorize these publications in terms of specific methodology. In addition to the basics of deep learning introduced in section 2, we also provide a summary of public datasets (see **Table 6**) as well as public code (see **Table 7**), aiming to present a good reading basis for newcomers to the topic and encourage future contributions. More importantly, we provide insightful discussions about the current research situation (section 3.4) as well as challenges and potential directions for future work (section 4).

### 1.1. Search Criterion

To identify related contributions, search engines like Scopus and PubMed were queried for papers containing ("convolutional" OR "deep learning") and ("cardiac") and ("image segmentation") in title or abstract. Additionally, conference proceedings for MICCAI, ISBI, and EMBC were searched based on the titles of papers. Papers which do not primarily focus on segmentation problems were excluded. The last update to the included papers was on Aug 1, 2019.

## 2. FUNDAMENTALS OF DEEP LEARNING

Deep learning models are deep artificial neural networks. Each neural network consists of an input layer, an output layer, and multiple hidden layers. In the following section, we will review several deep learning networks and key techniques that have been commonly used in state-of-the-art segmentation algorithms. For a more detailed and thorough illustration of the mathematical background and fundamentals of deep learning we refer the interested reader to Goodfellow (43).

### 2.1. Neural Networks

In this section, we first introduce basic neural network architectures and then briefly introduce building blocks which are commonly used to boost the ability of the networks to learn features that are useful for image segmentation.

#### 2.1.1. Convolutional Neural Networks (CNNs)

In this part, we introduce the convolutional neural network (CNN), which is the most common type of deep neural network for image analysis. CNNs have been successfully applied to advance the state-of-the-art on many image classification, object detection and segmentation tasks.

**Abbreviations: Imaging-related terminology:** CT, computed tomography; CTA, computed tomography angiography; LAX, long-axis; MPR, multi-planar reformatted; MR, magnetic resonance; MRI, magnetic resonance imaging; LGE, late gadolinium enhancement; RFCA, radio-frequency catheter ablation; SAX, short-axis; 2CH, 2-chamber; 3CH, 3-chamber; 4CH, 4-chamber.

**Cardiac structures and indexes:** AF, atrial fibrillation; AS, aortic stenosis; AO, aorta; CVD, cardiovascular diseases; CAC, coronary artery calcium; DCM, dilated cardiomyopathy; ED, end-diastole; ES, end-systole; EF, ejection fraction; HCM, hypertrophic cardiomyopathy; LA, left atrium; LV, left ventricle; LVEDV, left ventricular end-diastolic volume; LVESV, left ventricular end-systolic volume; MCP, mixed-calcified plaque; MI, myocardial infarction; Myo, left ventricular myocardium; NCP, non-calcified plaque; PA, pulmonary artery; PV, pulmonary vein; RA, right atrium; RV, right ventricle; RVEDV, right ventricular end-diastolic volume; RVESV, right ventricular end-systolic volume; RVEF, right ventricular ejection fraction; WHS, whole heart segmentation.

**Machine learning terminology:** AE, autoencoder; ASM, active shape model; BN, batch normalization; CONV, convolution; CNN, convolutional neural network; CRF, conditional random field; DBN, deep belief network; DL, deep learning; DNN, deep neural network; EM, expectation maximization; FCN, fully convolutional neural network; GAN, generative adversarial network; GRU, gated recurrent units; MSE, mean squared error; MSL, marginal space learning; MRF, markov random field; LSTM, Long-short term memory; ReLU, rectified linear unit; RNN, recurrent neural network; ROI, region-of-interest; SMC, sequential monte carlo; SRF, structured random forest; SVM, support vector machine.

**Cardiac image segmentation datasets:** ACDC, Automated Cardiac Diagnosis Challenge; CETUS, Challenge on Endocardial Three-dimensional Ultrasound Segmentation; MM-WHS, Multi-Modality Whole Heart Segmentation; LASC, Left Atrium Segmentation Challenge; LVSC, Left Ventricle Segmentation Challenge; RVSC, Right Ventricle Segmentation Challenge.

**Others:** EMBC, The International Engineering in Medicine and Biology Conference; GDPR, The General Data Protection Regulation; GPU, graphic processing unit; FDA, United States Food and Drug Administration; ISBI, The IEEE International Symposium on Biomedical Imaging; MICCAI, International Conference on Medical Image Computing and Computer-assisted Intervention; TPU, tensor processing unit; WHO, World Health Organization.

FIGURE 1 | Overview of cardiac image segmentation tasks for different imaging modalities. For better understanding, we provide the anatomy of the heart on the left (image source: Wikimedia Commons, license: CC BY-SA 3.0). Of note, for simplicity, we list the tasks for which deep learning techniques have been applied, which will be discussed in section 3.

As shown in **Figure 3A**, a standard CNN consists of an input layer, an output layer and a stack of functional layers in between that transform an input into an output in a specific form (e.g., vectors). These functional layers often contain convolutional layers, pooling layers and/or fully-connected layers. In general, a convolutional layer CONV<sup>l</sup> contains k<sup>l</sup> convolution kernels/filters and is followed by a normalization layer [e.g., batch normalization (44)] and a non-linear activation function [e.g., rectified linear unit (ReLU)] to extract k<sup>l</sup> feature maps from the input. These feature maps are then downsampled by pooling layers, typically by a factor of 2, which removes redundant features to improve statistical efficiency and model generalization. After that, fully connected layers are applied to reduce the dimension of the features from the previous layer and find the most task-relevant features for inference. The output of the network is a fixed-size vector where each element can be a probabilistic score for each category (for image classification), a real value for a regression task (e.g., the left ventricular volume estimation) or a set of values (e.g., the coordinates of a bounding box for object detection and localization).

A key component of a CNN is the convolutional layer. Each convolutional layer has k<sup>l</sup> convolution kernels to extract k<sup>l</sup> feature maps, and the size of each kernel n is generally chosen to be small, e.g., n = 3 for a 2D 3 × 3 kernel, to reduce the number of parameters<sup>2</sup>. While the kernels are small, one can increase the receptive field (the area of the input image that potentially impacts the activation of a particular convolutional kernel/neuron) by increasing the number of convolutional layers. For example, a convolutional layer with large 7 × 7 kernels can be replaced by three layers with small 3 × 3 kernels (45). The number of weights is reduced by a factor of 7<sup>2</sup>/(3 × 3<sup>2</sup>) = 49/27 ≈ 1.8 while the receptive field remains the same (7 × 7). An online resource<sup>3</sup> illustrates and visualizes how the receptive field changes with the number of hidden layers and the size of the kernels. In general, increasing the depth of convolutional neural networks (the number of hidden layers) to enlarge the receptive field can lead to improved model performance, e.g., classification accuracy (45).
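The arithmetic in this paragraph and in footnote 2 can be checked with a short sketch. The function names below are illustrative and not taken from any library, and the weight comparison assumes (as a simplification) that every layer has the same number of input and output channels.

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1, non-dilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Number of weights (biases excluded) in `num_layers` 2D conv layers,
    assuming each layer maps `channels` input channels to `channels` outputs."""
    return num_layers * kernel_size ** 2 * channels * channels


def conv_parameters(kernel_size: int, in_channels: int, num_kernels: int) -> int:
    """Parameter count from footnote 2: k_l * (n^2 * l_in + 1)."""
    return num_kernels * (kernel_size ** 2 * in_channels + 1)


if __name__ == "__main__":
    # One 7x7 layer and three 3x3 layers cover the same 7x7 input region.
    print(stacked_receptive_field(7, 1), stacked_receptive_field(3, 3))
    # Ratio of weights, 49/27, for an arbitrary (assumed) channel count of 64.
    print(conv_weights(7, 64) / conv_weights(3, 64, num_layers=3))
    # Footnote 2 example: 16 filters of size 3x3 on a single-channel image.
    print(conv_parameters(3, 1, 16))
```

The ratio does not depend on the assumed channel count, since the channel terms cancel.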

CNNs for image classification can also be employed for image segmentation applications without major adaptations to the network architecture (46), as shown in **Figure 3B**. However, this requires dividing each image into patches and then training a CNN to predict the class label of the center pixel for every patch. One major disadvantage of this patch-based approach is that, at inference time, the network has to be deployed for every patch individually despite the fact that there is a lot of redundancy due to multiple overlapping patches in the image. As a result of this inefficiency, the main application of CNNs with fully connected layers for cardiac segmentation is object localization, which aims to estimate the bounding box of the object of interest in an image. This bounding box is then used to crop the image, forming an image pre-processing step to reduce the computational cost for segmentation (47). For efficient, end-to-end pixel-wise segmentation, a variant of CNNs called fully convolutional neural network (FCN) is more commonly used, which will be discussed in the next section.

#### 2.1.2. Fully Convolutional Neural Networks (FCNs)

<sup>3</sup>https://fomoro.com/research/article/receptive-field-calculator

The idea of FCN was first introduced by Long et al. (48) for image segmentation. FCNs are a special type of CNNs that do not have any fully connected layers. In general, as shown in **Figure 4A**, FCNs are designed to have an encoder-decoder structure such that they can take input of arbitrary size and produce the output with the same size. Given an input image, the encoder first transforms the input into high-level feature representation whereas the decoder interprets the feature maps and recovers spatial details back to the image space for pixel-wise prediction through a series of upsampling and convolution operations. Here, upsampling can be achieved by applying transposed convolutions, e.g., 3 × 3 transposed convolutional kernels with a stride of 2 to up-scale feature maps by a factor of 2. These transposed convolutions can also be replaced by unpooling layers and upsampling layers. Compared to a patch-based CNN for segmentation, FCN is trained and applied to the entire images, removing the need for patch selection (50).
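To make the size bookkeeping of the encoder-decoder structure concrete, the following sketch tracks the spatial size of a feature map through a toy FCN. It uses the standard output-size formulas for convolution and transposed convolution; the padding and output-padding values are assumptions chosen so that a 3 × 3 transposed kernel with stride 2 exactly doubles the size, and the helper names are hypothetical.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial size after a convolution (3x3, stride 1, padding 1 keeps the size)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size: int, factor: int = 2) -> int:
    """Spatial size after pooling by `factor` (odd remainders are dropped)."""
    return size // factor


def transposed_conv_output_size(size: int, kernel: int = 3, stride: int = 2,
                                padding: int = 1, output_padding: int = 1) -> int:
    """Spatial size after a transposed convolution (no dilation).

    padding=1 and output_padding=1 are assumed here so that the output is
    exactly twice the input size."""
    return (size - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1


def encoder_decoder_sizes(size: int, depth: int) -> list:
    """Feature-map sizes through `depth` conv+pool blocks and `depth` upsampling blocks."""
    sizes = [size]
    for _ in range(depth):  # encoder: convolution, then pooling by 2
        size = pool_output_size(conv_output_size(size))
        sizes.append(size)
    for _ in range(depth):  # decoder: transposed convolution, then convolution
        size = conv_output_size(transposed_conv_output_size(size))
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(encoder_decoder_sizes(128, 3))
```

In this simplified setting the output matches the input size only when the input size is divisible by 2 raised to the number of pooling steps; practical implementations typically pad or crop to handle other sizes.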

An FCN with the simple encoder-decoder structure in **Figure 4A** may be limited in its ability to capture detailed context information in an image for precise segmentation, as some features may be eliminated by the pooling layers in the encoder. Several variants of FCNs have been proposed to propagate features from the encoder to the decoder, in order to boost the segmentation accuracy. The most well-known and most popular variant of FCNs for biomedical image segmentation is the U-net (49). On the basis of the vanilla FCN (48), the U-net employs skip connections between the encoder and decoder to recover the spatial context lost in the down-sampling path, yielding more precise segmentation (see **Figure 4B**). Several state-of-the-art cardiac image segmentation methods have adopted the U-net or its 3D variants, the 3D U-net (51) and the 3D V-net (52), as their backbone networks, achieving promising segmentation accuracy for a number of cardiac segmentation tasks (26, 53, 54).

#### 2.1.3. Recurrent Neural Networks (RNNs)

Recurrent neural networks (RNNs) are another type of artificial neural network, used for sequential data such as cine MRI and ultrasound image sequences. An RNN can "remember" the past and use the knowledge learned from the past to make its present decision (see **Figures 5A,B**). For example, given a sequence of images, an RNN takes the first image as input, captures the information to make a prediction and then memorizes this information, which is subsequently used to make a prediction for the next image. The two most widely used architectures in the family of RNNs are LSTM (56) and gated recurrent unit (GRU) (57), which are capable of modeling long-term memory. A use case for cardiac segmentation is to combine an RNN with a 2D FCN so that the combined network is capable of capturing information from adjacent slices to improve the inter-slice coherence of segmentation results (55).

#### 2.1.4. Autoencoders (AE)

Autoencoders (AEs) are a type of neural networks that are designed to learn compact latent representations from data without supervision. A typical architecture of an autoencoder consists of two networks: an encoder network and a decoder network for the reconstruction of the input (see **Figure 6**). Since the learned representations contain generally useful information in the original data, many researchers have employed autoencoders to extract general semantic features or shape information from input images or labels and then use those features to guide the cardiac image segmentation (58, 62, 63).

<sup>2</sup> In a convolution layer l with k<sup>l</sup> 2D n × n convolution kernels, each convolution kernel CONV<sub>l</sub><sup>(i)</sup>, i ∈ (1, k<sup>l</sup>), has a weight matrix **w**<sub>l</sub><sup>(i)</sup> and a bias term b<sub>l</sub><sup>(i)</sup> as parameters and can be formulated as: **y** = **w**<sub>l</sub><sup>(i)</sup> ◦ **x**<sub>in</sub> + b<sub>l</sub><sup>(i)</sup>, where **w**<sub>l</sub><sup>(i)</sup> ∈ ℝ<sup>n×n×l<sub>in</sub></sup>, b<sub>l</sub><sup>(i)</sup> ∈ ℝ, **x**<sub>in</sub> ∈ ℝ<sup>H×W×l<sub>in</sub></sup>, **y** ∈ ℝ<sup>H′×W′×k<sup>l</sup></sup>, l<sub>in</sub> denotes the number of channels in the input **x**<sub>in</sub> and ◦ denotes the convolution operation. Thus, the number of parameters in a convolutional layer is k<sup>l</sup> × (n<sup>2</sup> × l<sub>in</sub> + 1). For a convolutional layer with 16 3 × 3 filters where the input is a 28 × 28 × 1 2D gray image, the number of parameters in this layer is 16 × (3<sup>2</sup> × 1 + 1) = 160. For more technical details about convolutional neural networks, we refer to an online tutorial: http://cs231n.github.io/convolutional-networks.

classification, one can finally get a pixel-wise segmentation map for the whole image. LV, left ventricle cavity; RV, right ventricle cavity; BG, Background; Myo, left ventricular myocardium. The blue number at the top indicates the number of channels of the feature maps. Here, each convolution kernel is a 3 × 3 kernel (stride = 1, padding = 1), which produces an output feature map with the same height and width as the input.

#### 2.1.5. Generative Adversarial Networks (GAN)

The concept of Generative adversarial network (GAN) was proposed by Goodfellow et al. (64) for image synthesis from noise. GANs are a type of generative models that learn to model the data distribution of real data and thus are able to create new image examples. As shown in **Figure 7A**, a GAN consists of two networks: a generator network and a discriminator network. During training, the two networks are trained to compete against each other: the generator produces fake images aimed at fooling the discriminator, whereas the discriminator tries to identify real images from fake ones. This type of training is referred to as "adversarial training," since the two models are both set to win the competition. This training scheme can also be used for training a segmentation network. As shown in **Figure 7B**, the generator is replaced by a segmentation network and the discriminator is required to distinguish the generated segmentation maps from the ground truth ones (the target segmentation maps). In this way, the segmentation network is encouraged to produce more anatomically plausible segmentation maps (65, 66).

#### 2.1.6. Advanced Building Blocks for Improved Segmentation

aggregate feature maps from coarse to fine through concatenation and convolution operations. For simplicity, we reduce the number of downsampling and upsampling blocks in the diagram. For detailed information, we refer readers to the original paper (49).

Medical image segmentation, as an important step for quantitative analysis and clinical research, requires high pixel-wise accuracy. Over the past years, many researchers have developed advanced building blocks to learn robust, representative features for precise segmentation. These techniques have been widely applied to state-of-the-art neural networks (e.g., U-net) to improve cardiac image segmentation performance. Therefore, we identified several important techniques reported in the literature to this end and present them with corresponding references for further reading. These techniques are:

	- Inception modules (44, 67, 68), which concatenate multiple convolutional filter banks with different kernel sizes to extract multi-scale features in parallel (see **Figure 8A**);
	- Dilated convolutional kernels (72), which are modified convolution kernels with the same kernel size but different dilation rates (i.e., different spacing between kernel elements) to process input feature maps at larger scales;
	- Deep supervision (73), which utilizes the outputs from multiple intermediate hidden layers for multi-scale prediction;
	- Attention units (69, 70, 76), which learn to adaptively recalibrate features spatially (see **Figure 8B**);
	- Squeeze-and-excitation blocks (77), which are used to recalibrate features with learnable weights across channels;
	- Residual connections (71), which add outputs from a previous layer to the feature maps learned from the current layer (see **Figure 8C**);
	- Dense connections (78), which concatenate outputs from all preceding layers to the feature maps learned from the current layer.
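Three of the building blocks listed above can be illustrated with a few lines of plain Python. This is only a toy sketch: feature maps are represented as flat lists with one value per channel, the function names are hypothetical, and the effective kernel size formula is the standard one for dilated (atrous) convolutions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def residual_connection(previous: list, current: list) -> list:
    """Residual connection: element-wise addition, so the channel count is unchanged."""
    if len(previous) != len(current):
        raise ValueError("residual addition needs matching channel counts")
    return [p + c for p, c in zip(previous, current)]


def dense_connection(preceding: list, current: list) -> list:
    """Dense connection: concatenate the outputs of all preceding layers with the
    current features, so the channel count grows with depth."""
    merged = []
    for features in preceding:
        merged.extend(features)
    merged.extend(current)
    return merged


if __name__ == "__main__":
    # A 3x3 kernel with dilation 1, 2 and 4 covers 3, 5 and 9 pixels per side.
    print([effective_kernel_size(3, d) for d in (1, 2, 4)])
    print(residual_connection([1.0, 2.0], [10.0, 20.0]))
    print(dense_connection([[1.0, 2.0], [3.0]], [10.0, 20.0]))
```

The contrast between the last two functions is the main point: residual connections keep the feature dimension fixed, whereas dense connections accumulate channels from earlier layers.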

FIGURE 6 | A generic architecture of an autoencoder. An autoencoder employs an encoder-decoder structure, where the encoder maps the input data to a low-dimensional latent representation and the decoder interprets the code and reconstructs the input. The learned latent representation has been found effective for cardiac image segmentation (58, 59), cardiac shape modeling (60) and cardiac segmentation correction (61).

### 2.2. Training Neural Networks

Before being able to perform inference, neural networks must be trained. The standard training process requires a dataset that contains paired images and labels {**x**, **y**} for training and testing, an optimizer (e.g., stochastic gradient descent, Adam) and a loss function to update the model parameters. This function accounts for the error of the network prediction in each iteration during training, providing signals for the optimizer to update the network parameters through backpropagation (43, 79). The goal of training is to find proper values of the network parameters to minimize the loss function.

#### 2.2.1. Common Loss Functions

For regression tasks (e.g., heart localization, calcium scoring, landmark detection, image reconstruction), the simplest loss function is the mean squared error (MSE):

$$\mathcal{L}_{\text{MSE}} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{y}_i - \hat{\mathbf{y}}_i)^2, \tag{1}$$

where **y**<sub>i</sub> is the vector of target values and **ŷ**<sub>i</sub> is the vector of the predicted values; n is the number of data samples at each iteration.
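As a minimal sketch of Equation (1), the function below computes the MSE over a batch of target and prediction vectors. It assumes that the squared difference of two vectors is their squared Euclidean distance, which the text does not state explicitly; the function name is illustrative.

```python
def mse_loss(targets: list, predictions: list) -> float:
    """Mean squared error over n samples, each a vector of values.

    Assumption: (y_i - y_hat_i)^2 is read as the squared Euclidean distance
    between the target and predicted vectors of sample i."""
    n = len(targets)
    total = 0.0
    for y, y_hat in zip(targets, predictions):
        total += sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return total / n


if __name__ == "__main__":
    # Two samples with scalar targets 0 and predictions 1 and 3: (1 + 9) / 2.
    print(mse_loss([[0.0], [0.0]], [[1.0], [3.0]]))
```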

Cross-entropy is the most common loss for both image classification and segmentation tasks. In particular, the cross-entropy loss for segmentation summarizes pixel-wise probability errors between a predicted probabilistic output **p**<sub>i</sub><sup>c</sup> and its corresponding target segmentation map **y**<sub>i</sub><sup>c</sup> for each class c<sup>4</sup>:

$$\mathcal{L}_{\text{CE}} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} \mathbf{y}_i^c \log(\mathbf{p}_i^c), \tag{2}$$

where C is the number of all classes. Another loss function which is specifically designed for object segmentation is called soft-Dice loss function (52), which penalizes the mismatch between a predicted segmentation map and its target map at pixel-level:

$$\mathcal{L}_{\text{Dice}} = 1 - \frac{2\sum_{i=1}^{n} \sum_{c=1}^{C} \mathbf{y}_i^c \mathbf{p}_i^c}{\sum_{i=1}^{n} \sum_{c=1}^{C} (\mathbf{y}_i^c + \mathbf{p}_i^c)}. \tag{3}$$

<sup>4</sup>At inference time, the predicted segmentation map for each image is obtained by assigning each pixel with the class of the highest probability: **ŷ**<sub>i</sub> = argmax<sub>c</sub> **p**<sub>i</sub><sup>c</sup>.

In addition, there are several variants of the cross-entropy or soft-Dice loss, such as the weighted cross-entropy loss (25, 80) and the weighted soft-Dice loss (29, 81), that are used to address the potential class imbalance problem in medical image segmentation tasks. In these variants, the loss term is weighted to account for rare classes or small objects.
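The cross-entropy and soft-Dice losses of Equations (2) and (3) can be sketched in plain Python as follows. Several choices here are assumptions for illustration: the index i is taken to run over pixels, targets are one-hot vectors, a small `eps` is added inside the logarithm for numerical safety, and the weighted variant uses one fixed weight per class, which is only one of several possible weighting schemes and not necessarily the one used in the cited works.

```python
import math


def cross_entropy_loss(y, p, class_weights=None, eps=1e-12):
    """Equation (2); y[i][c] is the one-hot target and p[i][c] the predicted
    probability for pixel i and class c. Optional per-class weights give a
    simple weighted cross-entropy (an assumed weighting scheme)."""
    n, num_classes = len(y), len(y[0])
    weights = class_weights or [1.0] * num_classes
    total = 0.0
    for y_i, p_i in zip(y, p):
        for c in range(num_classes):
            total += weights[c] * y_i[c] * math.log(p_i[c] + eps)
    return -total / n


def soft_dice_loss(y, p):
    """Equation (3): one minus twice the soft overlap divided by the total mass."""
    overlap = sum(yc * pc for y_i, p_i in zip(y, p) for yc, pc in zip(y_i, p_i))
    total = sum(yc + pc for y_i, p_i in zip(y, p) for yc, pc in zip(y_i, p_i))
    return 1.0 - 2.0 * overlap / total


def predict_labels(p):
    """Footnote 4: assign each pixel the class with the highest probability."""
    return [max(range(len(p_i)), key=lambda c: p_i[c]) for p_i in p]


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]   # two pixels, two classes
    uniform = [[0.5, 0.5], [0.5, 0.5]]   # an uninformative prediction
    print(cross_entropy_loss(targets, uniform))   # close to log(2)
    print(soft_dice_loss(targets, uniform))
    print(predict_labels([[0.2, 0.8], [0.6, 0.4]]))
```

A perfect prediction drives both losses to (approximately) zero, while the uniform prediction above gives a cross-entropy close to log 2 and a soft-Dice loss of 0.5.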

#### 2.2.2. Reducing Over-Fitting

The biggest challenge of training deep networks for medical image analysis is over-fitting, due to the fact that there is often a limited number of training images in comparison with the number of learnable parameters in a deep network. A number of techniques have been developed to alleviate this problem. Some of the techniques are the following ones:


• **Transfer learning**: Transfer learning aims to transfer knowledge from one task to another related but different target task. This is often achieved by reusing the weights of a pre-trained model, to initialize the weights in a new model for the target task. Transfer learning can help to decrease the training time and achieve lower generalization error (85).

### 2.3. Evaluation Metrics

To quantitatively evaluate the performance of automated segmentation algorithms, three types of metrics are commonly used: (a) volume-based metrics (e.g., Dice metric, Jaccard similarity index); (b) surface distance-based metrics (e.g., mean contour distance, Hausdorff distance); (c) clinical performance metrics (e.g., ventricular volume and mass). For a detailed illustration of commonly used clinical indices in cardiac image analysis, we recommend the review paper by Peng et al. (2). In our paper, we mainly report the accuracy of methods in terms of the Dice metric for ease of comparison. The Dice score measures the ratio of overlap between two results (e.g., automatic segmentation vs. manual segmentation), ranging from 0 (mismatch) to 1 (perfect match). It is also important to note that the segmentation accuracies of different methods are not directly comparable in general, unless these methods are evaluated on the same dataset. This is because, even for the same segmentation task, different datasets can have different imaging modalities, different patient populations and different methods of image acquisition, which affect the task complexity and result in different segmentation performance.
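The two volume-based overlap metrics mentioned above can be sketched for binary masks as follows. Masks are represented here as sets of pixel coordinates, which is an illustrative choice, and returning 1.0 when both masks are empty is an assumed convention rather than something specified in the text.

```python
def dice_score(a: set, b: set) -> float:
    """Dice metric: 2 |A ∩ B| / (|A| + |B|), from 0 (no overlap) to 1 (identical)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks match perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def jaccard_index(a: set, b: set) -> float:
    """Jaccard similarity index: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # same assumed convention as above
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    automatic = {(0, 0), (0, 1)}   # pixels labeled by an algorithm
    manual = {(0, 1), (1, 1)}      # pixels labeled by an expert
    print(dice_score(automatic, manual))
    print(jaccard_index(automatic, manual))
```

For the same pair of masks the Dice score is never lower than the Jaccard index, which is one reason the two should not be compared directly across papers.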

## 3. DEEP LEARNING FOR CARDIAC IMAGE SEGMENTATION

In this section, we provide a summary of deep learning-based applications for the three main imaging modalities (MRI, CT, and ultrasound), organized by the targeted structures. In general, these deep learning-based methods provide an efficient and effective way to segment particular organs or tissues (e.g., the LV, coronary vessels, scars) in different modalities, facilitating follow-up quantitative analysis of cardiovascular structure and function. A large portion of these methods are designed for ventricle segmentation, especially in the MR and ultrasound domains. The objective of ventricle segmentation is to delineate the endocardium and epicardium of the LV and/or RV. These segmentation maps are important for deriving clinical indices, such as left ventricular end-diastolic volume (LVEDV), left ventricular end-systolic volume (LVESV), right ventricular end-diastolic volume (RVEDV), right ventricular end-systolic volume (RVESV), and EF. In addition, these segmentation maps are essential for 3D shape analysis (60, 86), 3D + time motion analysis (87), and survival prediction (88).
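As an illustration of how such indices follow from segmentation maps, the sketch below derives a cavity volume by voxel counting and then computes the ejection fraction with the standard definition EF = (EDV − ESV) / EDV. Voxel counting is one common way to obtain volumes and is an assumption here, not a method prescribed by the text; the function names and example numbers are hypothetical.

```python
def cavity_volume_ml(num_voxels: int, spacing_mm: tuple) -> float:
    """Volume of a segmented cavity in millilitres.

    Assumes volume = number of labeled voxels x voxel volume, with the voxel
    spacing given in millimetres (1 mL = 1000 mm^3)."""
    dx, dy, dz = spacing_mm
    return num_voxels * dx * dy * dz / 1000.0


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


if __name__ == "__main__":
    # Hypothetical voxel counts at end-diastole and end-systole.
    edv = cavity_volume_ml(12000, (1.0, 1.0, 10.0))
    esv = cavity_volume_ml(5000, (1.0, 1.0, 10.0))
    print(edv, esv, ejection_fraction(edv, esv))
```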

### 3.1. Cardiac MR Image Segmentation

Cardiac MRI is a non-invasive imaging technique that can visualize the structures within and around the heart. Compared to CT, it does not require ionizing radiation. Instead, it relies on the magnetic field in conjunction with radio-frequency waves to excite hydrogen nuclei in the heart, and then generates an image by measuring their response. By utilizing different imaging sequences, cardiac MRI allows accurate quantification of both cardiac anatomy and function (e.g., cine imaging) and pathological tissues, such as scars (late gadolinium enhancement (LGE) imaging). Accordingly, cardiac MRI is currently regarded as the gold standard for quantitative cardiac analysis (89).

A group of representative deep learning based cardiac MR segmentation methods are shown in **Table 1**. From the table, one can see that a majority of works have focused on segmenting cardiac chambers (e.g., LV, RV, LA). In contrast, there are relatively fewer works on segmenting abnormal cardiac tissue regions, such as myocardial scars and atrial fibrosis from contrast-enhanced images. This is likely due to the limited relevant public datasets as well as the difficulty of the task. In addition, to the best of our knowledge, there are very few works that apply deep learning techniques to atrial wall segmentation, as also suggested by a recent survey paper (161). In the following sections, we will describe and discuss these methods regarding different applications in detail.

#### 3.1.1. Ventricle Segmentation

#### **3.1.1.1. Vanilla FCN-based segmentation**

Tran (24) was among the first to apply an FCN (50) to segment the left ventricle, myocardium and right ventricle directly on short-axis cardiac magnetic resonance (MR) images. Their end-to-end approach based on FCN achieved competitive segmentation performance, significantly outperforming traditional methods in terms of both speed and accuracy. In the following years, a number of works based on FCNs have been proposed, aiming at achieving further improvements in segmentation performance. In this regard, one stream of work focuses on optimizing the network structure to enhance the feature learning capacity for segmentation (29, 80, 91, 162–165). For example, Khened et al. (29) developed a dense U-net with inception modules to combine multi-scale features for robust segmentation across images with large anatomical variability. Jang et al. (80), Yang et al. (81), Sander et al. (166), and Chen et al. (167) investigated different loss functions, such as weighted cross-entropy, weighted Dice loss, deep supervision loss and focal loss to improve the segmentation performance. Among these FCN-based methods, the majority of approaches use 2D networks rather than 3D networks for segmentation. This is mainly due to the typically low through-plane resolution and motion artifacts of most cardiac MR scans, which limit the applicability of 3D networks (25).

#### **3.1.1.2. Introducing spatial or temporal context**

One drawback of using 2D networks for cardiac segmentation is that these networks work slice by slice, and thus they do not leverage any inter-slice dependencies. As a result, 2D networks can fail to locate and segment the heart on challenging slices, such as apical and basal slices where the contours of the ventricles are not well-defined. To address this problem, a number of works have attempted to introduce additional contextual information to guide 2D FCNs. This contextual information can include shape priors learned from labels or multi-view images (109, 110, 168). Others extract spatial information from adjacent slices to assist the segmentation, using recurrent units (RNNs) or multi-slice networks (2.5D networks) (27, 55, 92, 169). These networks can also be applied to leverage information across different temporal frames in the cardiac cycle to improve spatial and temporal consistency of segmentation results (28, 93, 169–171).

TABLE 1 | A summary of representative deep learning methods on cardiac MRI segmentation.

By default, LV/RV and LA/RA segmentation refer to the left/right ventricle cavity segmentation and left/right atrium cavity segmentation, respectively. The same applies to Tables 2–5. SAX, short-axis view; 2CH, 2-chamber view; 4CH, 4-chamber view; ED, end-diastolic; ES, end-systolic; Myo, Left ventricular myocardium.

#### **3.1.1.3. Applying anatomical constraints**

Another problem that may limit the segmentation performance of both 2D and 3D FCNs is that they are typically trained with pixel-wise loss functions only (e.g., cross-entropy or soft-Dice losses). These pixel-wise loss functions may not be sufficient to learn features that represent the underlying anatomical structures. Several approaches therefore focus on designing and applying anatomical constraints to train the network to improve its prediction accuracy and robustness. These constraints are represented as regularization terms which take into account the topology (172), contour and region information (173), or shape information (59, 63), encouraging the network to generate more anatomically plausible segmentations. In addition to regularizing networks at training time, the authors of (61) proposed a variational AE to correct inaccurate segmentations at the post-processing stage.

#### **3.1.1.4. Multi-task learning**

Multi-task learning has also been explored to regularize FCN-based cardiac ventricle segmentation during training by performing auxiliary tasks that are relevant to the main segmentation task, such as motion estimation (174), estimation of cardiac function (175), ventricle size classification (176), and image reconstruction (177–179). Training a network for multiple tasks simultaneously encourages the network to extract features which are useful across these tasks, resulting in improved learning efficiency and prediction accuracy.

#### **3.1.1.5. Multi-stage networks**

Recently, there has been a growing interest in applying neural networks in a multi-stage pipeline which breaks down the segmentation problem into subtasks (27, 94, 95, 108, 180). For example, Zheng et al. (27) and Li et al. (108) proposed a region-of-interest (ROI) localization network followed by a segmentation network. Likewise, Vigneault et al. (95) proposed a network called Omega-Net which consists of a U-net for cardiac chamber localization, a learnable transformation module to normalize image orientation and a series of U-nets for fine-grained segmentation. By explicitly localizing the ROI and by rotating the input image into a canonical orientation, the proposed method better generalizes to images with varying sizes and orientations.
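The hand-off between a localization stage and a segmentation stage can be sketched as a bounding-box crop. The example below is a simplified illustration and not the pipeline of any of the cited works: it assumes the first stage outputs a coarse binary mask, represented as a nested list, and the optional `margin` parameter is a hypothetical addition.

```python
def bounding_box(mask, margin=0):
    """Inclusive (row_min, row_max, col_min, col_max) of the non-zero region of a
    2D mask, enlarged by `margin` pixels and clipped to the image borders."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # nothing was localized
    return (max(min(rows) - margin, 0),
            min(max(rows) + margin, len(mask) - 1),
            max(min(cols) - margin, 0),
            min(max(cols) + margin, len(mask[0]) - 1))


def crop(image, box):
    """Crop a 2D image (nested list) to an inclusive bounding box."""
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


if __name__ == "__main__":
    # Stage 1 (assumed): a coarse localization mask of the heart region.
    coarse_mask = [[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 0]]
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    roi = crop(image, bounding_box(coarse_mask))
    print(roi)  # this smaller ROI would be passed to the segmentation stage
```

Cropping to the ROI reduces the amount of background the second network has to process, which is the motivation given above for such pipelines.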

#### **3.1.1.6. Hybrid segmentation methods**

Another stream of work aims at combining neural networks with classical segmentation approaches, e.g., level-sets (98, 181), deformable models (47, 96, 182), atlas-based methods (97, 111), and graph-cut based methods (183). Here, neural networks are applied in the feature extraction and model initialization stages, reducing the dependency on manual interactions and improving the segmentation accuracy of the conventional segmentation methods deployed afterwards. For example, Avendi et al. (47) proposed one of the first DL-based methods for LV segmentation in cardiac short-axis MR images. The authors first applied a CNN to automatically detect the LV and then used an AE to estimate the shape of the LV. The estimated shape was then used to initialize follow-up deformable models for shape refinement. As a result, the proposed integrated deformable model converges faster than conventional deformable models and the segmentation achieves higher accuracy. In their later work, the authors extended this approach to segment the RV (96). While these hybrid methods demonstrated better segmentation accuracy than previous non-deep learning methods, most of them still require an iterative optimization for shape refinement. Furthermore, these methods are often designed for one particular anatomical structure. As noted in the recent benchmark study (17), most state-of-the-art segmentation algorithms for biventricle segmentation are based on end-to-end FCNs, which allow the simultaneous segmentation of the LV and RV.

To better illustrate these developments for cardiac ventricle segmentation from cardiac MR images, we collate a list of biventricle segmentation methods that have been trained and tested on the Automated Cardiac Diagnosis Challenge (ACDC) dataset, reported in **Table 2**. For ease of comparison, we only consider those methods which have been evaluated on the same online test set (50 subjects). As the ACDC challenge organizers keep the online evaluation platform open to the public, our comparison not only includes the methods from the original challenge participants [summarized in the benchmark study paper from Bernard et al. (17)] but also three segmentation algorithms that have been proposed after the challenge [i.e., (61, 108, 109)]. From this comparison, one can see that the top algorithms are the ensemble method proposed by Isensee et al. (26) and the two-stage method proposed by Li et al. (108), both of which are based on FCNs. In particular, compared to the traditional level-set method (112), both methods achieved considerably higher accuracy even for the more challenging segmentation of the left ventricular myocardium (Myo), indicating the power of deep learning based approaches.
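Since the comparison above, and most results quoted in this section, are expressed as Dice scores, the following minimal sketch shows how the Dice overlap between a predicted and a reference binary mask is commonly computed. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for illustration; this is not the evaluation code of the ACDC platform.

```python
import numpy as np

def dice_score(pred, ref):
    """Dice overlap 2*|A and B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks are in perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)

# Toy 2D masks: 3 foreground pixels each, 2 of them shared.
pred = np.array([[0, 1, 1], [0, 1, 0]])
ref = np.array([[0, 1, 0], [0, 1, 1]])
print(dice_score(pred, ref))
```

For multi-structure results such as those in Table 2, a score of this kind would be computed separately for each label (LV, RV, Myo) and then averaged over subjects.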

#### 3.1.2. Atrial Segmentation

Atrial fibrillation (AF) is one of the most common cardiac electrical disorders, affecting around 1 million people in the UK<sup>5</sup>. Accordingly, atrial segmentation is of prime importance in the clinic, improving the assessment of the atrial anatomy in both pre-operative AF ablation planning and post-operative follow-up evaluations. In addition, the segmentation of the atrium can be used as a basis for scar segmentation and atrial fibrosis quantification from LGE images. Traditional methods, such as region growing (184) and methods that employ strong priors [i.e., atlas-based label fusion (185) and non-rigid registration (186)] have been applied in the past for automated left atrium segmentation. However, the accuracy of these methods relies heavily on good initialization and ad-hoc pre-processing methods, which limits their widespread adoption in the clinic.

Recently, Vigneault et al. (95) and Bai et al. (31) applied 2D FCNs to directly segment the LA and RA from standard 2D long-axis images, i.e., 2-chamber (2CH) and 4-chamber (4CH) views. Notably, their networks can also be trained to segment ventricles from 2D short-axis stacks without any modifications to the network architecture. Likewise, Xiong et al. (100), Preetha et al. (187), Bian et al. (188), and Chen et al. (34) applied 2D FCNs to segment the atrium from 3D LGE images in a slice-by-slice fashion, where they optimized the network structure for enhanced feature learning. 3D networks (54, 189–192) and multi-view FCN (99, 193) have also been explored to capture 3D global information from 3D LGE images for accurate atrium segmentation.

<sup>5</sup>https://www.nhs.uk/conditions/atrial-fibrillation/

TABLE 2 | Segmentation accuracy of state-of-the-art segmentation methods verified on the cardiac bi-ventricular segmentation challenge (ACDC) dataset (17).

All the methods were evaluated on the same test set (50 subjects). Bold numbers are the highest overall Dice values for the corresponding structure. LV, left ventricle cavity; RV, right ventricle cavity; Myo, left ventricular myocardium; ED, end-diastolic; ES, end-systolic. Last update: 2019.8.1.

Note that for simplicity, we report the average Dice scores for each structure over ED and ES phases. More detailed comparison for different phases can be found on the public leaderboard in the post-testing part (https://acdc.creatis.insa-lyon.fr) as well as corresponding published works in this table.

In particular, Xia et al. (54) proposed a fully automatic two-stage segmentation framework which contains a first 3D U-net to roughly locate the atrial center from down-sampled images followed by a second 3D U-net to accurately segment the atrium in the cropped portions of the original images at full resolution. Their multi-stage approach is both memory-efficient and accurate, ranking first in the left atrium segmentation challenge 2018 (LASC'18) with a mean Dice score of 0.93 evaluated on a test set of 54 cases.
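The coarse-to-fine idea described above can be illustrated with a small sketch of the step that connects the two stages: mapping the centre of a coarse mask, predicted on a down-sampled image, back to full resolution and cropping a fixed-size window there. This is an illustrative assumption about how such a crop could be implemented, not the code of Xia et al. (54); the function name `crop_around_coarse_mask`, its parameters, and the centre-of-mass rule are hypothetical.

```python
import numpy as np

def crop_around_coarse_mask(volume, coarse_mask, scale, crop_size):
    """Crop a full-resolution window centred on a coarse, down-sampled mask.

    Assumes crop_size is no larger than volume.shape along every axis.
    """
    coords = np.argwhere(coarse_mask)
    if coords.size == 0:
        raise ValueError("coarse mask is empty; no ROI to localize")
    # Centre of mass in low-resolution voxels, mapped (approximately)
    # to full-resolution voxel coordinates.
    center = np.round(coords.mean(axis=0) * scale).astype(int)
    size = np.asarray(crop_size)
    # Clamp the window so that it stays inside the volume.
    start = np.clip(center - size // 2, 0, np.asarray(volume.shape) - size)
    window = tuple(slice(int(s), int(s + n)) for s, n in zip(start, size))
    return volume[window], window

# Toy example: an 8x8x8 volume and a 4x4x4 coarse mask (scale factor 2).
volume = np.arange(8 * 8 * 8).reshape(8, 8, 8)
coarse = np.zeros((4, 4, 4), dtype=bool)
coarse[2, 2, 2] = True
roi, window = crop_around_coarse_mask(volume, coarse, scale=2, crop_size=(4, 4, 4))
print(roi.shape, window)
```

Only the cropped window would be passed to the second, full-resolution network, which is what keeps the memory footprint of such a pipeline small.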

#### 3.1.3. Scar Segmentation

Scar characterization is usually performed using LGE MR imaging, a contrast-enhanced MR imaging technique. LGE MR imaging enables the identification of myocardial scars and atrial fibrosis, allowing improved management of myocardial infarction and atrial fibrillation (194). Prior to the advent of deep learning, scar segmentation was often performed using intensity thresholding-based or clustering methods which are sensitive to local intensity changes (103). The main limitation of these methods is that they usually require the manual segmentation of the region of interest to reduce the search space and the computational costs (195). As a result, these semi-automated methods are not suitable for large-scale studies or clinical deployment.

Deep learning approaches have been combined with traditional segmentation methods for the purpose of scar segmentation: Yang et al. (101, 196) applied an atlas-based method to identify the left atrium and then applied deep neural networks to detect fibrotic tissue in that region. As for end-to-end approaches, Chen et al. (102) applied deep neural networks to segment both the left atrium and the atrial scars. In particular, the authors employed a multi-view CNN with a recursive attention module to fuse features from complementary views for better segmentation accuracy. Their approach achieved a mean Dice score of 0.90 for the LA region and a mean Dice score of 0.78 for atrial scars.

In the work of Fahmy et al. (197), the authors applied a U-net based network to segment the myocardium and the scars at the same time from LGE images acquired from patients with hypertrophic cardiomyopathy (HCM), achieving a fast segmentation speed. However, the reported segmentation accuracy for the scar regions was relatively low (mean Dice: 0.58). Zabihollahy et al. (103) and Moccia et al. (104) instead adopted a semi-automated method which requires a manual segmentation of the myocardium followed by the application of a 2D network to differentiate scars from normal myocardium. They reported higher segmentation accuracy on their test sets (mean Dice >0.68). At the moment, fully-automated scar segmentation is still a challenging task since the infarcted regions in patients can lead to kinematic variabilities and abnormalities in those contrast-enhanced images. Interestingly, Xu et al. (105) developed an RNN which leverages motion patterns to automatically delineate myocardial infarction area from cine MR image sequences without contrast agents. Their method achieved a high overall Dice score of 0.90 when compared to the manual annotations on LGE MR images, providing a novel approach for infarction assessment.

### 3.1.4. Aorta Segmentation

The segmentation of the aortic lumen from cine MR images is essential for accurate mechanical and hemodynamic characterization of the aorta. One common challenge for this task is the typical sparsity of the annotations in aortic cine image sequences, where only a few frames have been annotated. To address the problem, Bai et al. (32) applied a non-rigid image registration method (198) to propagate the labels from the annotated frames to the unlabeled neighboring ones in the cardiac cycle, effectively generating pseudo annotated frames that could be utilized for further training. This semi-supervised method achieved an average Dice metric of 0.96 for the ascending aorta and 0.95 for the descending aorta over a test set of 100 subjects. In addition, compared to a previous approach based on deformable models (199), their approach based on FCN and RNN can directly perform the segmentation task on a whole image sequence without requiring the explicit estimation of the ROI.

### 3.1.5. Whole Heart Segmentation

Apart from the above-mentioned segmentation applications which target one particular structure, deep learning can also be applied to segmenting the main substructures of the heart in 3D MR images (30, 106, 107, 200). An early work from Yu et al. (30) adopted a 3D dense FCN to segment the myocardium and blood pool in the heart from 3D MR scans. More recently, an increasing number of methods have applied deep learning pipelines to segment more specific substructures [including four chambers, aorta, pulmonary vein (PV)] in both 3D CT and MR images. This has been facilitated by the availability of a public dataset for whole heart segmentation [Multi-Modality Whole Heart Segmentation (MM-WHS)] which consists of both CT and MRI images. We will discuss these segmentation methods in further detail in the CT section (see section 3.2.1).

## 3.2. Cardiac CT Image Segmentation

CT is a non-invasive imaging technique that is performed routinely for disease diagnosis and treatment planning. In particular, cardiac CT scans are used for the assessment of cardiac anatomy and specifically the coronary arteries. There are two main imaging modalities: non-contrast CT imaging and contrast-enhanced coronary CT angiography (CTA). Typically, non-contrast CT imaging exploits differences in tissue density to generate an image, so that materials with different attenuation values, such as soft tissue, calcium, fat, and air, can be easily distinguished. This makes it possible to estimate the amount of calcium present in the coronary arteries (201). In comparison, contrast-enhanced coronary CTA, which is acquired after the injection of a contrast agent, can provide excellent visualization of cardiac chambers, vessels and coronaries, and has been shown to be effective in detecting non-calcified coronary plaques. In the following sections, we will review some of the most commonly used deep learning-based cardiac CT segmentation methods. A summary of these approaches is presented in **Table 3**.

### 3.2.1. Cardiac Substructure Segmentation

Accurate delineation of cardiac substructures plays a crucial role in cardiac function analysis, providing important clinical variables, such as EF, myocardial mass, wall thickness etc. Typically, the cardiac substructures that are segmented include the LV, RV, LA, RA, Myo, aorta (AO), and pulmonary artery (PA).

### **3.2.1.1. Two-step segmentation**

One group of deep learning methods relies on a two-step segmentation procedure, where a ROI is first extracted and then fed into a CNN for subsequent classification (113, 202). For instance, Zreik et al. (113) proposed a two-step LV segmentation process where a bounding box for the LV is first detected using the method described in de Vos et al. (203), followed by a voxel classification within the defined bounding box using a patch-based CNN. More recently, FCN, especially U-net (49), has become the method of choice for cardiac CT segmentation. Zhuang et al. (19) provide a comparison of a group of methods (36, 114, 115, 117, 118, 137) for whole heart segmentation (WHS) that have been evaluated on the MM-WHS challenge. Several of these methods (37, 114–116) combine a localization network, which produces a coarse detection of the heart, with 3D FCNs applied to the detected ROI for segmentation. This allows the segmentation network to focus on the anatomically relevant regions, and has been shown to be effective for whole heart segmentation. A summary of the comparison between the segmentation accuracy of the methods evaluated on the MM-WHS dataset is presented in **Table 4**. These methods generally achieve better segmentation accuracy on CT images than on MR images, mainly because of the smaller variations in image intensity distribution across different CT scanners and better image quality (19). For a detailed discussion on these listed methods, please refer to Zhuang et al. (19).

### **3.2.1.2. Multi-view CNNs**

Another line of research utilizes the volumetric information of the heart by training multi-planar CNNs (axial, sagittal, and coronal views) in a 2D fashion. Examples include Wang et al. (117) and Mortazi et al. (118) where three independent orthogonal CNNs were trained to segment different views. Specifically, Wang et al. (117) additionally incorporated shape context in the framework for the segmentation refinement, while Mortazi et al. (118) adopted an adaptive fusion strategy to combine multiple outputs utilizing complementary information from different planes.
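To make the multi-view idea concrete, the sketch below fuses class-probability volumes predicted along the axial, sagittal, and coronal planes by a weighted average followed by a voxel-wise argmax. This is deliberately the simplest possible fusion rule and is an assumption made for illustration; it is not the adaptive fusion strategy of Mortazi et al. (118) nor the shape-context refinement of Wang et al. (117). The names `fuse_views` and `two_class` are hypothetical.

```python
import numpy as np

def fuse_views(prob_maps, weights=None):
    """Fuse per-view class probabilities of shape (C, D, H, W) into a label volume."""
    probs = np.stack(prob_maps, axis=0)            # (V, C, D, H, W)
    if weights is None:
        weights = np.ones(len(prob_maps))          # plain averaging by default
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = np.tensordot(weights, probs, axes=1)   # (C, D, H, W)
    return fused.argmax(axis=0)                    # (D, H, W) label volume

def two_class(p0):
    """Helper for the toy example: build a (2, 1, 1, N) probability volume."""
    p0 = np.asarray(p0, dtype=float).reshape(1, 1, -1)
    return np.stack([p0, 1.0 - p0])

# Background probabilities for two voxels, as predicted from three views.
axial = two_class([0.6, 0.2])
sagittal = two_class([0.4, 0.3])
coronal = two_class([0.7, 0.6])
print(fuse_views([axial, sagittal, coronal]))
```

In practice each view-specific 2D network would be run slice by slice along its own axis and the per-slice outputs restacked into a common 3D grid before fusion.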

### **3.2.1.3. Hybrid loss**

Several methods employ a hybrid loss, where different loss functions (such as focal loss, Dice loss, and weighted categorical cross-entropy) are combined to address the class imbalance issue, e.g., the volume size imbalance among different ventricular structures, and to improve the segmentation performance (36, 119).
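As an illustration of how such a hybrid loss can be assembled, the following sketch combines a soft Dice loss with a class-weighted categorical cross-entropy using a mixing coefficient. The specific combination, the coefficient `alpha`, the class weights, and all function names are assumptions chosen for illustration; the cited works (36, 119) use their own formulations.

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-6):
    """1 - mean soft Dice over classes; both inputs have shape (C, N)."""
    inter = (probs * onehot).sum(axis=1)
    denom = probs.sum(axis=1) + onehot.sum(axis=1)
    return float(1.0 - np.mean((2.0 * inter + eps) / (denom + eps)))

def weighted_cross_entropy(probs, onehot, class_weights, eps=1e-12):
    """Class-weighted categorical cross-entropy, averaged over voxels."""
    w = np.asarray(class_weights, dtype=float)[:, None]
    return float(-(w * onehot * np.log(probs + eps)).sum(axis=0).mean())

def hybrid_loss(probs, onehot, class_weights, alpha=0.5):
    """Convex combination of the two losses (alpha is a hypothetical choice)."""
    dice = soft_dice_loss(probs, onehot)
    wce = weighted_cross_entropy(probs, onehot, class_weights)
    return alpha * dice + (1.0 - alpha) * wce

# Toy example with an imbalanced two-class problem: 3 voxels vs. 1 voxel.
onehot = np.array([[1.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])
uniform = np.full_like(onehot, 0.5)
print(hybrid_loss(uniform, onehot, class_weights=[1.0, 3.0]))
print(hybrid_loss(onehot, onehot, class_weights=[1.0, 3.0]))
```

Giving the rarer class a larger weight in the cross-entropy term, while the Dice term normalizes by structure size, is one common way to keep small structures from being dominated by large ones during training.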

In addition, Zreik et al. (120) proposed a method for the automatic identification of patients with significant coronary artery stenoses through the segmentation and analysis of the LV myocardium. In this work, a multiscale FCN is first employed for myocardium segmentation, and then a convolutional autoencoder is used to characterize the LV myocardium, followed by a support vector machine (SVM) to classify patients based on the extracted features.

#### TABLE 3 | A summary of selected deep learning methods on cardiac CT segmentation.

#### 3.2.2. Coronary Artery Segmentation

Quantitative analysis of coronary arteries is an important step for the diagnosis of cardiovascular diseases, stenosis grading, blood flow simulation and surgical planning (204). Though this topic has been studied for years (4), only a small number of works investigate the use of deep learning in this context. Methods relating to coronary artery segmentation can be mainly divided into two categories: centerline extraction and lumen (i.e., vessel wall) segmentation.

TABLE 4 | Segmentation accuracy of methods validated on MM-WHS dataset.

The training set contains 20 CT and 20 MRI whereas the test set contains 40 CT and 40 MRI. Reported numbers are Dice scores (CT/MRI) for different substructures on both CT and MRI scans. For more detailed comparisons, please refer to Zhuang et al. (19). The bold number in each column represents the highest score for the corresponding structure on CT images.

#### **3.2.2.1. CNNs as a post-/pre-processing step**

Coronary centerline extraction is a challenging task due to the presence of nearby cardiac structures and coronary veins as well as motion artifacts in cardiac CT. Several deep learning approaches employ CNNs as either a post-processing or pre-processing step for traditional methods. For instance, Gülsün et al. (124) formulated centerline extraction as finding the maximum flow paths in a steady state porous media flow, with a learning-based classifier estimating anisotropic vessel orientation tensors for flow computation. A CNN classifier was then employed to distinguish true coronary centerlines from leaks into non-coronary structures. Guo et al. (125) proposed a multi-task FCN centerline extraction method that can generate a single-pixel-wide centerline. Here, the FCN simultaneously predicted centerline distance maps and endpoint confidence maps from coronary artery and ascending aorta segmentation masks, which were then used as input to the subsequent minimal path extractor to obtain the final centerline extraction results. In contrast to the aforementioned methods that used CNNs either as a pre-processing or post-processing step, Wolterink et al. (127) proposed to address centerline extraction via a 3D dilated CNN, where the CNN was trained on patches to directly determine a posterior probability distribution over a discrete set of possible directions as well as to estimate the radius of an artery at the given point.

#### **3.2.2.2. End-to-end CNNs**

With respect to the lumen or vessel wall segmentation, most deep learning based approaches use an end-to-end CNN segmentation scheme to predict dense segmentation probability maps (38, 122, 126, 205). In particular, Moeskops et al. (122) proposed a multi-task segmentation framework where a single CNN can be trained to perform three different tasks including coronary artery segmentation in cardiac CTA and tissue segmentation in brain MR images. They showed that such a multi-task segmentation network in multiple modalities can achieve performance equivalent to that of a single-task network. Merkow et al. (38) introduced deep multi-scale supervision into a 3D U-net architecture, enabling efficient multi-scale feature learning and precise voxel-level predictions. In addition, shape priors can be incorporated into the network (123, 206, 207). For instance, Lee et al. (123) explicitly enforced a roughly tubular shape prior for the vessel segments by introducing a template transformer network, through which a shape template can be deformed via network-based registration to produce an accurate segmentation of the input image, as well as to guarantee topological constraints. More recently, graph convolutional networks have also been investigated by Wolterink et al. (128) for coronary artery segmentation in CTA, where vertices on the coronary lumen surface mesh were considered as graph nodes and the locations of these tubular surface mesh vertices were directly optimized. They showed that this method significantly outperformed a baseline network that used only fully-connected layers on healthy subjects (mean Dice score: 0.75 vs. 0.67). Moreover, the graph convolutional network used in their work is able to directly generate smooth surface meshes without post-processing steps.

### 3.2.3. Coronary Artery Calcium and Plaque Segmentation

Coronary artery calcium (CAC) is a direct risk factor for cardiovascular disease. Clinically, CAC is quantified using the Agatston score (208) which considers the lesion area and the weighted maximum density of the lesion (209). Precise detection and segmentation of CAC are thus important for the accurate prediction of the Agatston score and disease diagnosis.
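For readers unfamiliar with the scoring rule, the sketch below shows the commonly described form of the Agatston computation: each calcified lesion on each slice contributes its area multiplied by a weight determined by its maximum attenuation, using the conventional 130 HU detection threshold and weight bands of 130–199, 200–299, 300–399, and ≥400 HU. The input format (a list of per-slice lesion areas and peak HU values) and the function names are assumptions for illustration; practical details, such as minimum lesion size and slice-thickness normalization, are omitted.

```python
def density_weight(max_hu):
    """Conventional Agatston density weight for a lesion's peak attenuation."""
    if max_hu < 130:
        return 0  # below the calcium detection threshold
    if max_hu < 200:
        return 1
    if max_hu < 300:
        return 2
    if max_hu < 400:
        return 3
    return 4

def agatston_score(lesions):
    """Sum of area (mm^2) times density weight over all per-slice lesions.

    `lesions` is a hypothetical input format: an iterable of
    (area_mm2, max_hu) pairs, one entry per lesion per axial slice.
    """
    return sum(area * density_weight(max_hu) for area, max_hu in lesions)

# Three lesions: 4.0*1 + 2.5*3 + 3.0*4 = 23.5
lesions = [(4.0, 150), (2.5, 320), (3.0, 450)]
print(agatston_score(lesions))  # prints 23.5
```

This makes clear why voxel-level segmentation errors matter: both the lesion area and the peak attenuation enter the score directly.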

#### **3.2.3.1. Two-step segmentation**

One group of deep learning approaches to segmentation and automatic calcium scoring uses a two-step segmentation scheme. For example, Wolterink et al. (129) attempted to classify CAC in cardiac CTA using a pair of CNNs, where the first CNN coarsely identified voxels likely to be CAC within a ROI detected using the method of de Vos et al. (203) and the second CNN then distinguished more accurately between CAC and CAC-like negatives. Following a similar two-stage scheme, Lessmann et al. (130, 131) proposed to identify CAC in low-dose chest CT, in which a ROI of the heart or potential calcifications was first localized, followed by a CAC classification process.

### **3.2.3.2. Direct segmentation**

More recently, several approaches (133–136) have been proposed for the direct segmentation of CAC from non-contrast cardiac CT or chest CT: the majority of them employed combinations of U-net (49) and DenseNet (78) for precise quantification of CAC, showing that a sensitivity of over 90% can be achieved (133). These aforementioned approaches all follow the same workflow where the CAC is first identified and then quantified. An alternative approach is to circumvent the intermediate segmentation and to perform direct quantification, as in de Vos et al. (209) and Cano-Espinosa et al. (210), who showed that this approach is effective and promising.

Finally, for non-calcified plaque (NCP) and mixed-calcified plaque (MCP) in coronary arteries, only a limited number of works have been reported that investigate deep learning methods for segmentation and quantification (132, 211). Yet, this is a very important task from a clinical point of view, since these plaques can potentially rupture and obstruct an artery, causing ischemic events and severe cardiac damage. Compared with CAC segmentation, NCP and MCP segmentation are more challenging because the appearance and intensity of these plaques are similar to those of adjacent tissues. Therefore, robust and accurate analysis often requires the generation of multi-planar reformatted (MPR) images that have been straightened along the centerline of the vessel. Recently, Liu et al. (132) proposed a vessel-focused 3D convolutional network with attention layers to segment three types of plaques on the extracted and reformatted coronary MPR volumes. Zreik et al. (211) presented an automatic method for detection and characterization of coronary artery plaques as well as determination of coronary artery stenosis significance, in which a multi-task convolutional RNN was used to perform both plaque and stenosis classification by analyzing the features extracted along the coronary artery in an MPR image.

### 3.3. Cardiac Ultrasound Image Segmentation

Cardiac ultrasound imaging, also known as echocardiography, is an indispensable clinical tool for the assessment of cardiovascular function. It is often used clinically as the first imaging examination owing to its portability, low cost and real-time capability. While a number of traditional methods, such as active contours, level-sets and active shape models have been employed to automate the segmentation of anatomical structures in ultrasound images (212), the achieved accuracy is limited by various problems of ultrasound imaging, such as low signal-to-noise ratio, varying speckle noise, low image contrast (especially between the myocardium and the blood pool), edge dropout and shadows cast by structures, such as dense muscle and ribs.

As in cardiac MR and CT, several DL-based methods have been recently proposed to improve the performance of cardiac ultrasound image segmentation in terms of both accuracy and speed. The majority of these DL-based approaches focus on LV segmentation, with only a few addressing the problem of aortic valve and LA segmentation. A summary of the reviewed works can be found in **Table 5**.

### 3.3.1. 2D LV Segmentation

### **3.3.1.1. Deep learning combined with deformable models**

The imaging quality of echocardiography makes voxel-wise tissue classification highly challenging. To address this challenge, deep learning has been combined with deformable models for LV segmentation in 2D images (138, 139, 141–145). Features extracted by trained deep neural networks were used instead of handcrafted features to improve accuracy and robustness.

Several works applied deep learning in a two-stage pipeline which first localizes the target ROI via rigid transformation of a bounding box, then segments the target structure within the ROI. This two-stage pipeline reduces the search region of the segmentation and increases robustness of the overall segmentation framework. Carneiro et al. (138, 139) first adopted this DL framework to segment the LV in apical long-axis echocardiograms. The method uses DBN (213) to predict the rigid transformation parameters for localization and the deformable model parameters for segmentation. The results demonstrated the robustness of DBN-based feature extraction to image appearance variations. Nascimento and Carneiro (140) further reduced the training and inference complexity of the DBN-based framework by using sparse manifold learning in the rigid detection step.

To further reduce the computational complexity, some works perform segmentation in one step without resorting to the two-stage approach. Nascimento and Carneiro (141, 142) applied sparse manifold learning in segmentation, showing a reduced training and search complexity compared to their previous version of the method, while maintaining the same level of segmentation accuracy. Veni et al. (143) applied an FCN to produce coarse segmentation masks, which are then further refined by a level-set based method.

### **3.3.1.2. Utilizing temporal coherence**

Cardiac ultrasound data is often recorded as a temporal sequence of images. Several approaches aim to leverage the coherence between temporally close frames to improve the accuracy and robustness of the LV segmentation. Carneiro and Nascimento (144, 145) proposed a dynamic modeling method based on a sequential Monte Carlo (SMC) (or particle filtering) framework with a transition model, in which the segmentation of the current cardiac phase depends on previous phases. The results show that this approach performs better than the previous method (138) which does not take temporal information into account. In a more recent work, Jafari et al. (146) combined U-net, long short-term memory (LSTM) and inter-frame optical flow to utilize multiple frames for segmenting one target frame, demonstrating improvement in overall segmentation accuracy. The method was also shown to be more robust to image quality variations in a sequence than a single-frame U-net.

### **3.3.1.3. Utilizing unlabeled data**

Several works proposed to use non-DL based segmentation algorithms to help generate labels on unlabeled images, effectively increasing the amount of training data. To achieve this, Carneiro and Nascimento (147, 148) proposed on-line retraining strategies where the segmentation network (DBN) is first initialized using a small set of labeled data and then applied to non-labeled data to propose annotations. The proposed annotations are then checked by external classifiers before being used to retrain the network. Smistad et al. (149) trained a U-net using images annotated by a Kalman filtering based method (214) and illustrated the potential of using this strategy for pre-training. Alternatively, some works proposed to exploit unlabeled data without using an additional segmentation algorithm. Yu et al. (150) proposed to train a CNN on a partially labeled dataset of multiple sequences, then fine-tuned the network for each individual sequence using the manual segmentation of the first frame as well as the CNN-produced labels of the other frames. Jafari et al. (151) proposed a semi-supervised framework which enables training on both the labeled and unlabeled images. The framework uses an additional generative network, which is trained to generate ultrasound images from segmentation masks, as additional supervision for the unlabeled frames in the sequences. The generative network forces the segmentation network to predict a segmentation that can be used to successfully generate the input ultrasound image.

TABLE 5 | A summary of reviewed deep learning methods for ultrasound image segmentation.

A[X]C is short for Apical [X]-chamber view. PLAX/PSAX, parasternal long-axis/short-axis; CETUS, using the dataset from Challenge on Endocardial Three-dimensional Ultrasound Segmentation.

#### **3.3.1.4. Utilizing data from multiple domains**

Apart from exploiting unlabeled data in the same domain, leveraging manually annotated data from multiple domains (e.g., different 2D ultrasound views with various anatomical structures) can also help to improve the segmentation in one particular domain. Chen et al. (153) proposed a novel FCN-based network to utilize multi-domain data to learn generic feature representations. Combined with an iterative refinement scheme, the method has shown superior performance in detection and segmentation over a traditional database-guided method (215), an FCN trained on a single domain, and other multi-domain training strategies.

#### **3.3.1.5. Others**

The potential of CNNs in segmentation has motivated the collection and labeling of large-scale datasets. Several methods have since shown that deep learning methods, most notably CNN-based methods, are capable of performing accurate segmentation directly without complex modeling and post-processing. Leclerc et al. (155) performed a study to investigate the effect of the size of annotated data for the segmentation of the LV in 2D ultrasound images using a simple U-net. The authors demonstrated that the U-net approach significantly benefits from larger amounts of training data. In addition to accuracy, some works investigated the computational efficiency of DL-based methods. Smistad et al. (154) demonstrated the efficiency of CNN-based methods by successfully performing real-time view-classification and segmentation. Jafari et al. (156) developed a software pipeline capable of real-time automated LV segmentation, landmark detection and LV ejection fraction calculation on a mobile device taking input from point-of-care ultrasound (POCUS) devices. The software uses a lightweight U-net trained using multi-task learning and adversarial training, which achieves an EF prediction error that is lower than inter- and intra-observer variability.
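The ejection fraction mentioned above is derived from the LV volumes at end-diastole (EDV) and end-systole (ESV), which in turn come from the segmentation. A minimal sketch of the final step, using the standard definition EF = (EDV − ESV) / EDV, is given below; the function name and the example volumes are illustrative assumptions, and the estimation of volumes from 2D contours (which depends on the geometric model used) is not shown.

```python
def ejection_fraction(edv_ml, esv_ml):
    """LV ejection fraction in percent from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Hypothetical volumes in millilitres, chosen only for illustration.
ef = ejection_fraction(edv_ml=120.0, esv_ml=50.0)
print(round(ef, 1))
```

Because EF is a ratio of two segmentation-derived volumes, errors in the ED and ES contours do not simply cancel, which is why EF error is often reported alongside Dice scores.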

#### 3.3.2. 3D LV Segmentation

Segmenting cardiac structures in 3D ultrasound is even more challenging than 2D. While having the potential to derive more accurate volume-related clinical indices, 3D echocardiograms suffer from lower temporal resolution and lower image quality compared to 2D echocardiograms. Moreover, 3D images dramatically increase the dimension of parameter space of neural networks, which poses computational challenges for deep learning methods.

One way to reduce the computational cost is to avoid direct processing of 3D data in deep learning networks. Dong et al. (157) proposed a two-stage method by first applying a 2D CNN to produce coarse segmentation maps on 2D slices from a 3D volume. The coarse 2D segmentation maps are used to initialize a 3D shape model which is then refined by a 3D deformable model method (216). In addition, the authors used transfer learning to side-step the limited training data problem by pre-training the network on a large natural image segmentation dataset and then fine-tuning it on the LV segmentation task.

Anatomical shape priors have been utilized to increase the robustness of deep learning-based segmentation methods to challenging 3D ultrasound images. Oktay et al. (59) proposed an anatomically constrained network where a shape constraint-based loss is introduced to train a 3D segmentation network. The shape constraint is based on the shape prior learned from segmentation maps using auto-encoders (152). Dong et al. (158) utilized the shape prior more explicitly by combining a neural network with a conventional atlas-based segmentation framework. Adversarial training was also applied to encourage the method to produce more anatomically plausible segmentation maps, which contributes to its superior segmentation performance compared to a standard voxel-wise classification 3D segmentation network (52).

#### 3.3.3. Left Atrium Segmentation

Degel et al. (160) adopted the aforementioned anatomical constraints in 3D LA segmentation to tackle the domain shift problem caused by variation in imaging device, protocol and patient condition. In addition to the anatomically constrained network, the authors applied an adversarial training scheme (217) to improve the generalizability of the model to unseen domains.

#### 3.3.4. Multi-Chamber Segmentation

Apart from LV segmentation, a few works (23, 42, 149) applied deep learning methods to perform multi-chamber (including LV and LA) segmentation. In particular, (42) demonstrated the applicability of CNNs on three tasks: view classification, multi-chamber segmentation and detection of cardiovascular diseases. Comprehensive validation on a large (non-public) clinical dataset showed that clinical metrics derived from automatic segmentation are comparable or superior to those derived from manual segmentation. To resemble real clinical situations and thus encourage the development and evaluation of robust and clinically effective segmentation methods, a large-scale dataset for 2D cardiac ultrasound has been recently made public (23). The dataset and evaluation platform were released following the preliminary data requirement investigation of deep learning methods (155). The dataset is composed of apical 4-chamber view images annotated for LV and LA segmentation, with uneven imaging quality from 500 patients with varying conditions. Notably, the initial benchmarking (23) on this dataset has shown that modern encoder-decoder CNNs resulted in lower error than the inter-observer error between human cardiologists.

#### 3.3.5. Aortic Valve Segmentation

Ghesu et al. (159) proposed a framework based on marginal space learning (MSL), deep neural networks (DNNs), and an active shape model (ASM) to segment the aortic valve in 3D cardiac ultrasound volumes. An adaptive sparsely-connected neural network with a reduced number of parameters is used to predict a bounding box to locate the target structure, where the learning of the bounding box parameters is marginalized into sub-spaces to reduce computational complexity. This framework showed significant improvement over the previous non-DL MSL method (218) while achieving competitive run-time.

### 3.4. Discussion

So far, we have presented and discussed recent progress of deep learning-based segmentation methods in the three modalities (i.e., MR, CT, ultrasound) that are commonly used in the assessment of cardiovascular disease. To summarize, current state-of-the-art segmentation methods are mainly based on CNNs that employ the FCN or U-net architecture. In addition, there are several commonalities in the FCN-based methods for cardiac segmentation which can be categorized into four groups: (1) enhancing network feature learning by employing advanced building blocks in networks (e.g., inception module, dilated convolutions), most of which have been mentioned earlier (section 2.1.6); (2) alleviating the problem of class imbalance with advanced loss functions (e.g., weighted loss functions); (3) improving the networks' generalization ability and robustness through a multi-stage pipeline, multi-task learning, or multiview feature fusion; (4) forcing the network to generate more anatomically-plausible segmentation results by incorporating shape priors, applying adversarial loss or anatomical constraints to regularize the network during training. It is also worthwhile to highlight that for cardiac image sequence segmentation (e.g., cine MR images, 2D ultrasound sequences), leveraging spatial and temporal coherence from these sequences with advanced neural networks [e.g., RNN (32, 146), multi-slice FCN (27)] has been explored and shown to be beneficial for improving the segmentation accuracy and temporal consistency of the segmentation maps.
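As an illustration of item (2) above, the following is a minimal sketch of a class-weighted cross-entropy loss written in plain Python. The inverse-frequency weighting scheme, the function names, and the toy probabilities are assumptions chosen for illustration; they are not taken from any of the cited works, which typically implement such losses inside a deep learning framework.

```python
import math


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by the inverse of its pixel frequency (hypothetical scheme)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    # Classes absent from the labels get weight 0 to avoid division by zero.
    return [total / (num_classes * c) if c > 0 else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Weighted mean of the per-pixel cross-entropy.

    probs: per-pixel lists of class probabilities (each summing to 1)
    labels: per-pixel integer class indices
    class_weights: one weight per class
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        loss_sum += -w * math.log(p[y] + eps)
        weight_sum += w
    return loss_sum / weight_sum


# Toy example: three background pixels (class 0) and one foreground pixel (class 1).
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.6, 0.4]]
weights = inverse_frequency_weights(labels, num_classes=2)
unweighted = weighted_cross_entropy(probs, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(probs, labels, weights)
```

In this toy case the poorly predicted foreground pixel contributes more to the weighted loss than to the unweighted one, which is the intended effect when the structure of interest occupies only a small fraction of the image.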

While the results reported in the literature show that neural networks have become more sophisticated and powerful, it is also clear that performance has improved with the increase of publicly available training subjects. A number of DL-based methods (especially in MRI) have been trained and tested on public challenge datasets, which not only provide large amounts of data to exploit the capabilities of deep learning in this domain, but also a platform for transparent evaluation and comparison. In addition, many of the participants in these challenges have shared their code with other researchers via open-source community websites (e.g., GitHub). Transparent and fair benchmarking and sharing of code are both essential for continued progress in this domain. We summarize the existing public datasets in **Table 6** and public code repositories in **Table 7** for reference.

An interesting conclusion supported by **Table 7** is that the target image type can affect the choice of network structures (i.e., 2D networks, 3D networks). For 3D imaging acquisitions, such as LGE-MRI and CT images, 3D networks are preferred, whereas 2D networks are more popular approaches for segmenting cardiac cine short-axis or long-axis image stacks. One reason for using 2D networks for the segmentation of short-axis or long-axis images is their typically large slice thickness (usually around 7–8 mm), which can be further exacerbated by inter-slice gaps. In addition, breath-hold related motion artifacts between different slices may negatively affect 3D networks. A study conducted by Baumgartner et al. (25) has shown that a 3D U-net performs worse than a 2D U-net when evaluated on the ACDC challenge dataset. By contrast, in the LASC'18 challenge mentioned in **Table 6**, which uses high-resolution 3D images, most participants applied 3D networks and the best performance was achieved by a cascaded network based on the 3D U-net (54).

It is well-known that training 3D networks is more difficult than training 2D networks. In general, 3D networks have significantly more parameters than 2D networks. Therefore, 3D networks are more difficult and computationally expensive to optimize and are prone to over-fitting, especially if the training data is limited. As a result, several researchers have tried to carefully design the structure of the network to reduce the number of parameters for a particular application and have also applied advanced techniques (e.g., deep supervision) to alleviate the over-fitting problem (30, 54). For this reason, 2D-based networks (e.g., 2D U-net) are still the most popular segmentation approaches for all three modalities.
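To make the parameter argument concrete, the short sketch below counts the learnable parameters of a single convolutional layer in 2D and in 3D. The layer configuration (kernel size 3, 64 input and 64 output channels) is an assumed example rather than a setting reported by the cited works.

```python
def conv_params(kernel_size, in_channels, out_channels, dims, bias=True):
    """Number of learnable parameters in one convolutional layer.

    Each output channel has kernel_size**dims weights per input channel,
    plus one optional bias term.
    """
    weights = (kernel_size ** dims) * in_channels * out_channels
    return weights + (out_channels if bias else 0)


params_2d = conv_params(3, 64, 64, dims=2)  # 3 x 3 kernel
params_3d = conv_params(3, 64, 64, dims=3)  # 3 x 3 x 3 kernel
ratio = params_3d / params_2d
```

Under these assumptions, the 3D layer has roughly three times as many parameters as its 2D counterpart, and the gap compounds across all layers of a network, in addition to the larger memory footprint of 3D feature maps.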

In addition to 2D and 3D networks, several authors have proposed "2D+" networks that have been shown to be effective in segmenting structures from cardiac volumetric data. These "2D+" networks are mainly based on 2D networks, but are adapted with increased capacity to utilize 3D context. These networks include multi-view networks which leverage multiplanar information (i.e., coronal, sagittal, axial views) (99, 117), multi-slice networks, and 2D FCNs combined with RNNs which incorporate context across multiple slices (33, 55, 92, 169). These "2D+" networks inherit the advantages of 2D networks while still being capable of leveraging through-plane spatial context for more robust segmentation with strong 3D consistency.
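One simple way to give a 2D network through-plane context is to feed it the target slice together with its neighbouring slices as extra input channels. The sketch below shows only this input construction. The function name `multi_slice_input` and the boundary handling (repeating the edge slice) are assumptions for illustration and do not reproduce the specific designs of the cited multi-slice networks.

```python
def multi_slice_input(volume, index, context=1):
    """Return the slice at `index` plus `context` neighbours on each side.

    volume: sequence of 2D slices ordered along the through-plane axis
    Neighbours that fall outside the volume are replaced by the nearest
    valid slice, so the output always has 2 * context + 1 channels.
    """
    last = len(volume) - 1
    return [
        volume[min(max(index + offset, 0), last)]
        for offset in range(-context, context + 1)
    ]


# Slices are represented by labels here; in practice each would be a 2D array.
volume = ["s0", "s1", "s2", "s3"]
first = multi_slice_input(volume, 0)
middle = multi_slice_input(volume, 2, context=2)
```

The resulting stack can be passed to an otherwise standard 2D network whose first layer accepts `2 * context + 1` input channels, which keeps the parameter count close to that of a plain 2D model.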

Finally, it is worth noting that there is no universally optimal segmentation method. Different applications have different complexities and different requirements, meaning that customized algorithms need to be optimized. For example, while anatomical shape constraints can be applied to cardiac anatomical structure segmentation (e.g., ventricle segmentation) to boost the segmentation performance, those constraints may not be suitable for the segmentation of pathologies or lesions (e.g., scar segmentation), which can have arbitrary shapes. Also, even if the target structures in two applications are the same, the complexity of the segmentation task can vary significantly from one to another, especially when their underlying imaging modalities and patient populations are different. For example, directly segmenting the left ventricle myocardium from contrast-enhanced MR images (e.g., LGE images) is often more difficult than from MR images without contrast agents, as the anatomical structures are more attenuated by the contrast agent. For cases with certain diseases (e.g., myocardial infarction), the border between the infarcted region and blood pool appears blurry and is ambiguous to delineate. As a result, a segmentation network designed for non-contrast-enhanced images may not be directly applicable to contrast-enhanced images (100). A more sophisticated algorithm is generally required to assist the segmentation procedure. Potential solutions include applying dedicated image pre-processing, enhancing network capacity, adding shape constraints, and integrating specific knowledge about the application.

### 4. CHALLENGES AND FUTURE WORK

It is evident from the literature that deep learning methods have matched or surpassed the previous state of the art in various cardiac segmentation applications, mainly benefiting from the increased size of public datasets and the emergence of advanced network architectures as well as powerful hardware for computing. Given this rapid progress, one may wonder if deep learning methods can be directly deployed to real-world applications to reduce the workload of clinicians. The current literature suggests that there is still a long way to go. In the following paragraphs, we summarize several major challenges in the field of cardiac segmentation and some recently proposed approaches that attempt to address them. These challenges and related works also provide potential research directions for future work in this field.

TABLE 6 | Summary of public datasets on cardiac segmentation for the three modalities.

Most of the datasets listed above are from the MICCAI society.

### 4.1. Scarcity of Labels

One of the biggest challenges for deep learning approaches is the scarcity of annotated data. In this review, we found that the majority of studies use a fully supervised approach to train their networks, which requires a large number of annotated images. In fact, annotating cardiac images is time consuming and often requires significant amounts of expertise. Methods that aim to reduce this annotation burden can be divided into five classes: data augmentation, transfer learning with fine-tuning, weakly and semi-supervised learning, self-supervised learning, and unsupervised learning.

• **Data augmentation**. Data augmentation aims to increase the size and the variety of training images by artificially generating new samples from existing labeled data. Traditionally, this can be achieved by applying a stack of geometric or photometric transformations to existing image-label pairs. These transformations can be affine transformations, adding random noise to the original data, or adjusting image contrast. However, designing an effective pipeline of data augmentation often requires domain knowledge, which may not be easily extendable to different applications. Moreover, the diversity of augmented data may still be limited, failing to reflect the spectrum of real-world data distributions. Most recently, several researchers have begun to investigate the use of generative models [e.g., GANs, variational AE (219)], reinforcement learning (220), and adversarial example generation (221) to directly learn task-specific augmentation strategies from existing data. In particular, the generative model-based approach has been proven to be effective for one-shot brain segmentation (222) and few-shot cardiac MR image segmentation (223), and it is thus worth exploring for more applications in the future.



SAX, short-axis view; WHS, whole heart segmentation.

or bounding boxes). In this context, several works have been proposed for cardiac ventricle segmentation in MR images. One approach is to estimate full labels on unlabeled or weakly labeled images for further training. For example, Qin et al. (28) and Bai et al. (32) utilized motion information to propagate labels from labeled frames to unlabeled frames in a cardiac cycle whereas (224, 225) applied the expectation maximization (EM) algorithm to predict and refine the estimated labels recursively. Others have explored different approaches to regularize the network when training on unlabeled images, applying multi-task learning (177, 178), or global constraints (226).


In general, transfer learning and self-supervised learning allow the network to be aware of general knowledge shared across different tasks, which accelerates the learning procedure and encourages model generalization. On the other hand, data augmentation and weakly and semi-supervised learning allow the network to obtain more labeled training data in an efficient way. In practice, the two types of methods can be integrated to improve the model performance. For example, transfer learning can be applied at the model initialization stage whereas data augmentation can be applied at the model fine-tuning stage.
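The traditional data augmentation described in this section, in which geometric and photometric transformations are applied to image-label pairs, can be sketched as follows. This is a minimal illustration under stated assumptions: a horizontal flip stands in for the broader family of affine transformations, and the gain range and noise level are arbitrary example values rather than recommended settings.

```python
import random


def augment_pair(image, label, rng, noise_std=0.05):
    """Apply one random augmentation to an image-label pair.

    image: 2D list of float intensities
    label: 2D list of integer class indices with the same shape
    rng: a random.Random instance, passed in for reproducibility
    """
    # Geometric transform: must be applied identically to image and label
    # so that the annotation stays aligned with the anatomy.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        label = [row[::-1] for row in label]
    # Photometric transforms: change intensities only, so the label is untouched.
    gain = rng.uniform(0.8, 1.2)  # assumed contrast range
    image = [[gain * v + rng.gauss(0.0, noise_std) for v in row] for row in image]
    return image, label


rng = random.Random(0)
image = [[0.0, 1.0], [0.5, 0.25]]
label = [[0, 1], [1, 0]]
aug_image, aug_label = augment_pair(image, label, rng)
```

The key design point is the asymmetry between the two kinds of transformation: geometric changes are shared by the image and its label, whereas intensity changes affect the image alone.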

### 4.2. Model Generalization Across Various Imaging Modalities, Scanners, and Pathologies

Another common limitation in DL-based methods is that they still lack generalization capabilities when presented with previously unseen samples (e.g., data from a new scanner, or abnormal and pathological cases that have not been included in the training set). In other words, deep learning models tend to be biased by their respective training datasets. This limitation prevents models from being deployed in the real world and therefore diminishes their impact on improving clinical workflows.

To improve the model performance across MR images acquired from multiple vendors and multiple scanners, the authors of (53) collected a large multi-vendor, multi-center, heterogeneous labeled training set from patients with cardiovascular diseases. However, this approach may not scale to the real world, as it implies the collection and labeling of a vast dataset covering all possible cases. Several researchers have recently started to investigate the use of unsupervised domain adaptation techniques that aim at optimizing the model performance on a target dataset without additional labeling costs. Several works have successfully applied adversarial training to cross-modality segmentation tasks, adapting a cardiac segmentation model learned from MR images to CT images and vice versa (39–41, 228, 229). These types of approaches can also be adopted for semi-supervised learning, where the target domain is a new set of unlabeled data of the same modality (230). Of note, these domain adaptation methods often require access to unlabeled images in the target domain (e.g., a new scanner, a different hospital), which may not be easy to obtain due to data privacy and ethics issues. How to collect and share data safely, fairly, and legally across different sites is still an open challenge.

On the other hand, some researchers have started to develop domain generalization algorithms that do not require access to images from new sites. One stream of works aims to improve the domain generalization ability by extracting domain-independent and robust features or disentangling learned features into domain-specific and domain-invariant components from various seen domains (e.g., multi-center data, multi-modality datasets) to improve the model performance on unseen domains (221, 228, 231). Other researchers have started to adopt data augmentation techniques to simulate various possible data distributions across different domains. For instance, Chen et al. (232) have proposed a data normalization and augmentation pipeline which enables a neural network for cardiac MR image segmentation trained from a single-scanner dataset to generalize well across multi-scanner and multi-site datasets. Zhang et al. (233) applied a similar data augmentation approach to improve the model generalization ability on unseen datasets. Their method has been verified on three tasks including left atrial segmentation from 3D MRI and left ventricle segmentation from 3D ultrasound images.

One bottleneck of augmenting training data for model generalization across different sites is that it is often necessary to increase the model capacity to compensate for the increased dataset size and variation (232). As a result, training becomes more expensive and challenging. To address this inefficiency problem, active learning (234) has been proposed, which selects the most representative images from a large-scale dataset, reducing labeling workload as well as computational costs. This technique is also related to incremental learning, which aims to improve the model performance by adding new classes incrementally while avoiding a dramatic decrease in overall performance (235). Given the increasing size of the available medical imaging datasets and the practical challenges of collecting, labeling, and storing large amounts of images from various sources, it is of great interest to combine domain generalization algorithms with active learning algorithms to distill a large dataset into a small one that contains the most representative cases for effective and robust learning.
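To illustrate what selecting the most representative images can look like in practice, the sketch below uses greedy k-center selection over feature vectors, a generic coverage-based strategy. This is a swapped-in example and not necessarily the method of (234); the feature vectors are hypothetical and would normally come from a trained network.

```python
import math


def k_center_greedy(features, budget):
    """Pick up to `budget` indices that spread out over the feature space.

    Starting from index 0 (an arbitrary choice), repeatedly add the sample
    that is farthest from everything selected so far.
    """
    if budget <= 0 or not features:
        return []
    selected = [0]
    # Distance from every sample to its nearest selected sample.
    min_dist = [math.dist(f, features[0]) for f in features]
    while len(selected) < min(budget, len(features)):
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [
            min(d, math.dist(f, features[nxt]))
            for d, f in zip(min_dist, features)
        ]
    return selected


# Two tight clusters plus one isolated case (hypothetical 2D features).
features = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1), (5.0, 5.0)]
to_label = k_center_greedy(features, budget=3)
```

With a budget of three, the selection covers both clusters and the isolated case instead of spending annotations on near-duplicates, which is the behaviour one would want when distilling a large dataset into a small representative subset.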

### 4.3. Lack of Model Interpretability

Unlike symbolic artificial intelligence systems, deep learning systems are difficult to interpret and not transparent. Once a network has been trained, it behaves like a "black box," providing predictions which are not directly interpretable. This issue makes the model unpredictable, intractable for model verification, and ultimately untrustworthy. Recent studies have shown that deep learning-based vision recognition systems can be attacked by images modified with nearly imperceptible perturbations (236–238). These attacks can also happen in medical scenarios, e.g., a DL-based system may make a wrong diagnosis given an image with adversarial noise or even just a small rotation, as demonstrated in a very recent paper (239). Although there is no denying that deep learning has become a very powerful tool for image analysis, building resilient algorithms robust to potential attacks remains an unsolved problem. One potential solution, instead of building the resilience into the model, is raising failure awareness of the deployed networks. This can be achieved by providing users with segmentation quality scores (240) or confidence maps, such as uncertainty maps (166) and attention maps (241). These scores or maps can be used as evidence to alert users when failure happens. For example, Sander et al. (166) built a network that is able to simultaneously predict the segmentation mask over cardiac structures and its associated spatial uncertainty map, where the latter could be used to highlight potentially incorrect regions. Such uncertainty information could alert human experts for further justification and refinement in a human-in-the-loop setting.
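One common way to obtain a spatial uncertainty map is to run several stochastic forward passes and compute the entropy of the averaged class probabilities at each pixel. The sketch below shows this computation only and assumes that the per-pass probabilities are already available; it is offered as a generic illustration and is not claimed to reproduce the network of (166). The threshold used for flagging pixels is an arbitrary example value.

```python
import math


def entropy_map(samples):
    """Per-pixel predictive entropy of the mean prediction.

    samples: list of T predictions; each prediction is a list of pixels,
             and each pixel is a list of class probabilities.
    """
    num_samples = len(samples)
    entropies = []
    for i in range(len(samples[0])):
        num_classes = len(samples[0][i])
        mean = [
            sum(s[i][c] for s in samples) / num_samples
            for c in range(num_classes)
        ]
        # Terms with zero probability contribute nothing to the entropy.
        entropies.append(-sum(p * math.log(p) for p in mean if p > 0.0))
    return entropies


def flag_uncertain(entropies, threshold):
    """Indices of pixels whose entropy exceeds the threshold."""
    return [i for i, h in enumerate(entropies) if h > threshold]


# Two passes over two pixels: the passes agree on pixel 0 and disagree on pixel 1.
samples = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
uncertainty = entropy_map(samples)
flagged = flag_uncertain(uncertainty, threshold=0.5)
```

Pixels on which the passes disagree receive high entropy and can be highlighted for review, which matches the human-in-the-loop use described above.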

### 4.4. Future Work

#### 4.4.1. Smart Imaging

We have shown that deep learning-based methods are able to segment images in real-time with good accuracy. However, these algorithms can still fail on image acquisitions with low image quality or significant artifacts. Although several algorithms have been developed to avoid this problem by either checking the image quality before follow-up studies (242, 243) or predicting the segmentation quality to detect failures (240, 244, 245), the development of algorithms that can give instant feedback to correct and optimize the image acquisition process is also important, although less explored. Improving the imaging quality can greatly improve the effectiveness of medical imaging as well as the accuracy of imaging-based diagnosis. For radiologists, however, finding the optimal imaging and reconstruction parameters to scan each patient can take a great amount of time. Therefore, a DL-based system that can efficiently and effectively improve image quality and reduce noise is greatly needed. Some researchers have utilized learning-based methods (mostly deep learning-based) for better image resolution (62), view planning (246), motion correction (247, 248), artifact reduction (249), shadow detection (250), and noise reduction (251) after image acquisition. However, combining these algorithms with segmentation algorithms and seamlessly integrating them into an efficient, patient-specific imaging system for high-quality image analysis and diagnosis is still an open challenge. An alternative approach is to directly predict cardiac segmentation maps from undersampled k-space data to accelerate the whole procedure, which bypasses the image reconstruction stage (58).

#### 4.4.2. Data Harmonization

A number of works have reported the existence of missing labels and inconsistent labeling protocols among different cardiac image datasets (27, 232). Variations have been found in defining the end of basal slices as well as the endocardial wall of the myocardium (some include papillary muscles as part of the endocardial contours whereas others do not). These inconsistencies can be a major obstacle for transferring, evaluating, and deploying deep learning models trained from one domain (e.g., hospital) to another. Therefore, there is a need for a standard benchmark dataset like CheXpert (252) that (1) is large enough to have substantial data diversity that reflects the spectrum of real-world diversity and (2) has a standard labeling protocol approved by experts. However, directly building such a dataset from scratch is time-consuming and expensive. A more promising way might be developing an automated tool to combine existing public datasets from multiple sources and then to harmonize them into a unified, high-quality dataset. This tool can not only open the door for crowdsourcing but also enable the rapid deployment of those DL-based segmentation models.

#### 4.4.3. Data Privacy

As deep learning is a data-driven approach, an unavoidable and widespread concern is data privacy. Regulations such as the General Data Protection Regulation (GDPR) now play an important role in protecting users' privacy and have forced organizations to treat data ownership seriously. On the other hand, from a technical point of view, how to store, query, and process data such that there are no privacy concerns when building deep learning systems has become an even more difficult but interesting challenge. Building a privacy-preserving algorithm requires combining cryptography with deep learning and mixing techniques from a wide range of subjects, such as data analysis, distributed computing, federated learning, and differential privacy, in order to achieve models with strong security, fast run time, and great generalizability (253–256). In this respect, Papernot (257) published a report for guidance, which summarized a set of best practices for improving the privacy and security of machine learning systems. Yet, this field is still in its infancy.
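As a small illustration of the federated learning idea mentioned above, the sketch below shows the aggregation step of federated averaging, in which a server combines model parameters trained locally at several sites without seeing the underlying images. The function name, the flat parameter lists, and the site sizes are assumptions for illustration. Note that averaging alone does not guarantee privacy; in practice it would be combined with techniques such as differential privacy or secure aggregation.

```python
def federated_average(client_weights, client_sizes):
    """Average model parameters across sites, weighted by local dataset size.

    client_weights: one flat list of parameters per site (all the same length)
    client_sizes: number of training samples held by each site
    """
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(num_params)
    ]


# Two hypothetical hospitals: the second holds three times as much data.
site_a = [1.0, 2.0]
site_b = [3.0, 6.0]
global_weights = federated_average([site_a, site_b], client_sizes=[1, 3])
```

Only the parameter vectors leave each site in this scheme, so the raw images and labels remain with the institution that collected them.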

### 5. CONCLUSION

In this review paper, we provided a comprehensive overview of deep learning techniques used in three common imaging modalities (MRI, CT, ultrasound), covering a wide range of existing deep learning approaches (mostly CNN-based) that are designed for segmenting different cardiac anatomical structures (e.g., cardiac ventricle, atria, vessel). In particular, we presented and discussed recent progress of deep learning-based segmentation methods in the three modalities, and outlined the future potential and the remaining limitations of these deep learning-based cardiac segmentation methods that may hinder widespread clinical deployment. We hope that this review can provide an intuitive understanding of those deep learning-based techniques that have made a significant contribution to cardiac image segmentation and also increase the awareness of common challenges in this field that call for future contribution.

### 6. DATA AVAILABILITY STATEMENT

The datasets summarized in **Table 6** can be found in their corresponding websites listed below:














### AUTHOR CONTRIBUTIONS

CC, WB, and DR conceived and designed the work. CC, CQ, and HQ searched and read the MR, CT, Ultrasound literature, respectively, and drafted the manuscript together. WB, DR, GT, and JD provided the critical revision with insightful and constructive comments to improve the manuscript. All authors read and approved the manuscript.

### FUNDING

This work was supported by the SmartHeart EPSRC Programme Grant (EP/P001009/1). HQ was supported by the EPSRC Programme Grant (EP/R005982/1).

### ACKNOWLEDGMENTS

We would like to thank our colleagues Karl Hahn, Qingjie Meng, James Batten, and Jonathan Passerat-Palmbach, who provided the insight and expertise that greatly assisted the work. We also thank Turkay Kart for constructive and thoughtful comments that greatly improved the manuscript.

### REFERENCES


image segmentation. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. Honolulu, HI: AAAI Press (2019). p. 865–72.


and Modelling of the Heart. Vol. 10263 LNCS of Lecture Notes in Computer Science. Cham: Springer International Publishing (2017). p. 127–38.


Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer International Publishing (2018). p. 29–37.


MICCAI 2018, Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges. Granada: Springer (2018). p. 221–9.


International Conference on Medical Image Computing and Computer Assisted Intervention - MICCAI 2018. Granada: Springer International Publishing (2018). p. 580–8.


motion correction for short-axis cine cardiac MR image stacks. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G, editors. 21st International Conference on Medical Image Computing and Computer Assisted Intervention—MICCAI 2018. Granada (2018). p. 268–76.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Chen, Qin, Qiu, Tarroni, Duan, Bai and Rueckert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# AI in Cardiac Imaging: A UK-Based Perspective on Addressing the Ethical, Social, and Political Challenges

#### Matthew E. Fenech\* and Olly Buston

*Future Advocacy, London, United Kingdom*

Imaging and cardiology are the healthcare domains which have seen the greatest number of FDA approvals for novel data-driven technologies, such as artificial intelligence, in recent years. The increasing use of such data-driven technologies in healthcare is presenting a series of important challenges to healthcare practitioners, policymakers, and patients. In this paper, we review ten ethical, social, and political challenges raised by these technologies. These range from relatively pragmatic concerns about data acquisition to potentially more abstract issues around how these technologies will impact the relationships between practitioners and their patients, and between healthcare providers themselves. We describe what is being done in the United Kingdom to identify the principles that should guide AI development for health applications, as well as more recent efforts to convert adherence to these principles into more practical policy. We also consider the approaches being taken by healthcare organizations and regulators in the European Union, the United States, and other countries. Finally, we discuss ways by which researchers and frontline clinicians, in cardiac imaging and more broadly, can ensure that these technologies are acceptable to their patients.

#### Edited by:

*Tim Leiner, University Medical Center Utrecht, Netherlands*

#### Reviewed by:

*Fabien Hyafil, Assistance Publique Hopitaux De Paris, France Steffen Erhard Petersen, Queen Mary University of London, United Kingdom*

#### \*Correspondence:

*Matthew E. Fenech matthew.fenech@ada.com*

#### Specialty section:

*This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine*

> Received: *31 October 2019* Accepted: *20 March 2020* Published: *15 April 2020*

#### Citation:

*Fenech ME and Buston O (2020) AI in Cardiac Imaging: A UK-Based Perspective on Addressing the Ethical, Social, and Political Challenges. Front. Cardiovasc. Med. 7:54. doi: 10.3389/fcvm.2020.00054*

Keywords: artificial intelligence, ethics, policy, principles, regulation

## INTRODUCTION

Technological change is certainly not a new phenomenon. Stone tools dating back 3.3 million years, made by Australopithecus, one of the earliest hominid species, have been found in Kenya (1), indicating that the drive to use tools to make tasks easier, and hence to improve quality of life, has not changed over the millions of years of humanity's history. One thing that has certainly changed over this time period, however, is the increased speed at which technological progress occurs. Gordon Moore's famous prediction that the number of transistors per square inch on an integrated circuit would double every one to two years (2) has stood the test of time, and other metrics of technological advancement, such as data storage per unit cost, speed of DNA sequencing, and internet bandwidth, have also increased at exponential rates over the last few decades (3).

With new technologies come new potential socioeconomic impacts, and new reactions to these real and imagined impacts by governments and international bodies. Once again, the impulse to regulate novel technologies is long-standing—the history of everything from the railways, to the automobile, to mining, to in vitro fertilization provides fascinating case studies in how societies of the day reacted to unfamiliar technology. Nevertheless, it is arguable that artificial intelligence (AI) is unlike other technologies in that never before has there been such a general-purpose technology that makes us question what it means to be human (4–6). Although there is no sign of anything approaching artificial general intelligence (AGI), the very fact that AI poses such deeply existential questions is just one of the challenges that it poses to regulators and policymakers, particularly in the hugely sensitive area of healthcare. In this paper, we discuss the various ethical, social, and political challenges the application of AI to health and care presents, and how reactions to these challenges are being used to develop principles for action. In some jurisdictions, these principles are being translated into policy and regulation, clearly setting out what should and should not be allowed. Moreover, we outline what researchers and clinicians can do to help ensure that the use of these technologies is acceptable to patients and practitioners alike.

### ETHICAL, SOCIAL, AND POLITICAL CHALLENGES

Future Advocacy, an independent think tank focused on policy development around the responsible use of emerging technology, conducted a series of interviews with expert clinicians, technologists, and ethicists, as well as focus groups with patients, and identified ten sets of questions that are raised by the application of AI to the health setting (**Table 1**) (7). In the following sections, we briefly discuss each in turn, and reflect on any advances in thinking and practice that have occurred since the publication of our original report.

### Relationships

Healthcare is built on a complex network of relationships between various stakeholders. The primacy of the relationship between patients and their healthcare professionals (HCPs) is clear from the value still placed on it, even in the context of medical practice that is increasingly characterized by the use of technology (5, 7). It is, however, but one of the relationships in healthcare; others include those between the HCP and caregivers/relatives; between different HCPs; between top-level administrators and HCPs "on the ground"; and between patients and wider society (8). All of these relationships could be impacted by the introduction of an AI algorithm, and the inferences or predictions it provides, as a "third party" in what were previously two-way interactions. What will patients do, for example, when faced with the scenario of their doctor's recommendation clashing with the suggestion for treatment provided by an AI tool? How will patients react to an error in their care that is traced back to a decision made, or supported, by AI? The specific issue of liability for error is discussed in section 2.5 below, and a Royal Society-commissioned study found that many members of the public were optimistic about the possibility for AI to reduce medical error (5), but there is a need for more research aimed at understanding how patients are likely to respond to such AI-derived errors.

TABLE 1 | Ten major ethical, social, and political challenges of the use of artificial intelligence technologies in health and care.

Another way by which AI may impact relationships in healthcare is through its potential to fundamentally change the role of doctors and other HCPs. To paraphrase Mark Twain, reports of the death of the radiologist are greatly exaggerated (9). Nevertheless, as AI tools become better at performing certain circumscribed tasks in healthcare, such as image recognition, the repertoire of tasks that make up an HCP's job will change. Some have expressed their hope that the "delegation" of such tasks to algorithms will free up more time for HCPs to spend with patients and their relatives (10), but previous experience of the introduction of different technologies into the clinical space suggests that they may well increase clinician workload in both primary and secondary care (11–13). Whether AI is different remains unknown; various medical bodies are grappling with this question (14, 15), and at the time of writing, Health Education England (the body in England responsible for postgraduate training and development of NHS England's workforce) was holding a consultation on the topic of the "Future Doctor" (16).

### Data

AI is increasingly being used to identify patterns in and extract value from the vast amounts of data being generated by individuals, governments, and companies. Healthcare is no exception—the volume, complexity and longevity of healthcare data are all rising fast, with some estimates predicting that the total amount of healthcare data will reach 2.3 billion gigabytes by next year (17). With larger volumes and greater complexity come new questions about the implications of such data use and storage. Firstly, there is the pragmatic concern of how informed consent—the bedrock of interactions between patients and healthcare systems since at least the nineteenth century (18)—is obtained from each and every contributor to a dataset, which may number in the millions. Similarly, as the technology is developing so rapidly, new insights are derived from existing datasets that could not have been predicted before data analysis, as evidenced by the Google/Verily Life Science deep learning algorithm that can determine gender from retinal photographs (19). How do we obtain informed consent for future, unimagined uses of data? The European Union (EU) General Data Protection Regulation (GDPR) already makes it clear that there are multiple "lawful bases" for data processing, and informed consent is only one of them (20). Clearly, the field of health and care needs to determine whether alternatives to individualized informed consent (including broad consent, "opt-out" consent, and presumed consent) are acceptable in the context of AI research and development, whilst maintaining their patients' and research subjects' trust (21). GDPR also sets clear restrictions on how identifiable information about particular data subjects can or cannot be shared, and these are particularly relevant in the age of establishment and curation of "big data" healthcare datasets. 
Patients and research subjects are right to expect that their data, donated in good faith for use in research, does not end up being used to determine health insurance premiums, for example (22). Although the regulation exists, this is only as good as its enforcement, and concerns about the rigor with which GDPR is being enforced have been raised in other sectors (23). Ultimately, when it comes to such sensitive subjects, a reliance on regulation alone is not sufficient; this must be backed up by education of, and dialogue between, all stakeholders, focusing on their data rights and responsibilities in law.

A consideration that is perhaps particularly relevant to radiologists was highlighted in the Joint Statement on "Ethics of Artificial Intelligence in Radiology," issued by the American College of Radiology, European Society of Radiology, Radiological Society of North America, Society for Imaging Informatics in Medicine, European Society of Medical Imaging Informatics, Canadian Association of Radiologists, and American Association of Physicists in Medicine (24). Radiologists are in great demand to provide accurate and replicable labeling of radiological images, which are then used in supervised learning, for example in training convolutional neural networks. Those with expertise in cardiac imaging will be particularly sought after, given the especially time-consuming and resource-intensive nature of interpreting cardiac imaging modalities such as cardiac magnetic resonance (25), and any difficulty in recruiting such experts may well slow the development of these tools in this area of radiology. As any practicing clinician knows, labeling and classification of real-world clinical imaging is similar to all medical decision-making in that it involves many assumptions, heuristics, and potential biases (26–28). When processing data for use in AI training, radiologists need to be aware of these biases, to avoid introducing additional bias into imperfect datasets, and to recognize the various incentives and pressures that may influence their decision-making, including commercial pressures to provide these data (24, 29, 30). Radiology training programmes will need to be updated to make sure the radiologists of the future are best prepared to spot and mitigate these problems (31).

Perhaps of all the challenges discussed in this review, those surrounding data are the ones best addressed by existing regulation, with the Privacy Rule created under the Health Insurance Portability and Accountability Act (HIPAA) well-established in the United States, and GDPR incentivizing businesses and public bodies to give their European clients greater control of their data (32). Nevertheless, gaps remain. HIPAA's Privacy Rule, for example, does not cover non-health information from which health-related conclusions can be drawn, or user-generated health information (33); such omissions cannot be tolerated for long in an age of linked datasets and wearable technologies constantly monitoring parameters such as heart rate. Moreover, although Article 22 on automated decision-making is clearly relevant, the words "artificial intelligence" do not appear even once in the text of the GDPR, as the regulation is relatively agnostic about the downstream use of the data. This is in contrast to, for example, the guidance on the regulation of data-driven technology published by the German Government's Data Ethics Commission, which explicitly draws links between data ethics and algorithmic ethics (34). European policy experts have reason to believe that this document will prove influential as the European Commission (EC) develops widely-expected regulation on AI in 2020 (35). The framework for such regulation was laid out in the EC's White Paper on AI, published in February, which is now open to public consultation (36).

### Transparency and Explainability

The "black box" problem is one of the major foci of AI ethics (37). Besides referring to the inherent opacity of complex machine learning algorithms such as neural networks, it is also the case that the increasing size of datasets used in developing AI for health makes explanations of the relationships between input data and outputs difficult—understanding how each of millions of variables contributes to the final output may be computationally intractable (38). Questions that may therefore follow include: How can patients give meaningful informed consent to, or clinicians advise the use of, algorithms the internal workings of which are unclear? (39) Should we be using black box algorithms in healthcare at all?

It is easy to forget that the human brain is itself a "black box," given the ease with which we explain our own decisions via post hoc rationalization (40, 41). The field of medicine has therefore been accustomed to dealing with black box decision-making for millennia. Of course, part of the difference between an opaque human decision and an opaque algorithmic one is the ability to have a conversation with the former, such that the decision itself can be probed and aspects of the decision that are important to its subject better understood. This highlights an important concept that should be considered when grappling with the issue of explainability, which is the distinction between "model-centric explanations" (where the focus is on providing a complete account of how the model works), and "subject-centric explanations" (where "only" those aspects of model functioning that are relevant to the subject are considered) (42). Given that different subjects may require different types of explanation, there is a very strong argument for addressing the black box problem through thorough user/stakeholder research, and their meaningful involvement throughout the development process. Thus, rather than a blanket requirement of full explainability, smart regulatory frameworks may opt to give regard to the application of the AI tool, its intended target group, and its risk profile, with higher risk applications in more vulnerable groups necessitating deeper explanations. Nevertheless, we contend that one area of transparency should remain strictly enforced, namely that developers and healthcare system administrators make it absolutely clear to service users when an algorithm is being used to support or to independently provide decision-making.

### Health Inequalities

A systematic review found significant aversion amongst the UK public to health inequalities, particularly when such inequalities are presented in the context of socioeconomic differences (43). Thus, any suggestion that the use of AI in medicine may exacerbate existing health inequalities, for example by automating existing bias and unfairness at speed and scale, is likely to decrease trust in and acceptability of these tools. Sadly, there is evidence that this is already occurring. For example, an algorithm in widespread use in the US to determine the likely healthcare needs of a patient, and thus access to onward services, exhibits significant racial bias—in short, African-American patients needed to be significantly "sicker" than Caucasian patients to get the same score, and thus the same access to services, via this algorithm (44).

In the context of cardiac imaging, a specific source of inequality may result from the geographic distribution of these technologies. Much cardiac imaging, particularly using newer modalities such as cardiac magnetic resonance, is largely performed in higher-income countries, and even there, in centers of excellence or high-volume practices (45). This means that training datasets used in the development of AI models for the analysis of these images will suffer from a relative lack of images from patients in low- and middle-income countries. Even disadvantaged patients in high-income countries, who may not have access to the best, most expensive imaging, may be relatively underrepresented in such datasets. Such excluded groups may find that cardiac imaging AI tools developed with these unrepresentative datasets are either less accurate when applied to their cases, or are excluded altogether from the potential benefits of these technologies due to decisions around deployment and marketing by their manufacturers.

The question also remains as to whether the use of AI will create new health inequalities. For example, consumer-facing AI tools presuppose some degree of digital literacy, and their use is likely to pose a personal financial cost to an individual, given the expensive hardware that is frequently required, such as a smartphone or wearable technology. More work is needed to better understand which groups may be excluded from the benefits these technologies could bring, and to develop strategies to avoid such outcomes.

### Errors and Liability

Just like the black box problem, the question of "who is responsible when things go wrong with AI" has received a lot of attention in ethics and policy circles (46, 47). The Canadian Association of Radiologists has approached this discussion by focusing on degree of autonomy as a critical determinant. Seeing as most current applications of AI strictly define its role as assistive, including in "intended use" statements that carry regulatory weight, it is reasonable to suggest that ultimate liability for erroneous decisions such as misdiagnosis would rest with the responsible clinician. However, as the degree of autonomy increases, the degree of liability should shift toward the manufacturer, provided that the clinician can prove that they were using the AI tool exactly as intended. Another potential player is the healthcare system or institution that implemented the AI algorithm, especially if it is determined that, as with any other tool or technology, the organization has a duty to deploy it appropriately (21). However, there are concerns that difficulties in explaining algorithmic decisions (see section **Transparency and Explainability**) may translate into difficulties for patients who suffer harm in proving causation by an algorithm, regardless of the latter's autonomy (39). Thus, a res ipsa loquitur ("the facts speak for themselves") approach may come to be preferred, where it is the manufacturer that has a prima facie case to rebut, and which has successfully been used in cases of harm caused by machinery (48).

### Ensuring the Public's Needs Are Met

Patients and members of the public have a more nuanced understanding of tasks and roles in healthcare than they are frequently given credit for. In research we commissioned, for example, we found that 45% of respondents (in a sample selected to be representative of the UK adult population) agreed that AI should be used to "help diagnose disease," but only 17% agreed that it should be used to "take on other tasks performed by doctors and nurses," such as breaking bad news; 63% said it should not be used for this purpose (7). Similarly, attitudes to data sharing for AI research are complex and nuanced. For example, in a workshop study conducted by the Wellcome Trust with 246 patients and HCPs, 17% of participants indicated opposition to giving commercial companies access to their data for the purposes of research. However, when data sharing was tied to the possibility of benefits from this research, 61% of the same study participants indicated they would rather share their data with commercial companies than miss out on potential positive outcomes (49). Many such studies of attitudes to data sharing exist [and the Understanding Patient Data initiative provides an excellent compendium of these studies and their major findings (50)], but two themes emerge across all of these studies as critical factors in determining readiness to share data: firstly, the importance of trust in the institution carrying out the research or development, and secondly, the importance of communicating potential benefits clearly.

### Regulation

In our 2018 report, we discussed the looming potential of a clash between existing healthcare regulators, and the new regulators, oversight bodies, and advisory committees being set up by governments and multinational organizations to focus on AI more generally, such as the Centre for Data Ethics and Innovation in the UK, and the EU's High Level Expert Group. As it turns out, no such clash has transpired, as newer AI-focused bodies have thus far been content to leave the realm of health and care to the more established regulators. However, this does not mean that regulatory certainty has followed. The healthcare regulatory space is crowded, and the speed of technological development means that these regulators have been undertaking an exercise of rapid capacity building, to be able to consider the potential impacts of these technologies. Furthermore, communication between these regulators needs to occur to ensure clear responsibility for all parts of the development process, and to avoid regulatory gaps. In the UK, the think tank Reform has released a series of resources that definitively map each step in developing a data-driven healthcare tool (from idea generation, through to securing data access, through to undertaking clinical research, to ascertaining regulatory compliance and post-market surveillance) to specific regulators, and lay out the requirements at each stage (51). The CEO of NHSX, the UK Government unit with responsibility for setting national policy and developing best practice for NHS technology, digital and data, has acknowledged the need for better regulatory alignment (52). The very fact that such discussions are being had indicates the shift in thinking that is occurring in the health technology (healthtech) space, where rather than "software" and "apps," more enlightened technologists are realizing that what they are creating are medical devices, with the risks and benefits inherent in any medical intervention.
Having first been expressed by the Software as a Medical Device (SaMD) initiative kicked off by the International Medical Device Regulators Forum (IMDRF), this culture shift has arguably reached its zenith in the EU's Medical Device Regulation 2017/745, the post-market surveillance requirements of which will be fully applicable by May 2020. The launch of a European database on medical devices (EUDAMED) in May 2022, with a much wider scope than the existing one, will mean that data on post-market surveillance of various devices, including AI tools, will be publicly available to an unprecedented degree.

Another area where regulators may contribute is in the development of standardized benchmarks to allow replicable assessment of the performance of AI tools, both over time and in comparison to one another. This is precisely the aim of the AI for Health (AI4H) Focus Group, a joint initiative of the World Health Organization and the International Telecommunication Union (53).

### Consequences of Novel Insights

We have already alluded to the fact that the novel methods of data analysis these AI tools could provide can lead to unexpected insights being obtained from datasets (see section Data). Taken one step further, we can envisage a situation where these tools could potentially present patients and members of the public with information that (a) would not have been previously available, and (b) has the capacity to radically alter how they think about themselves and their health. A close analogy is genomic testing, with the new insights and attendant deep ethical questions it has forced us to consider (54). Just as with genes, if algorithmic predictions come to be equated with "destiny," then this could lead to a perception of futility and diminishment of hope. Negative consequences could include an individual fearing that they may not have access to certain interventions, and therefore not seeking them. Moreover, not everyone would like to discover that they are at high risk of one condition or another, especially if the treatment or cure options are limited. Decisions on these questions are likely to be nuanced and vary greatly in different situations and between different patients, but they should always be taken in the context of meaningful conversations between patients and their healthcare providers, and with deep appreciation of a patient's autonomy.

On a population level, algorithmic predictions of this nature can easily translate into algorithmic profiling, creating new categories and subgroups within existing populations. People may be assigned to these groups, and inferences and choices made about them, possibly without their knowledge (55). It is unclear where the balance should be struck between capitalizing on the new insights these algorithms could provide, and the threats to autonomy and individuality that categorization of societies and communities could lead to. An interesting suggestion has been to invoke the concept of solidarity as a means to reinforce the community-based nature of healthcare, and to underline the importance of the pursuit of a collective "good" (56).

### Trusting Algorithms

As referred to earlier (see section Ensuring the Public's Needs Are Met), trust in data-driven technologies and in their development may be intimately related to trust in the institutions responsible for this development. Further evidence for this is provided by a survey of 2000 people across Europe carried out by the Open Data Institute, where 94% of respondents said that whether or not they trust the organization asking for their data is important in considering whether or not to share data (57). It follows, therefore, to ask what it is that makes organizations trusted, and there is evidence to suggest that a major factor in determining this trust is the degree of perceived openness. Being open reduces the sense that a system or process has been captured by a particular organization or body that may not have the system's users' best interests at heart (58, 59). Moreover, openness allows the organization to demonstrate its competence in data handling, and to share its motivations for doing so; both these factors have also been found to be important determinants of readiness to engage by a systematic review (60). In order to address the requirements for openness and transparency in clinical trials involving AI algorithms, an international project is underway that aims to develop AI extensions to the existing CONSORT and SPIRIT checklists and guidance documents (61). On the other hand, given that a lot of development of AI for healthcare occurs in the private sector (see next section), legitimate concerns remain on the part of developers that regulators mandating excessive openness pose a threat to their intellectual property, and thus reduce the incentives for investment in developing these data-driven tools.

### Collaborations Between Public and Private Sector Organizations

The development of AI tools for widespread clinical use is dominated by partnerships between health and research institutions such as hospitals and universities, and private sector organizations. In the UK, there is a perception that such partnerships are needed as the healthcare system, the National Health Service, controls access to data, whereas capital for investment in R&D and the human talent required to create these tools is increasingly being concentrated in technology companies (62). There is the additional complicating factor of ensuring value not only for the patients whose data is used to develop these tools, but also for the taxpayer who funds the health service that acts as the data custodian, but who may never be in a position to directly benefit from the algorithms that are derived from such partnerships. There have already been some policy responses to such challenges. For example, following its launch in July 2019, one of NHSX's first acts was to confirm and take responsibility for enforcing a ban on exclusive data-sharing agreements between hospitals and commercial companies (63). This move has been seen as addressing concerns that exclusivity deals signed in the past by NHS hospitals did not represent good value for money, and as signaling a shift toward more national decision-making on data use for technological applications.

### DEVELOPING PRINCIPLES, AND TRANSLATING THEM INTO POLICY

In some countries, the response to questions such as those posed by our 2018 report has been to develop frameworks outlining principles for the ethical use of data and AI in healthcare. At the time of writing, for example, the Royal Australian and New Zealand College of Radiologists has an open consultation on its Draft Standards of Practice for Artificial Intelligence; this will close on 29th November 2019 (64). Perhaps one of the more mature frameworks is the UK Department of Health and Social Care (DHSC)'s "Code of Conduct for data-driven health and care technology," which was developed using a Delphi methodology and was first published in September 2018. It is already in its third iteration following a process of expert review and public consultation (65). This principles-based document has been broadly well-received, and constitutes a world-first that is likely to serve as a global standard.

Nevertheless, it has been clear for some time that principles are a necessary but not sufficient condition to ensure safe and ethical development of healthtech tools. Specifically, it was realized that developers, predominantly coming from a technological background and therefore not steeped in the cultural norms and expectations specific to healthcare, needed support with demonstrating adherence to the principles laid out in documents such as the Code of Conduct. Put another way, if the Code of Conduct laid out what developers should aspire to, what they wanted was guidance on how to do it. It is against this background that in October 2019, NHSX launched a series of resources specifically aimed at addressing this question (66). This combination of principles and policy has been termed a "principled proportionate governance" model. Future Advocacy contributed to the development of this policy document by focusing specifically on Principle 7 of the Code of Conduct, which is concerned with transparency, openness, and ensuring safe integration of algorithms in existing healthcare systems. In order to address these issues, we signpost a number of existing resources that developers can use to demonstrate adherence to this principle, and classify them according to whether they are general processes that apply across all aspects of Principle 7, or recommendations for specific processes that apply to certain subsections. For example, in order to conduct a meaningful, useful, and relevant stakeholder analysis, we encourage the use of value and consequence matrices in the context of the SUM principles developed by the Alan Turing Institute (67). Likewise, in order to encourage transparency around the means of collecting, storing, using and sharing data, we recommend the use of the Open Data Institute's "Data Ethics Canvas," a freely available resource from a highly-respected institution (68).
What is apparent is that rather than attempting to reinvent the wheel, HCPs and technologists collaborating on the creation of data-driven tools for healthcare need to develop greater familiarity with the work that is already ongoing in the wider technology ethics and policy community, as this cross-disciplinary approach is likely to suggest solutions to problems the field of healthcare is only beginning to grapple with.

### SAFE AND ACCEPTABLE: THE FUTURE OF AI IN HEALTH AND CARE

Two specific themes that run through the Code of Conduct, and that have been referred to at multiple points in this review, are those of stakeholder engagement and openness. Both these concepts are important when thinking about how developers can increase the likelihood of their tools being acceptable to patients and HCPs. For example, research with patients and members of the public has indicated that they do not want the development of these tools to come at the expense of the relationships that characterize good care (5, 7). Undertaking a robust and inclusive process of stakeholder analysis will help highlight relationships of importance in healthcare, and will ensure that the participants in these relationships are involved in the development process. Furthermore, if development is guided by a deep understanding of the needs of the prospective user from an early stage, the product that comes out of the development process is more likely to be adopted and deployed. Similarly, as has been discussed previously, openness is a determinant of trust, which is itself a determinant of likelihood to engage with the development of these tools. It is therefore our recommendation that the principles of stakeholder engagement and openness run through the development of AI tools for all applications in health and care.

Although the principles and policies discussed above are inspired by a drive to increase the safety of AI technologies as applied to health, they are not in themselves sufficient to guarantee safety. A detailed treatment of the various processes and standards related to the safety of these products is beyond the scope of this review, but it is noted that the shift in thinking toward treating these tools as medical devices, as described earlier and as encapsulated in the Medical Device Regulation, should go some way toward protecting users and patients, by placing more stringent requirements in terms of external audit, of developing and maintaining robust quality management systems, and of being more responsive to user feedback and field surveillance (69). Other safety issues that remain relatively unaddressed by current regulation and HCP training programmes include those of automation complacency, and of the use of dynamic, continuously learning systems (70).

### CONCLUSION

In this review we have updated the ten challenges we originally identified in 2018 with current thinking and practice, reflecting the rapid changes in the field of AI as applied to health and care. There is still some way to go in addressing these questions. It is clear to us that, given the iterative nature of technological development, the development of pathways for continuous review of principles and policy frameworks should be prioritized by governments and healthcare authorities. Furthermore, given the complexity of these technologies, a truly multidisciplinary approach is required. It is only by involving all stakeholders with a sincere desire to ensure the successful development and deployment of these tools that their risks will be minimized, and their opportunities maximized.

### REFERENCES


### AUTHOR CONTRIBUTIONS

MF conducted the original research by Future Advocacy cited in this review, and wrote this paper. OB also conducted the original research cited, reviewed this paper, and provided senior approval for its publication.

### FUNDING

The original work conducted by Future Advocacy that is cited in this paper was funded by the Wellcome Trust and by NHSX.

### ACKNOWLEDGMENTS

We are extremely grateful to our colleague Nika Strukelj for her contribution to the original research cited in this paper. We are also grateful to the numerous people, including patients and clinicians, who were interviewed or who took part in roundtables and focus groups as part of the research that informed this review.


**Conflict of Interest:** As of 1st September 2019, MF has been employed by Ada Health GmbH, which develops AI tools for use in health applications.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Fenech and Buston. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Mini Review: Deep Learning for Atrial Segmentation From Late Gadolinium-Enhanced MRIs

Kevin Jamart<sup>1</sup>, Zhaohan Xiong<sup>1</sup>, Gonzalo D. Maso Talou<sup>1</sup>, Martin K. Stiles<sup>2</sup> and Jichao Zhao<sup>1</sup>\*

*<sup>1</sup> Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand, <sup>2</sup> Waikato Clinical School, Faculty of Medical and Health Sciences, The University of Auckland, Auckland, New Zealand*

Segmentation and 3D reconstruction of the human atria are of crucial importance for precise diagnosis and treatment of atrial fibrillation, the most common cardiac arrhythmia. However, the current manual segmentation of the atria from medical images is a time-consuming, labor-intensive, and error-prone process. The recent emergence of artificial intelligence, particularly deep learning, provides an alternative solution to the traditional methods that fail to accurately segment atrial structures from clinical images. This was illustrated by the 2018 Atrial Segmentation Challenge, for which most of the challengers developed deep learning approaches for atrial segmentation, reaching high accuracy (>90% Dice score). However, as significant discrepancies exist between the approaches developed, many important questions remain unanswered, such as which deep learning architectures and methods ensure reliability while achieving the best performance. In this paper, we conduct an in-depth review of the current state-of-the-art of deep learning approaches for atrial segmentation from late gadolinium-enhanced MRIs, and provide critical insights for overcoming the main hindrances faced in this task.

Keywords: atrial fibrillation, left atrium, machine learning, image segmentation, convolutional neural network, LGE-MRI

### INTRODUCTION

The ability to perform body imaging has been described as one of the most important revolutions in medicine of the past 1,000 years for its contribution to medical prevention, diagnosis, and prognosis (1). Since then, medical imaging has never ceased to improve, allowing cardiologists and researchers to assess heart size using chest x-rays (2), to evaluate the mechanical work of the heart with echocardiography (3–5), and to accurately determine the heart's dimensions using cardiac magnetic resonance imaging (MRI) (6). Due to its good image quality, excellent soft-tissue contrast, and absence of ionizing radiation, MRI has become the gold standard modality to precisely identify patients' cardiac structures and etiology, guiding diagnosis and therapy decisions (7).

Improvements in MRI techniques, particularly with the aid of contrast agents such as gadolinium, led to the development of late gadolinium-enhanced MRI (LGE-MRI), allowing for the detection of scar tissue located within the myocardium. This technique has been extensively employed for clinical studies at the University of Utah (8–10) to analyze and understand the role of fibrosis and underlying structures that sustain atrial fibrillation (AF), the most common cardiac arrhythmia, which is predicted to become a new epidemic in the coming decades (11, 12). These studies notably demonstrated the correlation between an increased amount of fibrosis present in the left atrial (LA) wall and a poor outcome of AF ablation (10). Over time, LGE-MRI has become a widely accepted technique of choice for the detection and quantification of scar tissue located in the atrial wall.

#### Edited by:

*Steffen Erhard Petersen, Queen Mary University of London, United Kingdom*

#### Reviewed by:

*Wenjia Bai, Imperial College London, United Kingdom; Joao Bicho Augusto, Barts Heart Centre, United Kingdom; Redha Boubertakh, Singapore Bioimaging Consortium (A∗STAR), Singapore*

#### \*Correspondence:

*Jichao Zhao j.zhao@auckland.ac.nz*

#### Specialty section:

*This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine*

> Received: *08 January 2020* Accepted: *21 April 2020* Published: *27 May 2020*

#### Citation:

*Jamart K, Xiong Z, Maso Talou GD, Stiles MK and Zhao J (2020) Mini Review: Deep Learning for Atrial Segmentation From Late Gadolinium-Enhanced MRIs. Front. Cardiovasc. Med. 7:86. doi: 10.3389/fcvm.2020.00086*

The most widely used clinical practice for analyzing atrial structures and quantifying the distribution of fibrosis, including at the University of Utah, is manual segmentation of the LA chamber from LGE-MRIs. However, the LA cavity represents a small volume (73 ± 14.9 cm<sup>3</sup>), constrained by a thin atrial wall (2–3 mm) and comprising complex anatomy (13–15). Moreover, the anatomical structures surrounding the atria display similar intensities that can mislead some segmentation algorithms (16) (**Figure 1**). As a consequence, manual segmentation of the atrium is a time-consuming, labor-intensive, and error-prone process (8, 17, 18).

Before the advent of deep learning, researchers tried to develop and improve automated approaches to alleviate the burden of manual segmentation (19, 20). Earlier algorithms, such as thresholding methods or region growing approaches, required substantial manual tuning (21, 22). Other methods were later developed to provide a higher degree of automation using classifiers or clustering approaches such as k-nearest-neighbor (23) or k-means clustering (24), respectively. More recent methods, using statistical classifiers like support vector machines (25), active shape models (26), or multi-atlas (27) approaches, gained increasing interest for medical image analysis and cardiac segmentation. Though many of these approaches showed promising results, none was consistent enough to be implemented widely in clinical practice.

In recent years, the development of more powerful computational hardware and the growth of clinical databases enabled deep learning, a subset of artificial intelligence (AI) (28–32) capable of automatic feature extraction and learning, to achieve tremendous advances notably in image classification and segmentation (33, 34). When applied to clinical images, deep learning even surpassed human-level accuracies for the detection of cancer on cervical images (35). Certain architectures employed for deep learning have also been proven to be very effective when applied to cardiac imaging. For example, Avendi et al. (36, 37) used a three-stage approach combining convolutional neural network (CNN), stacked encoder, and deformable models to segment the left ventricle (and later the right ventricle) on a small MRI dataset of 45 patients. On the other hand, Bai et al. (38) used a large MRI dataset provided by the UK Biobank database to develop their CNN for ventricular chamber assessment (volume, mass, ejection fraction) and segmentation, obtaining accuracy scores competing with human-level precision.

This increasing interest around deep learning can also be seen in the number of participants using deep learning approaches for the various challenges designed to promote the development of more robust methods for cardiac image segmentation (39–41). Atrial segmentation is becoming a matter of greater importance and can highly benefit from the development of deep learning. As an example, during the 2018 Atrial Segmentation Challenge, 15 of the 17 published approaches used deep learning to segment the LA cavity from LGE-MRI images, yielding high accuracy results and outperforming conventional segmentation approaches (42). The number is in sharp contrast with the previous atrial segmentation challenge held in 2013, during which only one approach used a learning algorithm (16). Thus, this growing interest for deep learning in research challenges illustrates the shift occurring in atrial segmentation and more broadly in clinical imaging development, moving more and more toward deep learning-based approaches that will revolutionize clinical practice in the coming years.

In this paper, we aim to provide an analysis of the current deep learning techniques used for atrial segmentation on LGE-MRIs. Firstly, we will describe some of the fundamental concepts employed in deep learning for medical image segmentation. Subsequently, we will detail the various deep learning approaches addressing the main obstacles faced when performing automated atrial segmentation. Finally, we will conclude our review with an outline of future developments for atrial segmentation using deep learning and more broadly the future of AI in clinical practice.

## CORE CONCEPTS OF DEEP LEARNING

Since Alan Turing published his article "Computing Machinery and Intelligence" asking "Can machines think?", researchers have striven to comprehend, develop, and achieve AI (43, 44), although today, after over 60 years, general AI is still not within reach. Nevertheless, in recent years, the growth of computer processing power and technologies has allowed researchers to develop algorithms capable of learning proficiently through deep learning using artificial neural networks (ANNs). As ANNs represent the most popular structure to perform deep learning, this section will describe the core concepts of ANNs and their various practical uses in medical imaging.

### Artificial Neural Networks

Inspired by the biological neural networks found in the human brain (45), an ANN represents a collection of connected and tunable computational units, called artificial neurons, organized in a layered structure comprising a network (**Figure 2A**). Each neuron is a processing unit that can take multiple inputs. Each input is multiplied by an adjustable parameter called a weight. All weighted inputs are summed together and passed through a non-linear function to yield a single output (30). Neural networks can address complex, highly non-linear problems due to the layered and connected structure of ANNs. In particular, the introduction of more advanced feature learning tools such as convolutional layers, the growth of large datasets, and better activation functions, e.g., ReLU, greatly helped the development of deep learning for segmentation tasks.
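The computation performed by a single artificial neuron, as described above, can be illustrated with a minimal sketch in plain Python. The function names (`neuron`, `dense_layer`) and the additive bias term are assumptions introduced for illustration only; the description above mentions only weighted inputs and a non-linear function.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to zero."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function, an alternative non-linear activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=relu):
    """Weighted sum of the inputs followed by a non-linear activation.

    The bias term is an assumption added for illustration.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation=relu):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]
```

For instance, `neuron([1.0, 2.0], [1.0, 1.0])` sums the weighted inputs to 3.0 and, since ReLU leaves positive values unchanged, returns 3.0.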

FIGURE 2 | Schematic representation of the layered structure of an Artificial Neural Network (ANN), each circle representing an artificial neuron (details in the insert). (A) Each neuron receives inputs (*X*1, *X*2, *X*3), which are weighted (*w*1*X*1, *w*2*X*2, *w*3*X*3) and passed through an activation function *f*. (B) Architecture and details of one of the most popular convolutional neural networks: U-Net.

The key attribute of an ANN lies in its ability to learn the unique traits of a dataset by adjusting its weights accordingly during a training process. Typically, the weights are randomly initialized at the start of training. The training process can then be described in three consecutive phases: (1) forward propagation, (2) error calculation, and (3) back-propagation. In the forward propagation stage, the input data (e.g., an LGE-MRI image) is fed to the network and flows through the different layers that extract the characteristic traits of the data, to ultimately yield a prediction (e.g., the desired segmented image). The prediction is then compared to reference data (e.g., an image manually segmented by experts), called labeled data, and the error is calculated using a dedicated function (called the loss function). Finally, the weights are modified to reduce the estimated error, improving prediction accuracy. These three phases are repeated many times until the error converges to a minimum.
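The three training phases can be sketched on a deliberately tiny model: a single linear neuron fitted by gradient descent. This is an illustrative toy under stated assumptions, not a segmentation network. The mean squared error loss, the learning rate, the number of epochs, and the zero initialization (used here instead of random initialization so that the run is reproducible) are all choices made for this example.

```python
def train(samples, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b on a list of (x, y) pairs by gradient descent."""
    w, b = 0.0, 0.0  # assumption: fixed start instead of random initialization
    n = len(samples)
    history = []
    for _ in range(epochs):
        # Phase 1: forward propagation, compute predictions from the inputs.
        preds = [w * x + b for x, _ in samples]
        # Phase 2: error calculation with a loss function (mean squared error).
        loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, samples)) / n
        history.append(loss)
        # Phase 3: back-propagation, gradients of the loss w.r.t. each weight,
        # followed by a weight update in the direction that reduces the loss.
        grad_w = sum(2 * (p - y) * x for p, (x, y) in zip(preds, samples)) / n
        grad_b = sum(2 * (p - y) for p, (_, y) in zip(preds, samples)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history
```

Calling `train([(0, 1), (1, 3), (2, 5), (3, 7)])` recovers weights close to `w = 2` and `b = 1`, and the recorded loss history decreases as training proceeds.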

### Different Tasks, Different Networks

Medical imaging encompasses a wide field of applications, and different tasks can represent different aspects of a diagnosis. Examples include the detection of an abnormal ECG signal, its classification as AF (46), or even atrial segmentation for planning for AF ablation (47). Therefore, each task requires a specific ANN architecture to properly model the desired operator, as the inputs and output can be drastically different depending on the nature of the task to be performed.

The number, types, and connections of layers in an ANN define the network architecture. The CNN model is one of the most widely employed architectures in image analysis. A CNN is a specific ANN architecture in which the hidden layers comprise one or more convolutional layers. The convolutional layers act as feature extractors, applying different convolution kernels to the input image to generate feature maps containing meaningful information. Moreover, in convolutional layers, each artificial neuron receives its inputs from multiple neighboring neurons of the previous layer, and the kernel weights are shared across positions, preserving the most spatially relevant information. This weight sharing also reduces the number of parameters to adjust and therefore lowers the computational cost. Pooling layers are generally inserted between sets of successive convolutional layers to reduce the dimensionality of each generated feature map while retaining the relevant information. This down-sampling of the feature maps, typically by a factor of two, reduces the computational cost while enlarging the field of view of the later convolutional layers.
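To make the two operations concrete, the following minimal sketch implements a single-channel convolution and a max pooling step on nested Python lists. It assumes stride 1 and no padding for the convolution, and, as in most deep learning frameworks, the kernel is applied without flipping (strictly speaking a cross-correlation). The function names are illustrative.

```python
def conv2d(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]
```

With `size=2`, a 4 × 4 feature map is reduced to 2 × 2, which corresponds to the factor-of-two down-sampling mentioned above. The same few kernel weights are reused at every image position, which is why the parameter count stays small.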

For CNNs dedicated to image classification or detection, the architectures usually incorporate a fully connected layer as an end layer to summarize all information contained in the feature maps into a unique final prediction (output). Furthermore, CNNs can also be adapted for segmentation tasks by discarding the final fully connected layer and incorporating up-convolution layers in the network (35). These networks are called fully convolutional networks (FCNs). Up-convolution layers allow upsampling of the feature maps to produce, in fine, output with the same size as the original input size (48). Thus, FCNs using up-convolution layers can perform pixel-wise prediction and therefore image segmentation.

First proposed by Long et al. (33) for semantic segmentation, the FCN architecture has been adapted and further extended for medical imaging notably with U-Net, a U-shape architecture (**Figure 2B**) developed for segmentation of histological images (48). By using skip-connections between down-sampled feature maps and up-sampled feature maps, the U-Net architecture allows features forwarding between the encoding part and the decoding part of the network, preventing singularities and achieving higher accuracy (49–51). After winning the ISBI cell tracking challenge in 2015, U-Net became the principal FCN architecture for medical imaging segmentation. Other studies further developed the U-shape architecture to use 3D images as input to render the spatial resolution of anatomical structures more accurately (52, 53).

### ATRIAL SEGMENTATION USING DEEP LEARNING

In this section, we provide a summary of the main difficulties encountered in atrial segmentation and the state-of-the-art deep learning approaches developed from LGE-MRIs to address them. In this regard, many of the methods reviewed were proposed for the MICCAI 2018 Atrial Segmentation Challenge, which represented a cornerstone for the development of deep learning approaches for atrial segmentation from LGE-MRIs. Firstly, we will analyze the main methods employed to address class imbalance issues, a recurrent problem in segmentation of small structures such as the LA. Secondly, we will review the approaches developed to exploit image context, providing more information for semantic segmentation of the LA using multi-scale strategies. Next, we will analyze the impact of loss function selection regarding either volumetric segmentation or surface segmentation. Finally, we will discuss the influence of the input dimensionality (2D/3D) for atrial segmentation when dataset size represents a significant shortcoming.

### Multi-Stage CNN and Class Imbalance

One of the difficulties of atrial segmentation is that the atrial cavity represents only a small fraction of the image volume (∼0.7%) and therefore creates a severe class imbalance between the over-represented background and the under-represented atrial structures, impairing the learning process. To address this issue, Vesal et al. (54) proposed to crop the input images from the center of the image, using fixed coordinates, to substantially remove the predominant background surrounding the LA. As a result, the learning process was entirely focused on a smaller region of interest (ROI), allowing better representation of the LA features. Based on a similar principle, other researchers (55–57) pushed this idea a step further by using a multi-CNN approach for atrial segmentation (**Figure 3A**). In their approaches, two consecutive networks were employed instead. The first CNN was specifically trained to localize the LA on each input, so that the unwanted background around the LA could be cropped out as a prior step to segmentation. Then, the second network was dedicated to the segmentation task itself, focusing entirely on a small patch of each image.

Despite following a similar idea, it is important to distinguish these two methods. As the LA can show different positions on LGE-MRIs, using fixed coordinates from the center of the image to crop may result in unwanted cropping of relevant LA pixels. On the other hand, by dynamically centering the ROI on the LA for each input, multi-CNN approaches ensured the conservation of the atrial structures, cropping exclusively superfluous background pixels, and consequently optimizing background isotropy for the learning process.
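The difference between the two cropping strategies can be illustrated on a toy binary mask. In the sketch below, the centroid of the reference mask stands in for the output of a localization network, which is an assumption made purely for illustration; the helper names are hypothetical, and the toy grid sizes bear no relation to real LGE-MRI dimensions.

```python
def centroid(mask):
    """Integer (row, col) center of mass of the foreground pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(coords)
    return (int(sum(r for r, _ in coords) / n), int(sum(c for _, c in coords) / n))


def crop(grid, center, size):
    """Square crop of side `size` around `center`, shifted to stay in bounds."""
    h, w = len(grid), len(grid[0])
    top = min(max(center[0] - size // 2, 0), h - size)
    left = min(max(center[1] - size // 2, 0), w - size)
    return [row[left:left + size] for row in grid[top:top + size]]


def foreground_fraction(mask):
    """Share of foreground pixels, a simple indicator of class imbalance."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total
```

On an 8 × 8 mask whose foreground is a 2 × 2 block located off-center, a fixed 4 × 4 crop around the image center keeps only one of the four foreground pixels, whereas a crop around the centroid keeps all four and raises the foreground fraction from 6.25% to 25%.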

FIGURE 3 | [...] network (CNN 1) to extract the region of interest (ROI) and the second convolution neural network (CNN 2) to perform the segmentation of the left atrium. (B) Pyramid pooling architecture increases contextual information in the learning process. Pool, pooling layer; Conv, convolutional layer.

To quantify the impact of each cropping approach, our recent study investigated the importance of cropping the input patch to the CNN either from the center of the image (image-centered) or from the center of the LA (center of mass/centroid of the atrium) using different patch sizes (ranging from 240 × 240 to 576 × 576) (58). When using center cropping of the image, we did not observe any significant influence of the patch size on the Dice score (92.03 vs. 91.95% Dice score for 240 × 240 and 512 × 512 image size, respectively). On the other hand, when cropping the images from the centroid of the LA using dynamic cropping, we noticed a significant increase in accuracy when using small patches (240 × 240) compared to large patches (576 × 576) (Dice score 92.86 vs. 92.26%, p < 0.01). The use of LA centroid-centered patches allows the CNN to process a more condensed region of the large LGE-MRI scan as the exact location of the LA is known, reducing the class imbalance of each patch processed by the network.

### Multi-Scale Approaches and Context Learning

Another problem that decreases segmentation performance and limits the extraction of relevant cues during the training phase is the inconsistency in the sizes of the LA anatomical structures such as the pulmonary veins or the left atrial appendage seen in LGE-MRIs from different patients.

He et al. (59) initially developed a pyramid pooling module, a form of multi-scale pooling, intended to prevent object misclassification by using image context information. By incorporating multi-scale pooling, the CNN could associate contextual features, delivering more accurate classification. Based on this idea, Zhao et al. (60) proposed PSPNet, a neural network with pyramid pooling which incorporates object and image context into the learning process. These two approaches were developed using large miscellaneous datasets such as ImageNet (61), PASCAL VOC 2012 (62), or the ADE20K dataset (63), and pyramid pooling, by exploiting the context variability of these datasets, helped to alleviate object misclassification and segmentation errors.

Inspired by He et al. (59) and the PSPNet developed by Zhao et al. (60), Bian et al. (64) proposed a multi-scale 2D CNN using spatial pyramid pooling to extract features at different scales from the training dataset (**Figure 3B**). Thus, by means of different pooling kernel sizes and their combination, they proposed a CNN able to learn cues of different sizes and improve network robustness against the high shape variability usually encountered in clinical datasets.

However, the dataset employed for this approach (154 3D LGE-MRIs of the chest cavity) does not provide as much contextual variability as the large image database aforementioned, but rather displays the same object (the LA) in the same anatomical context (the thoracic cavity), providing only a few contextual variations to train on. Thus, arguably using pyramid pooling module for LA segmentation in the chest cavity might only show limited benefits from context learning.

Pyramid pooling also grants the ability to generate a fixed-length vector on a fully connected layer for classification tasks. This was illustrated by Chen et al. (65), who used the pyramid pooling module to extract more information from the dataset and classify the images as pre- or post-surgery, while using a deeper U-Net to segment the LA simultaneously.
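The fixed-length property can be seen in a minimal sketch of spatial pyramid pooling on a single feature map. The pyramid levels (1, 2, and 4 bins per axis), the use of max pooling, and the bin boundary rule are assumptions chosen for illustration and are not taken from any of the cited implementations. The feature map is assumed to be at least as large as the finest level in each dimension.

```python
def pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2D feature map over n x n grids of bins for each level n.

    The output length is sum(n * n for n in levels), whatever the input size.
    """
    h, w = len(fmap), len(fmap[0])
    features = []
    for n in levels:
        for i in range(n):
            # Bin rows span floor(i*h/n) to ceil((i+1)*h/n); bins may overlap
            # slightly when h is not a multiple of n.
            r0, r1 = (i * h) // n, -(-(i + 1) * h // n)
            for j in range(n):
                c0, c1 = (j * w) // n, -(-(j + 1) * w // n)
                features.append(
                    max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return features
```

A 5 × 7 map and an 8 × 8 map both yield 1 + 4 + 16 = 21 values, so the same fully connected layer can follow inputs of different sizes. The coarse level summarizes global context while the fine levels retain local detail.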

Based on the similar idea of incorporating multi-scale cues during the learning process, Vesal et al. (54) employed dilated convolution layers (also called atrous convolution layers) at the deepest level of their network. These convolution layers use dilation rates to enlarge their receptive fields, allowing the network to learn features at different scales (66). However, each convolution increases the receptive field of each neuron; therefore, if not used wisely, receptive fields can become larger than the input image, resulting in a waste of memory without improving the learning process.
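The growth of the receptive field can be checked with a short calculation. For a stack of stride-1 convolutions, each layer with kernel size k and dilation rate d enlarges the receptive field by (k − 1) · d pixels along each axis. The sketch below assumes stride 1 throughout, and the layer configurations used in the usage note are hypothetical examples rather than the settings of any cited network.

```python
def receptive_field(layers):
    """Receptive field (in pixels, per axis) of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf
```

Three standard 3 × 3 convolutions give `receptive_field([(3, 1)] * 3) == 7`, whereas four 3 × 3 convolutions with dilation rates 1, 2, 4, and 8 give a receptive field of 31 pixels. On a deep feature map that is only, say, 16 pixels wide, the latter already exceeds the input, which is the situation the paragraph above warns against.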

These approaches ensure the incorporation of shallow features (spatial cues) and deep features (semantic cues) during the learning process. Therefore, combining effective class imbalance management with contextual cues could potentially further improve the current methods. However, cropping to the smallest ROI possible using the first CNN of a two-stage approach, like Xia et al., drastically reduces the image context shown to the network. Therefore, the pyramid pooling module might not be able to provide contextual cues from the structures surrounding the LA to improve the learning process. Moreover, during the cropping process, the input image size is significantly reduced. Thus, the use of dilated convolution for segmentation in the second network of this strategy becomes almost obsolete, as the receptive fields would quickly grow larger than the input image. Fusing these strategies, although interesting, therefore needs to be considered carefully.

### Loss Function

The main evaluation metric currently employed in segmentation tasks using deep learning is the Dice score, for which a higher value reflects almost exclusively the volume of pixels accurately annotated rather than well-defined anatomy. Hence, most of the deep learning approaches for segmentation employ pixel-wise segmentation relying on either a cross-entropy loss function or a Dice loss function. However, these loss functions give more weight to volume than to contours, which can impair the learning of accurate boundaries in favor of a correct volume.

To improve boundary accuracy, several teams have developed contour-oriented loss functions. For example, Jia et al. (67) proposed a contour loss function (based on the pixel Euclidean distance) that decreases when the contour gets nearer to the reference contours of the label images during training, providing spatial distance information to the learning process. In their approach, they combined the Dice loss function, to obtain pixel-wise information, with their contour loss function for spatial information, achieving good shape consistency. In another strategy, Yang et al. (57) also defined a composite loss function, combining the overlap loss function (to reduce intersection between foreground and background) and a novel loss function called "focal positive loss" to guide the learning of a voxel-specific threshold and emphasize the foreground, improving, in fine, classification sensitivity. By recognizing ambiguous boundary locations and enforcing positive prediction, this novel loss function improved the learning process and consequently the final atrial segmentation. However, these approaches did not obtain a better score than other approaches using more conventional loss functions (e.g., Dice loss, cross-entropy loss).

Therefore, it would be interesting to investigate the impact of a combined loss function allowing the network to learn from the volume (cross-entropy loss function or Dice loss function) and from the contours of the LA. As segmentation tasks rely not only on minimizing volume error but also on boundary accuracy (particularly for small structures), it is crucial to consider these two major aspects to ensure the reliability of the approach employed.
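One possible shape for such a combined loss is sketched below. It adds a soft Dice term (volume) to a distance-weighted error term in which mistakes are penalized in proportion to their Euclidean distance from the reference contour. This is an illustrative construction under stated assumptions: it is not the formulation of Jia et al. (67) or Yang et al. (57), the weighting factor `alpha` is hypothetical, and the brute-force distance map is only practical for toy grids. The reference mask must contain at least one foreground pixel.

```python
import math


def soft_dice_loss(pred, ref, eps=1e-6):
    """1 - Dice overlap between flat lists of probabilities and binary labels."""
    inter = sum(p * r for p, r in zip(pred, ref))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(ref) + eps)


def contour_pixels(ref):
    """Foreground pixels with a 4-neighbor in the background or off the image."""
    h, w = len(ref), len(ref[0])

    def is_background(r, c):
        return not (0 <= r < h and 0 <= c < w) or not ref[r][c]

    return [
        (r, c)
        for r in range(h) for c in range(w)
        if ref[r][c] and any(is_background(r + dr, c + dc)
                             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    ]


def distance_map(ref):
    """Euclidean distance from every pixel to the nearest reference contour pixel."""
    pts = contour_pixels(ref)
    return [
        [min(math.hypot(r - pr, c - pc) for pr, pc in pts)
         for c in range(len(ref[0]))]
        for r in range(len(ref))
    ]


def combined_loss(pred, ref, alpha=0.5):
    """Soft Dice (volume) plus alpha times a distance-weighted error (contour)."""
    p = [v for row in pred for v in row]
    g = [v for row in ref for v in row]
    d = [v for row in distance_map(ref) for v in row]
    contour = sum(abs(pi - gi) * di for pi, gi, di in zip(p, g, d)) / len(p)
    return soft_dice_loss(p, g) + alpha * contour
```

With this construction, two predictions that each contain one false-positive pixel have the same Dice term, but the one whose error lies farther from the reference contour receives the larger total loss.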

### Spatial Context (2D vs. 3D)

Even if clinical datasets are becoming bigger and better with the creation of centralized databases, for example, the UK Biobank (with more than 90,000 3D MRI scans) (68), most of the clinical databases currently available remain of modest size, making it difficult for a CNN to provide robust generalized solutions for segmentation. As an example, the current largest LGE-MRI dataset, with only 154 3D LGE-MRIs (which represent nearly 9,000 2D images for training), appears relatively small when compared to the hundreds of thousands of images used for the major classification challenges for which the proposed approaches reach outstanding accuracy (59, 69, 70).

Thus, in this race for performance, it is important to consider how to make the best of the dataset employed. In this regard, the choice of input dimensionality (2D or 3D) must be considered carefully. As 2D approaches need considerably fewer trainable parameters to yield good results, they consume less memory and therefore require less time during the training process. Moreover, 2D approaches allow the processing of bigger batches of images compared to 3D approaches, as they require less memory to be processed. Therefore, 2D methods, using bigger batch sizes, help reduce gradient fluctuation and lead to faster convergence during the learning process. Additionally, 2D approaches can exploit small datasets more efficiently, reducing the risk of overfitting as the neural networks are fed with more images during learning.

On the other hand, 3D approaches provide better spatial representation, fully exploiting data dimensionality as well as inter-slice continuity during training. This allows the network to learn major spatial features to render a more accurate 3D anatomy and yield, in fine, higher accuracy. Moreover, with the continual improvement of GPU technology, the current memory limitations will become less important in the near future; therefore, 3D approaches will become easier to use. Furthermore, as datasets are growing better and bigger, 3D approaches will be able to rely on more data and become more and more prominent in deep learning for clinical imaging.

Nevertheless, relying on 2D images, Puybareau et al. (71) tried to improve the spatial representation of their dataset using a method called "pseudo-3D." Their method generated color images from the 2D grayscale images, each slice being expanded into the R, G, B space using slice n-1, slice n, and slice n+1 to generate a three-channel image. This approach improves the spatial representation, alleviates the low contrast between atrial tissues and background, and enriches the dataset. Although it does not provide a full 3D spatial representation, it can be a method of choice if resources are limited.
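The slice-stacking step of such a pseudo-3D input can be sketched in a few lines. The function name is hypothetical, and the handling of the first and last slices (repeating the boundary slice when a neighbor is missing) is an assumption made for this example, since the description above does not specify it.

```python
def pseudo_3d(volume):
    """Turn a list of 2D slices into one three-channel image per slice.

    Channel order is [slice n-1, slice n, slice n+1]. At the ends of the
    volume the boundary slice is repeated (an assumption for illustration).
    """
    last = len(volume) - 1
    return [
        [volume[max(n - 1, 0)], volume[n], volume[min(n + 1, last)]]
        for n in range(len(volume))
    ]
```

Each output image still has the in-plane size of a single slice, so a 2D network with three input channels can be used unchanged while seeing a small amount of through-plane context.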

Following the multi-view approach developed by Mortazi et al. (72), Chen et al. investigated the possibility of combining 2D images and 3D representation (73). In their study, Chen et al. extracted the 2D images for each anatomical view (axial, coronal, and sagittal) from 100 3D LGE-MRIs. Then, they combined a first encoder-decoder network using long short-term memory convolutional layers to preserve inter-slice correlation using the axial view, and a second network to learn complementary information from the sagittal and coronal views. Finally, the outputs for each view of the network were fused to yield LA and PV segmentation simultaneously. Using their approach, they obtained a 90.83% Dice score for PV and atrial segmentation. Employing the same method, Yang et al. studied the influence of dilated convolution to counter the image resolution variability encountered using a multi-view approach (74). Using 100 3D LGE-MRIs, they achieved an 89.7% Dice score, underlining the necessity of investigating systematic parameter tuning to obtain optimal performance on a task-specific basis.

In the present context, it is important to consider the tradeoff between a 2D approach, which requires less memory and profits more from the dataset (8,800 images rather than 154 3D LGE-MRIs), and a 3D approach, which allows more accurate spatial representation at the cost of longer and more difficult training. However, at the current stage, it is difficult to assess which method yields systematically better results. For example, during the 2018 Atrial Segmentation Challenge, the performances of 2D and 3D approaches remained very close (**Table 1**). Another possibility is to use a multi-view approach combining 2D images from different views to improve the spatial representation. These methods require training each view separately before combining the different outputs for the final prediction. While interesting, these methods still need improvement to reach the current state-of-the-art for atrial segmentation. Therefore, further improvements need to be sought regarding the size of the dataset, the number of approaches compared, and the metrics employed to be able to draw a better conclusion.

### Evaluation Metrics

Another crucial point is to use metrics that provide a reliable evaluation of the final output using deep learning. One of the main scores employed is the Dice score, which gauges the pixel-wise similarity between the predicted segmentation and the reference data. The Dice score combines the precision and the sensitivity of the model. However, the Dice score has some limitations as it only evaluates the percentage of pixels accurately annotated, neglecting contours and shapes of organs that can be a critical part of diagnosis in clinical practice. Other metrics providing distance measurements, such as mean surface distance and Hausdorff maximum distance, are usually employed to provide an alternative evaluation. Mean surface distance estimates the average error (in mm) between the outer surfaces of the reference data and the predicted segmentation. Given the size and structure of the LA, mean surface distance is a meaningful tool to reliably assess the anatomical boundaries of the predicted segmentation compared to the reference data. Hausdorff maximum distance (in mm) represents the maximum error between the surface of the predicted segmentation and the surface of the reference data. Therefore, Hausdorff distance indicates solely the distance at the worst part of the segmentation, providing only partial information on the correctness of the predicted segmentation. By combining mean surface distance and Hausdorff distance, it is possible to evaluate the fidelity of the boundaries of the segmented structures reliably. Finally, a more clinical aspect of the predictions can be examined to express the reliability of the approach by calculating volume error or anteroposterior atrial diameter error when comparing the segmented prediction with the reference image.
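The three metrics can be written compactly when segmentations are represented as sets of pixel coordinates and surfaces as lists of boundary points. The sketch below makes several assumptions for illustration: the mean surface distance is computed symmetrically (averaging both directions), which is one common convention among several; distances are in pixel units and would need to be multiplied by the voxel spacing to obtain millimeters; and the brute-force nearest-point search is only suitable for small examples.

```python
import math


def dice_score(pred, ref):
    """Dice overlap between two sets of foreground pixel coordinates."""
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def _directed_distances(surface_a, surface_b):
    """Distance from each point of surface_a to its nearest point on surface_b."""
    return [min(math.dist(p, q) for q in surface_b) for p in surface_a]


def mean_surface_distance(surface_a, surface_b):
    """Average of the nearest-point distances, taken in both directions."""
    d = _directed_distances(surface_a, surface_b) + _directed_distances(surface_b, surface_a)
    return sum(d) / len(d)


def hausdorff_distance(surface_a, surface_b):
    """Largest nearest-point distance, taken over both directions."""
    return max(_directed_distances(surface_a, surface_b)
               + _directed_distances(surface_b, surface_a))
```

For the surfaces `[(0, 0), (0, 1)]` and `[(0, 0), (0, 4)]`, the mean surface distance is 1.0 pixel while the Hausdorff distance is 3.0 pixels. A single outlying boundary point dominates the Hausdorff distance but is averaged out in the mean surface distance, which is why the two are usefully reported together.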

### Atrial Wall and Scar Segmentation

While the deep learning methods for atrial cavity segmentation on LGE-MRIs are effective, the more clinically relevant tasks, such as LA wall and fibrosis (scar) segmentation, remain challenging. For LA wall segmentation, several approaches have been developed using traditional strategies such as multi-atlas segmentation or graph-cut methods (83, 84). However, currently no deep learning approaches have been proposed for direct LA wall segmentation from LGE-MRI. Yang et al. (85) proposed a hybrid approach combining multi-atlas segmentation and unsupervised sparse auto-encoders for LA scar segmentation. A multi-atlas algorithm was used to segment the LA blood pool from the LGE-MRIs. Then, this initial LA cavity segmentation was dilated uniformly by 3 mm to include the LA wall. Next, they used a sparse auto-encoder to delineate and segment the fibrosis from the atrial wall. They achieved 90 ± 0.12% Dice score for blood pool segmentation and 78 ± 0.08% Dice score for fibrosis segmentation. In their subsequent study (86), by fine-tuning the sparse auto-encoder parameters, the accuracy was improved to 82 ± 0.05% Dice score for fibrosis segmentation. While these methods show promising results, they were developed and tested on only 20 3D LGE-MRIs and remain untested on larger datasets that would assess their reliability against a broader range of anatomical variability in LA structures and fibrosis. Chen et al. (73) developed a CNN with an attention mechanism (87) to highlight salient features (in this case, the enhanced pixels of the scar tissues on LGE-MRIs) and to force the model to focus on the scar locations. With this approach, Chen et al. obtained a 77.64% Dice score for atrial scar segmentation using 100 3D LGE-MRIs. This lower score (compared to that obtained from LA cavity segmentation) is potentially due to the scarcity of the LA scar pixels, which are small patches of inhomogeneously enhanced pixels within the atrial wall, impairing the extraction of meaningful features for fibrosis identification during the learning process of the CNN.

*DC, Dice Score; LA, Left Atrium.*

While these methods require atrial wall segmentation to be performed before fibrosis detection, Li et al. proposed a hybrid approach that combines a graph-cuts framework with a multiscale CNN for direct scar identification (88). In their approach, the LA and PV were initially delineated using a multi-atlas segmentation method. Fibrosis was then segmented and quantified using a graph-cut framework in which two neural networks were dedicated to predicting edge weights. The first network predicted the probability of a node belonging to scar or normal tissue, while the second evaluated the connection between two nodes, ultimately yielding the fibrosis segmentation. By embedding the CNNs in the graph-cut framework, Li et al. obtained a mean Dice score of 70.2% for scar tissue segmentation, showing that LA fibrosis can be effectively assessed without prior wall segmentation. Thus, even though the two networks did not directly perform the fibrosis segmentation task, the CNNs contributed to the optimisation process by refining the graph-cut approach used in this study. However, these methods tended to find fibrotic tissue outside the atrial wall boundaries, resulting in a drastic decrease in the final scores. Hence, given the low Dice scores obtained, the current models remain insufficient to provide the anatomically accurate assessments needed for reliable fibrosis quantification. These approaches therefore still require improvement to reach reliability and clinical applicability.

### DISCUSSION AND CONCLUSION

In this paper, we provided an in-depth analysis of the main automatic deep learning approaches for atrial cavity segmentation from LGE-MRIs. Most of the proposed approaches used FCNs, most notably the very popular U-Net architecture. While U-Net is widely used for medical image segmentation in many disciplines (38, 89, 90), the discrepancy in accuracy between studies points to inherent issues in the generalized implementation of such architectures. By presenting a normalized survey of U-Net for the task of atrial segmentation, we showed the importance of proper class imbalance management, an appropriate feature extraction process, and meaningful loss function selection for precise and accurate atrial segmentation.

The current leading approach for LA segmentation from LGE-MRIs is a two-stage 3D CNN method that reached a Dice score of 93.2%, currently the best-benchmarked performance on 100 3D LGE-MRIs (42). In this approach, the first network effectively reduces class imbalance while optimizing background isotropy using dynamic cropping, providing the second network with a targeted region for more localized segmentation. The authors also employed extensive data augmentation to enhance the generalization capability of their approach. Finally, the use of a 3D approach reinforced the spatial representation of the features, yielding the highest score reported so far for LA segmentation using machine learning.
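One plausible reading of the cropping step between the two stages is sketched below: a bounding box is placed around the coarse first-stage segmentation, padded by a margin, and clipped to the volume. This is an illustration only and not the published method; the function name `crop_box`, the voxel-set representation of the coarse mask, and the margin value are assumptions.

```python
def crop_box(coarse_mask, volume_shape, margin=2):
    """Bounding box (one slice per axis) around a coarse mask, padded by `margin` voxels.

    coarse_mask: set of (i, j, k) voxel indices from a first-stage network (hypothetical).
    volume_shape: size of the full volume along each axis, used to clip the box.
    """
    if not coarse_mask:
        raise ValueError("coarse mask is empty; nothing to crop around")
    box = []
    for axis, size in enumerate(volume_shape):
        coords = [voxel[axis] for voxel in coarse_mask]
        start = max(min(coords) - margin, 0)        # do not go below index 0
        stop = min(max(coords) + margin + 1, size)  # exclusive end, clipped to volume
        box.append(slice(start, stop))
    return tuple(box)


# Toy coarse segmentation inside an assumed 44 x 64 x 32 volume.
coarse = {(10, 20, 30), (12, 25, 31), (11, 22, 29)}
box = crop_box(coarse, volume_shape=(44, 64, 32), margin=2)
```

The returned slices can index an array-like volume directly, so the second network only sees a region in which the target structure occupies a much larger fraction of the voxels.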

Small training datasets are one of the main limitations of clinical data, as annotation and data gathering remain difficult. For example, the largest current LGE-MRI dataset contains only 154 cases and therefore cannot effectively represent human anatomical variability. To improve performance, most of the developed approaches rely heavily on data augmentation, such as affine transformations, cropping, and scaling, to virtually enlarge the dataset, at the risk of introducing more artifacts. Moreover, the annotation of anatomical structures is a complex and tedious process, as reflected in the inter/intra-observer variability reported in several studies (38, 91). For example, atrial structures such as the mitral valve are difficult to segment because there is no clear anatomical border between the LA and the left ventricle. Likewise, the PVs are very thin structures that are difficult for experts to distinguish from other structures on poorly contrasted images, and current protocols for defining how far the PVs extend from the LA wall remain subjective. This labeling uncertainty leads to label variability in the datasets used, impairing the training process and potentially misleading the deep learning algorithm at prediction time. However, despite all these difficulties, the study shows that deep learning approaches can reach high accuracy (>90% Dice score), which underlines the importance of careful parameter selection and architecture design for achieving the best performance (38).

In this study, we showed the potential of applying deep learning to perform automatic segmentation of the LA directly from clinical imaging data. The current accuracy of the various approaches presented is promising for future clinical implementation, as they provide highly accurate anatomical maps of the LA. Additionally, multiple teams have already proposed promising solutions for fibrosis assessment using deep learning, providing particularly valuable information for AF ablation strategies that could greatly benefit initial patient stratification, diagnosis, prognosis, and guidance toward an optimized ablation strategy. Moreover, the ability to generate high-fidelity segmentations of structures such as the LA opens the way for further applications of deep learning to other anatomical structures. For instance, highly accurate left atrial appendage segmentation would provide crucial information for atrial thrombosis risk assessment (92). Practitioners would thus be able to provide adapted treatment strategies in a timely manner, potentially reducing the number of strokes caused by migrating atrial thrombi. Additionally, LA segmentation approaches could also be applied to the RA, providing a better understanding of the role of the extent of fibrosis throughout the RA myocardium, notably in sinoatrial diseases (93).

Finally, it is important to underline the limitations of the current metrics. As most segmentation tasks rely on pixel-wise classification, the Dice score offers an efficient way to determine the correctness of the overlap between prediction and label. However, the Dice score is essentially a volumetric metric: it rewards an accurate volume more than precise anatomical delineation. In clinical practice, Dice score and volume accuracy are important for assessing LA dilatation, but become irrelevant when assessing the boundaries of fine structures such as the LA. Therefore, other metrics such as the mean surface distance, which represents the distance between the labeled surface and the predicted surface, should be considered to better evaluate anatomical accuracy. The Hausdorff distance, representing the maximum distance between two surfaces, can also be used to evaluate the maximum error between prediction and label, potentially guiding algorithms to minimize their maximum error.

Moreover, other limitations, such as variations in image quality and resolution or image artifacts intrinsic to the scanner manufacturer, have to be taken into account for future clinical deployment. At the current stage, no study has investigated the influence of LGE-MRI image quality on the Dice score, but empirically, the best image quality tends to yield higher accuracy scores. In clinical practice, however, image quality can vary tremendously, as cardiac motion, body fat, and chest breathing motion, amongst others, can generate artifacts to various degrees on the final images. Therefore, to provide good generalization capacity, deep learning models have to be able to extract meaningful features regardless of the quality of the image. Similarly, to generalize well, a network should be trained with many images from many different scanners. Large multi-center datasets therefore need to be built to ensure that scanner variability and image quality variability are adequately represented in the learning process. It is also crucial to promote deep models with efficient inherent generalization capabilities, as differing image resolutions can represent a major difficulty for deep learning models using large-scale datasets. Promising results have nevertheless been demonstrated using pyramid pooling architectures, which ensure the extraction of multiscale features. Thus, at the current stage, efforts remain to be made to develop a deep learning model that satisfies these criteria for further clinical deployment.
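The difference between the overlap and distance metrics discussed above can be made concrete with a minimal sketch. The function names are illustrative, segmentations are represented as sets of pixel coordinates, and for brevity every pixel is treated as a surface point; in practice only boundary voxels would be used and distances would be scaled by the voxel spacing. `math.dist` requires Python 3.8 or later.

```python
import math


def dice(pred, truth):
    """Dice overlap between two sets of pixel/voxel indices."""
    if not pred and not truth:
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def _directed_distances(a, b):
    """For each point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def mean_surface_distance(surf_a, surf_b):
    """Symmetric mean of nearest-neighbour distances between two point sets."""
    d = _directed_distances(surf_a, surf_b) + _directed_distances(surf_b, surf_a)
    return sum(d) / len(d)


def hausdorff_distance(surf_a, surf_b):
    """Largest nearest-neighbour distance in either direction."""
    return max(max(_directed_distances(surf_a, surf_b)),
               max(_directed_distances(surf_b, surf_a)))


# Toy label: a 10 x 10 block. Toy prediction: the same block plus one stray pixel.
truth = {(i, j) for i in range(10) for j in range(10)}
pred = truth | {(9, 19)}

d = dice(pred, truth)
msd = mean_surface_distance(pred, truth)
hd = hausdorff_distance(pred, truth)
```

In this toy case the Dice score stays above 0.99 although the prediction contains a pixel 10 units away from the label, which the Hausdorff distance exposes directly.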

With the development of computational hardware and the general effort to enrich medical image databases, the effectiveness of deep learning will only improve with time. Arguably, the current trend will lead to improvements in all fields of clinical practice as AI technologies become more widely developed and implemented. Furthermore, the current flourishing of deep learning approaches in all areas of medical practice has already extended beyond research. Despite initial professional reluctance, AI technologies will become of major importance in the near future.

### AUTHOR CONTRIBUTIONS

KJ and JZ conceived and designed the work. KJ searched and read the literature and drafted the manuscript. ZX, GM, MS, and JZ provided guidelines, critical revision, and insightful comments to improve the manuscript. All authors read and approved the manuscript.

### FUNDING

This work was supported by the Health Research Council of New Zealand.

### ACKNOWLEDGMENTS

We would like to thank our colleagues: Dr. Nawshin Dastagir, Joseph Ashby, and Christopher Walker for their precious comments and insights that greatly helped to improve the manuscript. We would also like to thank Vincent Guichot who provided great assistance for the creation of the figures.

### REFERENCES

remodeling on MRI. Circul Arrhythmia Electrophysiol. (2014) 7:23–30. doi: 10.1161/CIRCEP.113.000689


using a minimum cost path approach. Med Phys. (2009) 36:5568–79. doi: 10.1118/1.3254077


quantification challenges: 9th international workshop. In: STACOM 2018, Held in Conjunction With MICCAI 2018, Granada, Spain, (2019).


Available online at: URL http://www.pascal-network.org/challenges/VOC/ voc2011/workshop/index.html.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Jamart, Xiong, Maso Talou, Stiles and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Artificial Intelligence in Cardiac Imaging With Statistical Atlases of Cardiac Anatomy

Kathleen Gilbert<sup>1</sup>, Charlène Mauger<sup>1,2</sup>, Alistair A. Young<sup>2,3</sup>\* and Avan Suinesiaputra<sup>2,4,5</sup>

<sup>1</sup> Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand, <sup>2</sup> Department of Anatomy and Medical Imaging, University of Auckland, Auckland, New Zealand, <sup>3</sup> Department of Biomedical Engineering, King's College London, London, United Kingdom, <sup>4</sup> Centre for Computational Imaging and Simulation Technologies in Biomedicine, School of Computing, University of Leeds, Leeds, United Kingdom, <sup>5</sup> School of Medicine, Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, United Kingdom

In many cardiovascular pathologies, the shape and motion of the heart provide important clues to understanding the mechanisms of the disease and how it progresses over time. With the advent of large-scale cardiac data, statistical modeling of cardiac anatomy has become a powerful tool to provide automated, precise quantification of the status of patient-specific heart geometry with respect to reference populations. Powered by supervised or unsupervised machine learning algorithms, statistical cardiac shape analysis can be used to automatically identify and quantify the severity of heart diseases, to provide morphometric indices that are optimally associated with clinical factors, and to evaluate the likelihood of adverse outcomes. Recently, statistical cardiac atlases have been integrated with deep neural networks to enable anatomical consistency of cardiac segmentation, registration, and automated quality control. These combinations have already shown significant improvements in performance and avoid gross anatomical errors that could make the results unusable. This current trend is expected to grow in the near future. Here, we aim to provide a mini review highlighting recent advances in statistical atlasing of cardiac function in the context of artificial intelligence in cardiac imaging.

Keywords: cardiac anatomy, machine learning, left ventricle, MRI, statistical shape

#### Edited by:

Matteo Cameli, University of Siena, Italy

#### Reviewed by:

Bennett Allan Landman, Vanderbilt University, United States; Nicolas Duchateau, Université Claude Bernard Lyon 1, France

> \*Correspondence: Alistair A. Young alistair.young@kcl.ac.uk

#### Specialty section:

This article was submitted to Cardiovascular Imaging, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 15 November 2019 Accepted: 14 May 2020 Published: 30 June 2020

#### Citation:

Gilbert K, Mauger C, Young AA and Suinesiaputra A (2020) Artificial Intelligence in Cardiac Imaging With Statistical Atlases of Cardiac Anatomy. Front. Cardiovasc. Med. 7:102. doi: 10.3389/fcvm.2020.00102

## INTRODUCTION

The main function of the heart is to pump blood to the lungs and body. In order to maintain the equilibrium state of normal blood circulation, the heart continuously adapts its structure, shape, and function in response to physiological challenges and long-term environmental factors. From the onset of injury or disease, the heart starts a cascade of structural and morphological adaptations, known as cardiac remodeling. Common cardiac remodeling includes left ventricular dilatation, increasing ventricular mass, hypertrophy, aortic dilation, and systolic/diastolic functional alterations. When this condition is prolonged, cardiac function may deteriorate until symptoms become clinically evident and may eventually lead to heart failure (1). Here, we define cardiac remodeling to encompass a wide spectrum of physiological processes from adaptive remodeling in athlete's hearts (2) and normal aging process (3) to adverse remodeling in hypertensive heart disorder (4) and ischemia (5). It is therefore critical in the management of patients with heart disease to identify and quantify the different types of cardiac remodeling and associations with environmental and clinical factors and to predict the likelihood of adverse outcomes in the future.

The associations between traditional risk factors of cardiovascular disease (including smoking, raised blood pressure, raised serum cholesterol, and diabetes mellitus) and developing cardiac disease were discovered from large epidemiological studies such as the Framingham Heart Study (6). To better understand the mechanism of subclinical disease, before symptoms are clinically evident, modern imaging examinations were later included, such as in the Multi-Ethnic Study of Atherosclerosis (MESA) (7) and the UK Biobank study (8). These large-scale studies have enabled a massive increase of imaging data available for the investigation of variations in cardiac geometry and function by using statistical shape analysis, as well as providing training data for machine learning algorithms.

Modern cardiac imaging modalities include echocardiography, computed tomography (CT), and magnetic resonance imaging (MRI). Each modality has its own advantages and disadvantages, but MRI has unique attributes over the other modalities that have enabled large-scale imaging studies in the general population, including the study of 6,000 preclinical subjects in the MESA and 100,000 asymptomatic subjects in the UK Biobank. MR images are acquired without ionizing radiation, and tomographic analysis can be performed without any geometrical assumption. In a single examination session, cardiac MRI can provide anatomical and functional images of the heart and great vessels in multiple views with high contrast-to-noise ratio, as well as high spatiotemporal resolution blood flow, microstructural tissue characterization, myocardial strain, blood perfusion, and scar images.

In this mini review, we focus on the rapid developments of machine learning combined with cardiac atlases. Although examples were taken mainly from cardiac MRI studies, these methods are generally extensible to other modalities. We first show how statistical shape analysis has enabled better understanding of cardiac shape remodeling within and between pathological groups. We then discuss current developments in machine learning to utilize the robustness of cardiac anatomy derived from statistical atlases to improve image analysis, including motion atlases to highlight the utility of dynamic data analysis vs. static analysis. **Table 1** compares representative papers in each category. We conclude with a discussion of future perspectives of cardiac atlases in the context of artificial intelligence (AI) in cardiac imaging.

### STATISTICAL CARDIAC ATLASES

Statistical atlases consist of maps of cardiac shape and function, which can be used to quantify the variation in the population and quantify the differences between cohorts. They can also be used to quantify shape scores in individual patients relative to standard population groups. For example, the Cardiac Atlas Project<sup>1</sup> (24) provides repositories of thousands of cardiac MRI studies (25) and benchmark data for the development of automated analysis algorithms, including segmentation of images (26) and shape analysis (27).

<sup>1</sup>http://www.cardiacatlas.org

Two common atlas construction pipelines are shown in **Figure 1**, where both approaches lead to a comparable statistical analysis (28). In the first approach (9), images are analyzed to obtain the locations of cardiac landmarks (valve positions and the margins of the interventricular septum) and ventricular contours. The points are then mapped into 3D, and slice shifts due to breath-hold mis-registration are corrected. A 3D shape model template is then customized to the location of the landmarks and contours by minimizing the point-to-surface distances between the landmarks/contours and the model surfaces. Homologous points are then sampled from the surfaces and used to construct a point distribution model. This surface template fitting approach has also been translated to echocardiographic images where temporal resolution is much higher, as demonstrated in (20).

The second approach uses 3D images to establish a mean image template before generating cardiac mesh data. In (11), a high-resolution 3D MR template image and myocardial mesh are used. Each short axis image stack is then corrected for breath-hold mis-registration and registered to the template image using non-rigid image registration methods. For each case, a registration map is stored to give a mapping from subject space to template space at each voxel. The template mesh is then propagated to each subject using the inverse registration map. A point distribution model can then be calculated from the resulting homologous points. A similar approach was demonstrated in (12) by using CT images, with the advantage of high resolution and no breath-hold mis-registration in CT data.

Both these approaches benefit from recent advances in machine learning methods. Firstly, deep learning segmentation networks for cardiac images have been developed to enable fast generation of contours and landmarks (17, 29); and secondly, deep learning has enabled fast computation of registration maps, which can be trained without extensive manual image annotation using image similarity as the loss function (23, 30, 31).

### ATLAS MEASURES OF CARDIAC REMODELING

Let $s \in \mathbb{R}^{3P}$ be a shape vector with $P$ homologous points in 3D. To extract shape parameters from a cohort or pathology group, a linear generative model is commonly applied, that is,

$$s \approx \bar{s} + \Phi^T b \tag{1}$$

where $\bar{s} \in \mathbb{R}^{3P}$ is the mean shape estimated from the cohort, $\Phi \in \mathbb{R}^{M \times 3P}$ is the linear decomposition matrix (defining modes of shape variation), and $b \in \mathbb{R}^{M}$ is the shape parameter vector. If $N$ is the number of patient shapes in the cohort and $M < N$, then Equation (1) is called a dimension reduction technique. Because each 3D point in this point distribution model encapsulates approximately the same anatomical location in the heart, the relative locations of neighboring positions are highly correlated, enabling the dimension reduction method to distill a small number of shape parameters.

TABLE 1 | Summary of cardiac atlas construction and deep learning methods with cardiac shape priors.

LV, left ventricle; RV, right ventricle; LA, left atrium; RA, right atrium; CAD, coronary artery disease; MESA, multi-ethnic study of atherosclerosis; ACDC, automated cardiac diagnosis challenge.

<sup>a</sup>CAP, http://cardiacatlas.org.

<sup>b</sup>ICL, http://wp.doc.ic.ac.uk/wbai/data/.

<sup>c</sup>CISTIB, http://www.cistib.org/full-heart-pca-model-all-phases/en/full-heart-pca-model-all-phases.

<sup>d</sup>VitaLabAI, https://bitbucket.org/vitalab/vitalabai\_public/src/master/VITALabAI/model/.

<sup>e</sup>Github, https://github.com/j-duan/4Dsegment.

<sup>f</sup>Github, https://github.com/UK-Digital-Heart-Project/4Dsurvival.

<sup>g</sup>Github, https://github.com/cq615/Joint-Learning-of-Motion-Estimation-and-Segmentation-for-Cardiac-MR-Image-Sequences.

The most common dimension reduction method is principal component analysis (PCA), whereby shape modes are ordered by the amount of variance explained. Most of the shape variations can then be explained in terms of the first few principal modes of variation. In the MESA baseline imaging study, the PCA mode explaining the most shape variation was associated with the size of the heart, even after correction for patient height (9). This is a common finding because the first mode often relates to the amplitude of the studied descriptors. The second mode was associated with sphericity. Clinically, these first two PCA modes are known to be associated with adverse outcomes in both symptomatic disease and asymptomatic cohorts (32–35).
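A minimal sketch of how the linear model of Equation (1) can be fitted with PCA is given below, assuming NumPy is available. It is not taken from any of the cited atlas pipelines: the function names, the use of a singular value decomposition, and the synthetic cohort are assumptions made for illustration. The modes are stored as rows, matching the convention that the decomposition matrix has size M × 3P.

```python
import numpy as np


def fit_shape_pca(shapes, n_modes):
    """Fit s ≈ mean + Phi^T b to N shape vectors stored as the rows of `shapes`."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data: the rows of vt are orthonormal shape modes,
    # already ordered by decreasing explained variance.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    phi = vt[:n_modes]                                 # shape (M, 3P)
    eigvals = sing[:n_modes] ** 2 / (len(shapes) - 1)  # variance along each mode
    return mean, phi, eigvals


def project(shape, mean, phi):
    """Shape parameters b of one shape with respect to the fitted model."""
    return phi @ (shape - mean)


def reconstruct(b, mean, phi):
    """Equation (1): s ≈ mean + Phi^T b."""
    return mean + phi.T @ b


rng = np.random.default_rng(0)
# Synthetic cohort: N = 50 shapes, P = 4 points in 3D (12 coordinates),
# generated so that all variation lies along 2 hidden modes.
basis = rng.normal(size=(2, 12))
shapes = rng.normal(size=(50, 2)) @ basis + rng.normal(size=12)

mean, phi, eigvals = fit_shape_pca(shapes, n_modes=2)
b = project(shapes[0], mean, phi)
error = np.linalg.norm(reconstruct(b, mean, phi) - shapes[0])
```

Because the synthetic cohort varies along only two directions, two modes reconstruct each shape up to numerical precision; with real cardiac shapes the number of retained modes trades compactness against reconstruction error.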

PCA regression enables evaluation of the relationships between the PCA scores and clinical factors such as diabetes (9, 28). However, PCA is an unsupervised dimension reduction method, and component modes do not in general map to recognizable shape characteristics (9, 28). Supervised dimension reduction methods such as information maximizing component analysis have shown promise for quantifying the differences between a patient group and a control group, or two patient groups (36). Another approach is to combine dimensionality reduction with direct correlation with clinically defined remodeling indices such as ventricular volumes, wall thickness, and sphericity, by using the partial least squares method. Zhang et al. (37) applied this method in conjunction with a sequential orthogonalization algorithm to construct orthogonal shape scores, which are optimally matched with known clinical indices of remodeling. More general ways of characterizing the shape probability distribution have been investigated (38).

Gilbert et al. (28) found that both volume and surface cardiac atlases showed similar morphometric characteristics and similar relationships between risk factors and left ventricular shape. Thus, shape scores derived from atlases are robust to differences in construction methodology and quantify real anatomical relationships with cardiovascular risk factors. Morphometric scores were found to be more sensitive to cardiovascular risk factors than traditional measures of mass and volume. Mauger et al. (10) used a biventricular shape model to study right and left ventricular interactions in the UK Biobank study. A subdivision surface biventricular shape model was automatically customized to manually drawn contours using a diffeomorphic least squares optimization algorithm. A control group sub-cohort consisting of 630 participants with no cardiovascular risk factors and normal cardiac parameters was used as a reference group to quantify shape differences due to traditional risk factors. Morphometric scores were computed using linear regression to quantify shape variations associated with prediction variables including sex, age, height, high cholesterol, high blood pressure, obesity, and smoking as well as diabetes, previous myocardial infarction, and angina. This regression approach enabled quantification of the effects of each prediction variable while controlling for the effects of the others.

In congenital heart disease, atlas-based analysis of shape variations can provide quantitative measures of deterioration before detection of symptoms. Sheehan et al. (39) developed a method for patient customization using a linear combination of database templates. This knowledge-based reconstruction method has shown accurate and rapid analysis of right ventricular shapes and volumes in patients with tetralogy of Fallot (39), dextro-transposition of the great arteries (40), and other types of congenital heart disease (41). A more dilated and spherical right ventricle was found in patients with transposition of the great arteries after atrial switch, with regional reduction in function at the base (42, 43). These methods assume that the patient heart geometry is accurately represented by a linear combination of cases in the database. An alternative approach is to jointly estimate the shape and the underlying statistical shape model so that the statistical model can be automatically updated while analyzing new cases (44). Shape model templates have been constructed to describe common congenital pathologies, such as congenitally corrected transposition of the great arteries, enabling a wide range of pathologies to be accurately characterized (45). In single-ventricle pathologies, with tricuspid atresia and Fontan repair, shape mode scores were able to quantify differences in shape and function, with more spherical ED shapes being associated with reduced longitudinal shortening (46). Atlas analysis in association with biomechanical analysis may be able to identify mechanisms underlying changes in function with developing disease (47).

### DEEP LEARNING NETWORKS WITH CARDIAC SHAPE PRIORS

Deep learning is currently the state-of-the-art method for medical image feature extraction and supervised analysis. Its performance surpasses that of traditional machine learning algorithms in many applications, including cardiac imaging (48, 49). This success is mainly attributed to the automatic generation of optimal features, rather than reliance on handcrafted features. This means that, without significantly modifying the architecture, deep learning allows techniques to be transferred from one application domain, for example, natural image analysis, to another, for example, cardiac imaging. In addition, transfer learning directly reuses a pretrained network and fine-tunes it for a new application domain. Examples include transfer learning from retinal image segmentation to cardiac vessels (50) or predicting cardiovascular risk from retinal fundus images (51). This flexibility and reusability of deep neural network architectures have led to rapid development. However, there are some limitations. Deep learning is prone to overfitting and usually cannot infer the anatomical correctness of the prediction results. The network's parameters are also sensitive to the data or cohort used during training (implicit bias). Statistical atlases or shape priors can therefore be integrated with deep learning to overcome these limitations. Thus, anatomical correctness can be imposed by enabling the network to learn the biological constraints as well as the measurement correlations.

Machine learning methods can add new quantitative analysis techniques to examine the relationships between shape features and clinical status, in addition to the traditional methods of linear or logistic regression. These are now being applied to statistical shape atlases to characterize differences in patient groups and predict outcomes. In the STACOM 2015 shape analysis challenge (27), various machine learning algorithms were compared on a benchmark dataset, and 11 groups participated to distinguish the cardiac shapes of patients with myocardial infarction from those of healthy subjects. Five groups used the z-scores (standardized b vector in Equation 1) in different ways to classify myocardial infarction shapes. The training accuracies ranged between 0.93 and 0.98, whereas the test accuracies were 0.83–0.98. Shape atlases have been useful in identifying genetic mutations affecting left ventricular (LV) mass (52). Shape features associated with disease can be interpreted through visualizations using deep generative networks (53).

Incorporating cardiac anatomy in deep learning was demonstrated by Oktay et al. (13) with an anatomically constrained neural network. Two separate autoencoder networks were appended after the final predicted segmentation mask and the ground truth mask layers, which extracted features from mask images separately. A global shape similarity loss function calculated from the output of autoencoder networks was introduced as a way to constrain the optimization to follow the same shapes as the ground truth. Their results showed improved super resolution and segmentation accuracies in the long-axis view<sup>2</sup> by correcting mis-registration between image slices. Another shape-based loss function was also proposed by Yang et al. (54) to segment the right ventricle.
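The idea of a shape similarity loss computed from autoencoder outputs can be sketched as follows, assuming NumPy is available. This is only one plausible form and is not the implementation of Oktay et al.: the encoder is replaced by a hypothetical fixed linear projection followed by `tanh`, the distance is taken to be squared Euclidean, and the weighting factor `lam` and all function names are assumptions.

```python
import numpy as np


def encode(mask, weights):
    """Stand-in for a trained autoencoder's encoder: flatten, project, squash."""
    return np.tanh(weights @ mask.ravel())


def shape_similarity_loss(pred_mask, true_mask, weights):
    """Squared Euclidean distance between the latent codes of two masks."""
    diff = encode(pred_mask, weights) - encode(true_mask, weights)
    return float(diff @ diff)


def total_loss(pred_mask, true_mask, weights, lam=0.01):
    """Pixel-wise cross-entropy plus the weighted shape term (lam is an assumed weight)."""
    eps = 1e-7
    p = np.clip(pred_mask, eps, 1 - eps)  # avoid log(0)
    ce = -np.mean(true_mask * np.log(p) + (1 - true_mask) * np.log(1 - p))
    return ce + lam * shape_similarity_loss(pred_mask, true_mask, weights)


rng = np.random.default_rng(0)
weights = 0.1 * rng.normal(size=(8, 64))  # hypothetical encoder: 64 pixels -> 8 codes

true_mask = np.zeros((8, 8))
true_mask[2:6, 2:6] = 1.0                 # toy ground-truth shape
shifted = np.roll(true_mask, 2, axis=1)   # same shape, displaced by two pixels

same = shape_similarity_loss(true_mask, true_mask, weights)
moved = shape_similarity_loss(shifted, true_mask, weights)
```

The shape term compares masks globally through their latent codes rather than pixel by pixel, which is what allows it to penalize anatomically implausible predictions that a purely pixel-wise loss may tolerate.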

Alternatively, shape priors can be introduced directly inside a network (14–16). Zotti et al. (14) inserted a cardiac shape probability map before the final layer of a U-Net architecture to ensure that the output segmentation masks were valid. Chen et al. (15) also modified a U-Net architecture with cardiac shape priors, but they modified the bottom layer (feature extraction layer) by inserting short-axis and long-axis feature vectors trained independently from short-axis and long-axis cardiac MRI, respectively. Duan et al. (16) embedded a more specialized shape refinement subnetwork into the main segmentation and super resolution network. The subnetwork consisted of shape affine alignment, atlas selection, and non-rigid free form deformation registration operations. The network was able to generate smooth high-resolution 3D cardiac mesh data from low-resolution cardiac MRI.

### DEEP LEARNING FOR STATISTICAL CARDIAC ATLASES

The ability of deep learning to learn non-linear relationships between different data domains and the high focus on segmentation have enabled several studies to directly link cardiac imaging and statistical shape analysis. In Equation (1), patient-specific shape parameters with respect to the population reference $\Phi$ are represented by vectors $b \in \mathbb{R}^{M}$. A statistically plausible new shape $s$ can be generated by setting the values of $b$ within $\pm 2\sqrt{\sigma}$, where $\sigma$ denotes the eigenvalues from the PCA. Shape generation can also be performed by sampling from a probability distribution function learned from an atlas (38).
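Generating a plausible shape by restricting each shape parameter to within two standard deviations of its mode can be sketched as below, assuming NumPy is available. The uniform sampling scheme, the function name `sample_shape`, and the toy atlas (zero mean, two axis-aligned modes) are assumptions made for illustration and do not come from a real atlas.

```python
import numpy as np


def sample_shape(mean, phi, eigvals, rng, limit=2.0):
    """Draw b with each entry in [-limit*sqrt(eig), +limit*sqrt(eig)] and apply Equation (1)."""
    sd = np.sqrt(eigvals)                     # standard deviation along each mode
    b = rng.uniform(-limit * sd, limit * sd)  # one value per mode
    return mean + phi.T @ b, b


# Toy atlas: 2 points in 3D (6 coordinates) and M = 2 modes.
mean = np.zeros(6)
phi = np.eye(6)[:2]              # modes as rows, shape (M, 3P)
eigvals = np.array([4.0, 1.0])   # assumed PCA eigenvalues (variances)

rng = np.random.default_rng(1)
shape, b = sample_shape(mean, phi, eigvals, rng)
```

Sampling from a fitted probability distribution instead, for example a Gaussian per mode, would favour shapes close to the mean rather than treating all values inside the bounds as equally likely.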

Attar et al. (17) proposed a neural network model that learns how to directly predict shape parameters b given a combination of cardiac MRI and patient characteristics metadata [age, weight, height, body mass index (BMI), body surface area (BSA), heart rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), sex, smoking status, and alcohol consumption]. Hence, the network was trained to predict a statistically plausible b vector from images and metadata parameters in order to generate a 3D cardiac shape by using Equation (1). Also, Clough et al. (18) used a variational autoencoder to generate interpretable representations of patients with low ejection fraction. This aids the interpretability of machine learning algorithms, which is vital to their acceptance in the clinical community.

A different approach to embedding statistical shape parameterization into a deep neural network was proposed by Painchaud et al. (19). A separate adversarial variational autoencoder was trained to generate a latent space of cardiac anatomy from mask images and was then connected to another anatomical variational autoencoder to correct errors after segmentation. Hence, this network (19) indirectly learned patient-specific parameters in the latent space without actually modeling how the latent space should be parameterized as in (17). The disentanglement of latent spaces is an active area of research and shows promise in factorizing anatomical representations from modality characteristics (55).

<sup>2</sup> In standard cardiac imaging acquisition, short-axis views show an image of the left and right ventricular chambers, and long-axis views typically show either two chambers (left ventricle and left atrium) or all four chambers in a single image. Short-axis views are generally perpendicular to the long-axis views.

### DYNAMIC ATLASES

Many of the features associated with cardiac pathology manifest as changes in motion rather than changes in static shape. Because the heart is responsible for delivering sufficient blood to the circulatory system, the onset of cardiac disease forces the heart to adapt its motion. Changes in cardiac shape deformation, myocardial strain, and strain rate are examples of important dynamic remodeling indices to consider when building a cardiac motion or dynamic atlas. However, building a dynamic atlas is sometimes limited by the temporal resolution of the acquired imaging data, although combining two modalities, such as MRI and echocardiography (20), can increase the temporal resolution of the atlas considerably.

A number of cardiac applications can benefit from machine learning applied to cardiac motion. In pulmonary hypertension, a motion atlas has been combined with the latent space of an autoencoder network to predict survival (21). A machine learning system that combines a motion atlas with non-motion data (ECG and clinical reports) has been demonstrated for selecting patients with dyssynchrony for cardiac resynchronization therapy (22). Deep learning methods for combined shape and motion analysis are now being developed (23) and can be used to extend previous methods for motion atlasing (11). The study of dynamic atlases will be a fruitful area of future research.

### DISCUSSION

A statistical atlas of cardiac anatomy is a powerful tool for analyzing patient-specific remodeling relative to a reference population. An abnormal cardiac shape can be quantified against a population reference, regional wall motion differences can be compared across pathological groups, and a hypothetical cardiac shape can further be predicted from a longitudinal study. In addition, a statistical atlas can be used as a reference by machine learning algorithms to constrain their analysis within valid anatomic boundaries.

In summary, we have reviewed three ways to integrate a statistical atlas into a machine learning framework. The first approach is to directly use individual shape atlas parameters, for example, the z-scores, as the training data. This approach needs homologous points generated from a shape modeling technique derived from images and a registration method to align points to remove variations in the global position and orientation. The effectiveness of this approach was demonstrated in the STACOM 2015 challenge. The second approach is to use statistical atlases as shape priors either as a way to measure shape similarity in a loss function or to add shape features to be learned inside the network. The third approach is to predict statistical shape parameters or a location in a shape-based feature space directly from images. This is a promising field for deep learning, because it can generate relationships between two completely different data domains.
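The first approach can be made concrete with a short NumPy sketch that turns aligned shapes into z-score feature vectors. It assumes that homologous points have already been extracted and aligned, and that the z-score of a mode is its PCA score divided by the standard deviation of that mode in the reference population. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def shape_z_scores(shape, mean_shape, modes, eigenvalues):
    """Project an aligned shape onto the atlas modes and standardize each score."""
    b = modes.T @ (shape - mean_shape)
    return b / np.sqrt(eigenvalues)

rng = np.random.default_rng(1)
# Synthetic stand-in for 100 aligned reference shapes with 30 coordinates each.
reference = rng.normal(size=(100, 30))

# Build the atlas: mean shape, first four PCA modes, and their eigenvalues.
mean_shape = reference.mean(axis=0)
_, s, vt = np.linalg.svd(reference - mean_shape, full_matrices=False)
modes = vt[:4].T
eigenvalues = s[:4] ** 2 / (reference.shape[0] - 1)

# One z-score vector per subject; these rows can serve as training features.
features = np.array(
    [shape_z_scores(x, mean_shape, modes, eigenvalues) for x in reference]
)
```

Within the reference population each z-score column has zero mean and unit variance, so a value such as 2.5 for a new patient reads directly as 2.5 standard deviations from the population mean along that mode.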

In the future, statistical atlases show promise for augmenting deep learning methods, and vice versa. An atlas can add robustness to the prediction results because additional information on a reference population is included during the learning process. Atlases will also increase the interpretability of the AI process, which is critical for the acceptance of AI in health care.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

This work was supported by the National Institutes of Health (USA) 1R01HL121754. AY acknowledges HRC Grant 17/234. CM acknowledges New Zealand Heart Foundation Grant 1695.


### REFERENCES

the multi-ethnic study of atherosclerosis. J Cardiovasc Magn Reson. (2014) 16:56. doi: 10.1186/s12968-014-0056-2


comparison with cardiac magnetic resonance in adults with congenital heart disease. Echo Res Pract. (2015) 2:109–16. doi: 10.1530/ERP-15-0029


Peters TM, Staib LH, Essert C, Zhou S, et al. editors. Medical Image Computing Computer Assisted Intervention – MICCAI 2019. Cham: Springer International Publishing (2019). p. 714–22. doi: 10.1007/978-3-030-32245-8\_79


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Gilbert, Mauger, Young and Suinesiaputra. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.