Fast Posterior Estimation of Cardiac Electrophysiological Model Parameters via Bayesian Active Learning

Probabilistic estimation of cardiac electrophysiological model parameters serves an important step toward model personalization and uncertain quantification. The expensive computation associated with these model simulations, however, makes direct Markov Chain Monte Carlo (MCMC) sampling of the posterior probability density function (pdf) of model parameters computationally intensive. Approximated posterior pdfs resulting from replacing the simulation model with a computationally efficient surrogate, on the other hand, have seen limited accuracy. In this study, we present a Bayesian active learning method to directly approximate the posterior pdf function of cardiac model parameters, in which we intelligently select training points to query the simulation model in order to learn the posterior pdf using a small number of samples. We integrate a generative model into Bayesian active learning to allow approximating posterior pdf of high-dimensional model parameters at the resolution of the cardiac mesh. We further introduce new acquisition functions to focus the selection of training points on better approximating the shape rather than the modes of the posterior pdf of interest. We evaluated the presented method in estimating tissue excitability in a 3D cardiac electrophysiological model in a range of synthetic and real-data experiments. We demonstrated its improved accuracy in approximating the posterior pdf compared to Bayesian active learning using regular acquisition functions, and substantially reduced computational cost in comparison to existing standard or accelerated MCMC sampling.


INTRODUCTION
With advanced technologies in medical imaging and image analysis, computational models can now closely replicate the physiology of a human heart (Taylor and Figueroa, 2009;Morris et al., 2016). As these models are virtual in nature, they have the potential to enable prediction, diagnosis, and treatment planning of certain conditions of a patient heart with little to no harm to the patient Arevalo et al., 2016;Zahid et al., 2016;Prakosa et al., 2018;Cronin et al., 2019). However, while the geometry of a specific patient heart can be depicted with increasing accuracy, patient-specific physiology remains a challenge. A main difficulty arises from the need to customize patient-specific material properties (Taylor and Figueroa, 2009;Neal and Kerckhoffs, 2010), which are typically spatially varying throughout the 3D organ and may change over time for the same individual. At the same time, they often cannot be directly measured in high resolution, but have to be estimated from relatively limited measurements. This results in a challenging inverse problem for estimating highdimensional (HD) unknown parameters of a complex, nonlinear, and computationally expensive forward model that relates the unknown parameters to measurements.
There are two general approaches to this inverse problem: deterministic optimization and probabilistic inference. In deterministic optimization, we seek a single optimal value of the unknown model parameter that will minimize the mismatch between the model output and the measurement data Wong et al., 2012Wong et al., , 2015Yang and Veneziani, 2015;Balaban et al., 2018;Mineroff et al., 2019;Barone et al., 2020a,b). These estimates, however, do not take into account the uncertainty in the measurement data, nor can they offer insights into the presence of non-unique solutions that can match the same data. These can be overcome by probabilistic inference of the posterior pdf of the model parameters given available measurements.
Existing approaches to the probabilistic estimation of model parameters are generally based on Markov Chain Monte Carlo (MCMC) sampling. The computation expense of the forward simulations of these models, however, makes MCMC infeasible due to the reliance on a large number of sampling, each requiring a simulation run. Approaches to accelerating such sampling can be loosely divided into two categories. On one hand, a variety of hybrid sampling methods have been developed, which accelerates random sampling using information about the target pdf such as its gradient (Roberts et al., 1996;Neal, 2010) and Hessian matrix (Martin et al., 2012). These information, however, are often difficult to extract from the posterior pdf involving a complex simulation model. On the other hand, it is possible to construct a computationally efficient approximation, i.e., surrogate model, of the expensive simulation process, such that the related pdfs become substantially faster to sample. These surrogate models may be physics-based reduced-order modeling Lassila et al. (2013), or data-driven approximations such as Gaussian process (GP) (Kennedy and O'Hagan, 2000;Rasmussen, 2003) and polynomial chaos (Spanos and Ghanem, 1989;Xiu and Karniadakis, 2003;Marzouk and Najm, 2009). Directly sampling the surrogate-based posterior pdf, however, may lead to limited accuracy due to the difficulty to build a globally accurate approximation of a complex nonlinear simulation model. In our previous work, we attempted to mitigate this issue by using this surrogate-based pdf to accelerate, rather than replacing, the sampling of the actual pdf (Dhamala et al., 2018a). Specifically, this was achieved by a two-stage MCMC strategy where the surrogate-based pdf works as a proposal distribution to increase the acceptance rate of sampling (Dhamala et al., 2018a). While this ensures the accuracy of posterior sampling, the reduction in the computation becomes limited due to the fundamental reliance on sampling the original pdf involving expensive simulation processes.
In this study, we develop a Bayesian active learning approach to provide an accurate surrogate model of the posterior pdf of simulation model parameters such that there is no need of further MCMC sampling of the original computational-intensive pdf. This is achieved with two key innovations. First, unlike most existing approaches that rely on learning a surrogate of the simulation model over the prior distribution of the parameter space (Dhamala et al., 2018a), we propose to directly learn a surrogate of the posterior pdf. We formulate this posterior pdf estimation as an active learning problem where we intelligently select a minimal number of training points focused on the posterior support of the parameter space. Second, we present new acquisition functions during the active learning to utilize the shape of the posterior pdf to improve the selection of training points. To enable this active posterior estimation over a highdimensional parameter space, we further combine it with our previously developed approach that uses generative modeling of the high-dimensional parameter space (Dhamala et al., 2018b) to embed active learning of a high-dimensional posterior pdf into a low-dimensional (LD) space.
While our method is generally applicable to posterior estimation of HD parameters in complex models, in this study it was applied to estimate tissue excitability as parameters of the cardiac electrophysiological model. Experiments were performed on three different groups of data: simulation data with a synthetic setting of abnormal tissues, simulation data generated from a high-fidelity biophysics model blinded to the model used in the posterior estimation, and real data obtained from patients with infarcts derived from in vivo voltage mapping data. In the synthetic group, we compared the results with direct MCMC sampling of the original posterior pdf, two-stage MCMC method (Dhamala et al., 2017a), and direct MCMC sampling of the surrogate pdf learned using regular Bayesian active learning. The results showed that the presented method was able to use 0.6% computation of the direct or two-stage MCMC methods to deliver an accurate estimation of the posterior pdf, with significantly improved accuracy compared to using regular Bayesian active learning. In the other two sets of experiments, we evaluated and interpreted the mean, mode, and uncertainty of the estimated tissue excitability using in vivo magnetic resonance (MR) scar imaging or voltage mapping data.
The key contributions of this study can be summarized as: 1. We present a Bayesian active learning approach for fast approximation of the posterior pdf of the parameters of expensive simulation models, with acquisition functions designed to improve the accuracy of the approximation in order to remove the need of subsequent MCMC of the original computationally expensive pdf. 2. We leverage our previously developed approach (Dhamala et al., 2018b) to embed the active learning over HD space into a LD manifold, enabling active posterior inference over HD model parameters representing spatially varying tissue excitability. 3. We thoroughly evaluated the performance of the presented method in comparison with existing works in probabilistic parameter estimation in cardiac electrophysiological models, both in synthetic data involving MCMC sampling as reference data, and in real data involving MRI scar imaging and in vivo voltage mapping as reference data.
The rest of the study is organized as follows. In section 2, we review related works in detail and in section 3, we present background of this study. In section 4, we present our methodological developments. We present experiments and results for both synthetic and real data from the cardiac electrophysiology system in section 5. Finally, we give some concluding remarks with limitations and future scope.

Probabilistic Parameter Estimation in Complex Models
For complex models where the posterior pdf of model parameters is analytically intractable, the area of estimating parameters largely depends on MCMC sampling. Metropolis-Hastings (MH) sampling, Gibbs sampling, and many more classical MCMC methods are developed in Metropolis andUlam (1949), Hastings (1970), Geman and Geman (1984), Gelfand and Smith (1990), and Gelfand et al. (1992) and applied in different areas to estimate parameter uncertainty (Andrieu et al., 2003). The reason for the extensive use of MCMC is that it can deal with HD parameters, non-linear relation between parameters and observations, and noisy data. However, these properties also make it very slow as, by design, the sampling takes a large number of simulations to converge. With rapid developments of parallel computing, parallel MCMC to accelerate the computation is proposed in Brockwell (2006) and Byrd (2010), Wang (2014) but these can improve neither the convergence rate nor reduce the number of simulations needed. In exploring uncertainty on HD parameters, reversible jump MCMC is used in Brooks (1998). Combination of differential evolutions to have subspace exploration is used in Laloy and Vrugt (2012), while non-differential sparse priors are developed in Cai et al. (2018). Gradient and Hessian information of the pdfs are used to accelerate sampling even with poor initial models in Zhao and Sen (2019), although these information are nontrivial to extract when the pdf contains complex simulation models. Alternatively, surrogate models have been widely employed to generate a computational-efficient approximation of the posterior pdf that can be faster to sample. Polynomial chaos (Spanos and Ghanem, 1989;Xiu and Karniadakis, 2003;Knio and Le Maitre, 2006) and GP (Kennedy and O'Hagan, 2000;Rasmussen, 2003) are pioneers in surrogate modeling. In Adams et al. (2008), Konukoglu et al. (2011), and Gramacy and Lee (2008), Schiavazzi et al. (2016), to build posterior pdf, GP surrogate is built of the pdf at first, and then, MCMC sampling is performed from that to avoid expensive simulations. It is, however, difficult to obtain an approximation of a complex simulation model over the prior parameter space. As a result, when direct sampling of the surrogate pdf is substantially more efficient than sampling the original pdf, the accuracy is often largely compromised (Dhamala et al., 2018a). Recently, hybrid approaches are emerging that use the surrogate pdf to accelerate rather than replace sampling. In Dhamala et al. (2018a), a twostage model is introduced where a GP surrogate of exact posterior pdf is built in the first stage and is used to improve the acceptance rate of candidate samples in MCMC sampling in the second stage. In Dunbar et al. (2020), a three-stage model is presented for uncertain quantification of a complex climate model parameters where model calibration using Kalman inversion is performed in the first stage, building GP surrogate to emulate parameter-todata map is performed in the second stage, and MCMC sampling of the posterior pdf of the climate model parameters is performed in the final stage. While these hybrid approaches improve the accuracy of sampling, the reliance on sampling the original pdf limits the extent to which the computation can be reduced.

Parameter Estimation Using Active Learning
Popular active learning algorithms such as efficient global optimization (Jones et al., 1998), famously known as Bayesian optimization, have been merged with surrogate modeling to estimate complex model parameters. In Bayesian optimization, a GP surrogate is built to approximate the objective function of the optimization, using a small number of sampling to query the expensive objective function where the samples are selected based on an acquisition function. In many areas such as nuclear physics (Ekström et al., 2019), material science (Ueno et al., 2016), and many more (Khosravi et al., 2019;Vargas-Hernández et al., 2019;Duris et al., 2020), Bayesian optimization is applied to estimate complex model parameters. However, all these techniques are focused on deterministic optimization to find a single optimal parameter value that best fits the simulation output to measurement data without considering the associated uncertainty.

Parameter Estimation in Personalized Models
In the specific area of estimating parameters of patient-specific models, existing studies can be classified into deterministic or probabilistic approaches. There are many optimization methods developed in the past few decades. Derivative free methods, such as the Subplex method (Wong et al., 2015), Bound Optimization BY Quadratic Approximation (BOBYQA) (Wong et al., 2012), New Unconstrained Optimization Algorithm (NEWUOA) , and hybrid particle swarm method (Mineroff et al., 2019), have been used in estimating cardiac model parameters. Derivative-based variational data assimilation approaches have also been applied to estimate cardiac conductivities in ventricular tissue (Yang and Veneziani, 2015;Barone et al., 2020b) and heterogeneous elastic material properties in personalized cardiac mechanic model (Balaban et al., 2018). Due to the computational expense associated with the model simulation during optimization, model reduction techniques such as Proper Generalized Decomposition (PGD) have been used to accelerate the estimation of cardiac conductivities in personalized cardiac electrical dynamics (Barone et al., 2020a). These methods overall are focused on finding a single value of cardiac model parameters that best fit the available data, lacking any uncertainty measure associated with the parameters.
On the other hand, limited progress has been made in the probabilistic estimation of personalized model parameters where the uncertainty measure can be derived from their posterior pdf. To reduce the extensive computation associated with standard MCMC sampling, various approaches of reduced modeling have been used to reduce the cost of forward simulation and thereby accelerate the inverse estimation (Lassila et al., 2013). Recent research reports building surrogate models using methods like kriging (Schiavazzi et al., 2016) and polynomial chaos (Konukoglu et al., 2011) to estimate cardiac model parameters. In Paun et al. (2019), GP emulation is used to speed up the MCMC process in the area of cardiovascular fluid dynamics. Probabilistic surrogate modeling through GP using Bayesian history matching is applied in Longobardi et al. (2020) for inference of cardiac contraction mechanics. In Neumann et al. (2014), polynomial chaos method is used to build the surrogate model for fast sampling to estimate parameters of an electromechanical model of the heart. However, with the limited accuracy in the approximated posterior pdf, directly sampling the surrogate results in improved efficacy but reduced accuracy. In Dhamala et al. (2018a), GP surrogate model of the posterior pdf of cardiac model parameters is built to accelerate MCMC sampling of the original posterior pdf. While this strategy avoids the loss of accuracy from sampling the surrogate pdf, it achieves a limited gain of efficiency due to the reliance on MCMC sampling of the original pdf.

Estimating High-Dimensional Parameters
High dimensionality is a bottleneck in estimating parameters, especially in cardiac physiology. Researchers mostly try to explain useful functions through dimension reduction in the original HD parameters. For example, in Malatos et al. (2016), it is shown that a lower-dimensional model can be useful in explaining blood flow. In Caruel et al. (2014), to explain cardiac function, LD muscle samples or myocytes as model parameters are estimated from HD ones. Estimating local myocardial infarct uncertainties through reducing the dimension of deformation patterns is introduced in Duchateau et al. (2016). In Giffard-Roisin et al. (2018), offline learning from electrocardiographic imaging (ECGI) is achieved through dimension reduction in the myocardial shape. As most of the parameters stay on manifold rather than Euclidean space, in Nakarmi et al. (2017), a kernelbased framework using LD manifold models to reconstruct cardiac dynamic MR images is proposed. In Lê et al. (2016), to reduce dimension, homogeneous tissue excitability (in the form of a model parameter) is represented by a single global model parameter. In Wong et al. (2015), and the cardiac mesh is pre-divided into 3-26 segments, each represented by a uniform parameter value. As the number of segments increases, the estimation becomes more challenging and increasingly reliant on initialization. Alternatively, a multi-scale hierarchy of the cardiac mesh is defined for a coarse-to-fine optimization, which allowed spatially adaptive resolution that was higher in certain regions than the other (Chinchapatnam et al., 2008;Dhamala et al., 2016). However, the representation ability of the final partition is limited by the inflexibility of the multi-scale hierarchy: Homogeneous regions distributed across different scales can-not be grouped into the same partition, while the resolution of heterogeneous regions can be limited by the level of scale the optimization can reach (Dhamala et al., 2017a). In addition, because these methods involve a cascade of optimization along the hierarchy of the cardiac mesh, they are computationally expensive.
In our recent work, we present an approach that replaces the explicit anatomy-based reduction in the parameter space with an implicit LD (LD) manifold that represents the generative code for HD spatially varying tissue excitability (Dhamala et al., 2018b). This is achieved by embedding within the optimization a generative model, in the form of a variational autoencoder (VAE) trained from a large set of spatially varying tissue excitability. In our previous work, we demonstrated the efficacy of this approach for deterministic optimization of spatially varying tissue excitability in cardiac electrophysiological models (Dhamala et al., 2018b). In this study, we leverage this strategy to enable probabilistic estimation of HD model parameters.

Bi-Ventricular Electrophysiology Model
There are many computational models with varying levels of biophysical details (Aliev and Panfilov, 1996;Mitchell and Schaeffer, 2003;Clayton et al., 2011). Among these, phenomenological models like the Aliev Panfilov (AP) model (Aliev and Panfilov, 1996) is capable of reproducing the key macroscopic process of cardiac excitation with a small number of model parameters. To test the feasibility of the presented method, we utilize the two-variable AP model given below: (1) Here, u ∈ [0, 1] is the transmembrane potential and v is the recovery current. The parameter ε = e 0 +(µ 1 v)/(u+µ 2 ) controls the coupling between u and v, and c controls the re-polarization. D is diffusion tensor, which controls the spatial propagation of u. θ is tissue excitability parameter that controls the temporal dynamics of u and v. Based on previous sensitivity analysis (Dhamala et al., 2017a), in this study, we focus on estimating parameter θ of the AP model (Equation 1), while fixing the values for the rest of the model parameters based on the literature (Aliev and Panfilov, 1996): c = 8, e 0 = 0.002, µ 1 = 0.2, and µ 2 = 0.3. We solve the AP model (Equation 1) on the discrete 3D myocardium using the meshfree method described in Wang et al. (2009). Then, we obtain a 3D electrophysiological model of the heart that describes the spatio-temporal propagation of 3D transmembrane potential u(t, θ θ θ). Note that, compared to existing works where the model parameter to be estimated is often assumed to be global or LD based on a pre-defined anatomical division of the heart, we consider the estimation of a HD parameter θ θ θ at the resolution of the cardiac mesh.
In this study, we demonstrate the presented framework using body surface electrocardiogram (ECG) that are generated by spatio-temporal cardiac action potential following the quasistatic approximation of the electromagnetic theory (Plonsey, 2001). In Wang et al. (2009), this relationship is modeled by solving a Poisson's equation within the heart and Laplace's equation external to the heart on a discrete mesh of the heart and the torso, which gives a linear model: where Y b (t) represents ECG data, u(t, θ θ θ) represents transmembrane potential, H b is the transfer matrix unique to patient-specific heart and torso geometry, and θ is the vector of tissue excitability to be estimated.

METHODOLOGY
The electrophysiological system as defined in section 3 defines a stochastic relationship between measurement data Y and model parameter θ as: where M is a composite of the whole-heart electrophysiological model and measurement model reviewed in section 3. ε is the noise term that accounts for measurement errors and modeling errors other than that arising from the value of the parameter θ . Assuming uncorrelated Gaussian noise ε ∼ N(0, σ 2 e I), the likelihood can be written as: The unnormalized posterior density of the model parameter θ has the following form, using Bayes rule: where π(θ ) provides us prior knowledge about the parameters. In this study, a uniform distribution bounded within [0, 0.5] is used where the bound is informed by the physiological values of parameter θ . In this general setup, our goal is to estimate the pdf function in Equation (5), which has an expensive likelihood function and a HD parameter θ . Naive MCMC sampling of Equation (5) would render intensive, if not infeasible, computation. Here, we cast the problem of estimating the function of π(θ |Y) into a Bayesian active learning problem: We aim to learn a GP approximation of the function π(θ |Y) from training samples of {θ (i) , π(θ (i) |Y)} l i=1 ; because the evaluation of π(θ (i) |Y) involves expensive computation, i.e., an expensive labeling process, we intelligently select a small number of training points θ (i) on which to query the label of π(θ (i) |Y). To achieve this, we bring two innovations to existing Bayesian active learning methods. First, leveraging our previous work (Dhamala et al., 2017a), we integrate generative modeling of HD θ into Bayesian active learning to embed the process of active search of training samples into a LD manifold. Second, we introduce new acquisition functions for selecting training points θ (i) , such that it focus on the shape of the posterior pdf of interest.

Enabling High-Dimensional Bayesian Active Learning via Generative Modeling
To obtain a generative model of θ = g(z), we use VAE that consists of two modules: a probabilistic deep encoder network with network parameters α α α that approximates the intractable true posterior density p(z|θ θ θ ) as q α α α (z|θ θ θ ) and a probabilistic deep decoder network with network parameters β β β that reconstructs θ θ θ given z with the likelihood p β β β (θ θ θ |z). Given a training data set = {θ θ θ (i) } N i=1 that consists of N different spatial distributions of the tissue excitability θ θ θ , VAE training involves optimizing the variational lower bound on the marginal likelihood of each training data θ θ θ (i) with respect to network parameters α α α and β β β: (6) We assume the prior p(z) ∼ N (0, 1) to be a standard Gaussian density. The optimization of Equation (6) with respect to α and β is achieved with stochastic gradient descent with reparameterization trick (Kingma and Welling, 2013). After the VAE is trained, the decoder as a generative model can be incorporated into Equation (5) to obtain: (7) where θ is now approximated by the expectation of the generative model p β β β (θ θ θ|z), and the prior of z is assumed to be Gaussian: π(z) ∼ N (0, 1). In another word, the use of p β β β (θ θ θ |z) allows us to now perform Bayesian active learning over the LD latent space z.

Bayesian Active Learning With Posterior-Focused Acquisition Functions
We aim to learn a GP approximation of the log posterior because, compared to the posterior pdf in Equation (7), and it has longer scales and lower dynamic range. In other words, we build a GP to approximate: Bayesian active learning with GP consists of an iterative process.
In each iteration, we 1) first select new training samples via the optimization of an acquisition function and 2) then obtain the posterior distribution of the GP from the prior distribution using newly obtained training samples. For the prior of the GP at the first iteration, we adopt the commonly used zero-mean function due to lack of prior knowledge and the anisotropic "Matérn 5/2" covariance function (Rasmussen, 2003): , is a diagonal matrix in which each diagonal element represents the inverse of the squared characteristics length scale along each dimensions of z, and α 2 is the function amplitude.

Acquisition Function Design
A crucial part of Bayesian active learning is to guide the algorithm about where to sample next, achieved by designing an acquisition function that balances between exploiting what is already learned about the target function of interest and exploring the unknown region of the input space. Existing GP-based Bayesian active learning is typically used for finding the optimum of a target function, using the mean and variance function of the GP approximations of the target function to exploit high-mean regions while exploring high-variance regions. In learning to approximate the posterior pdf function as defined in Equation (7), our goal differs from standard approaches in two ways. First, while we choose to build the GP approximation of the log posterior, we are interested in the accuracy of the posterior pdf function itself as our target function. Second, we are interested in the shape of the posterior pdf, rather than any single optimum value. These motivate the design of new acquisition functions as follows.
First, based on Equations (7) and (8), our posterior pdf of interest is an exponential factor away from the function being approximated by the GP. Since GP(z) at every z follows a Gaussian distribution, exp(GP(z)) follows log-normal distribution at every z. In other words, the function of exp(GP(z)) follows a log-normal process. To focus on the accuracy of approximating the posterior pdf function, rather than using the mean and variance of the GP to guide acquisition as in regular Bayesian active learning, we will use the mean and variance of the log-normal process exp(GP(z)) to guide acquisition.
Second, to focus more on learning the shape rather than optimum (i.e., mode) of the posterior pdf, we emphasize more on reducing the uncertainty of the learned exp(GP(z)) (i.e., exploration) than exploiting around its mode. Two natural candidates for measuring the uncertainty in the approximated exp(GP(z)) include the following: 1) variance of exp(GP(z)), and 2) entropy of exp(GP(z)) at any given z: At the i-th iteration of active learning, we select a single point of z (i) that maximizes (Equations 10 or 11) to update the GP.

Updating GP With New Training Samples
Once a new sample point z (i) is selected, the value of the log posterior in Equation (8) is evaluated at z (i) as L L L (i) , which includes the execution of the trained VAE decoder, the biventricular electrophysiological model, and the measurement model as described in section 3. The new input-output pair is used to update the posterior belief of the GP. Following (Williams and Rasmussen, 2006), the predictive mean and variance of the updated GP can be evaluated at any z: where k is the kernel function. We update the kernel hyperparameters, including the length-scale and noise variance mentioned in Equation (9), every time we add a new training point by maximizing the log of the marginal likelihood. Overall, the active learning process involves two steps: 1) adding new training points by maximizing the acquisition function, and 2) updating the GP posterior mean and variance function. This iterative process continues until the Kullback-Leibler (KL) divergence between the most updated predictive mean pdf function and the average of the last five predictive mean pdf functions of exp(GP(z)) does not exceed a predefined threshold. The length-scale and noise variance of kernel function are optimized every time by maximizing log of the marginal likelihood.

Generative Modeling of Spatially-Varying Tissue Excitability
Tissue excitability of whole heart from real data is not readily available. Cardiac images such as contrast-enhanced MRI may provide a surrogate for delineating different levels of myocardial injury, yet they are expensive to obtain at a large quantity. In this study, we utilized synthetic data sets = θ θ θ (i) N i=1 to train the VAE. Specifically, we generated a large data set of heterogeneous myocardial injury by random region growing. Starting with one injured node, one out of the five nearest neighbors of the present set of injured nodes was randomly added as an injured node. This was repeated until an injury of desired size was attained. We considered binary tissue types in the training data, in which the value of tissue excitability θ was set to be 0.5 or 0.15 for injured or healthy nodes, respectively, along with a random noise drawn from a uniform distribution [0, 0.001].
The VAE architecture used in the following experiments is shown in Figure 1. Each of the encoder and decoder network consisted of three fully connected layers with softplus activation, two layers of 512 hidden units, and a pair of two-dimensional units for the mean and log-variance of the latent code z. We trained the VAE with the Adam optimizer with an initial learning rate of 0.001 (Kingma and Welling, 2013). Figures 2A,B shows the scattered plots of the twodimensional latent codes z encoded by the VAE on the training data, color-coded by the size and location of the abnormal tissue. It appears that the latent code accounted for the size of the abnormal tissue along the radial direction (A), while clustering by the location of the abnormal tissue as well (B). This shows the ability of the generative model in capturing meaningful semantic information in the HD data in an unsupervised manner.

Synthetic Data Experiments
Synthetic experiments were carried out on three CT derived human heart-torso models. For ground truth of the tissue excitability, we divided the left ventricle (LV) into 17 segments based on the standard recommended by the American Heart Association (AHA). The region of abnormal tissue was then set as various combinations of these 17 LV segments. The value of θ in the abnormal region was set to 0.40, 0.45, or 0.50 to have different severity levels, and its value in the healthy region was set to 0.15.  A random noise drawn from a uniform distribution [0, 0.001] was added. Note that the tissue excitability in this test set is different from those in the training set, as described in section 5.1, in two aspects: 1) parameter values within the abnormal region and 2) shape and size of the abnormal region.
For each tissue excitability to be tested, body-surface measurements were simulated using the models described in section 3. A 20dB noise was then added to the measurement data for posterior estimation of parameter θ. To test the ability of the trained VAE model to be applied to hearts different from that used in training, for experiments on heart ♯1 and ♯2, the VAE was trained on heart ♯3; for experiments on heart ♯3, the VAE was trained on heart ♯1. The convergence criteria for each estimation followed that as defined in section 4.2.2.

Accuracy and Efficiency in Estimating Posterior pdf Function
We first evaluated the accuracy and efficiency of the presented method against 1) directly sampling GP approximation of the posterior pdf based on regular Bayesian active learning and 2) surrogate-accelerated two-stage MCMC sampling as presented in our previous work (Dhamala et al., 2017b), all against the baseline of directly sampling the exact posterior pdf using the standard MCMC. We considered 15 synthetic cases in total. All MCMC sampling were run on two parallel MCMC chains of length 10,000 with a common Gaussian proposal distribution with two different initial points. The variance of the Gaussian proposal distribution was tuned by rapidly sampling the GP surrogate pdf until obtaining an acceptance rate of 0.22, which is documented to enable good mixing and faster convergence in higher dimensional problems (Gilks et al., 1995;Andrieu et al., 2003). After discarding 20% initial burn-in samples and selecting alternate samples to avoid autocorrelation in each chain, the samples from two chains were combined. The convergence of all the MCMC chains was tested using trace plots, Geweke statistics, and Gelman-Rubin statistics (Gilks et al., 1995;Andrieu et al., 2003). The accuracy of estimated pdf in z space was evaluated through comparing the mean, mode, and standard deviation from the kernel density estimation of samples selected from our method and with other existing methods. Let s M be the estimated mean, mode, or standard deviation of the posterior pdf of z using direct MCMC sampling and s o be the corresponding statistics estimated from the three methods presented in Table 1. We used the mean and standard deviation of |s M − s o | calculated from 15 synthetic cases to evaluate the accuracy of all the comparison methods in estimating the mean, mode, and standard deviation of the posterior pdf in comparison to the direct MCMC sampling. The last column of Table 1 also shows the KL divergence between the estimated pdf from different methods with that from exact MCMC, obtained by sampling as described in Hershey and Olsen (2007). As shown, the accuracy of the estimated posterior pdf was significantly higher than that obtained by regular Bayesian active learning (paired t-test on estimated parameters from 15 cases, p < 0.001). While its accuracy was still lower than the surrogateaccelerated two-stage MCMC, it used only 0.6% computation (in terms of the number of model simulations needed) of the two-stage MCMC method. As detailed in Figure 3B, while the two-stage MCMC achieved ∼ 40% reduction in the number of model simulations needed compared to the direct sampling of the exact posterior pdf, the presented method reached a ∼ 99.65% reduction in computation. Figure 3A gives examples of the posterior pdfs estimated from different methods in comparison to that obtained from direct sampling.
As shown, the presented method (green curve) closely reproduced the true posterior pdf (red curve) obtained from direct MCMC, while the function learned by the standard Bayesian active learning (black curve) fell short in as closely reproducing the posterior pdf.

Accuracy and Uncertainty in the Estimated Tissue Excitability
From the estimated posterior pdf of π(z|Y) over the latent LD manifold, we obtained the posterior pdf of π(θ|Y) over the spatial space of the heart. We estimated the mean, mode, and standard deviation in HD space through inserting MCMC samples of z taken from posterior π(z|Y) to the expectation network of the trained VAE decoder.
For accuracy of the estimated tissue excitability, we considered the mean and mode from the estimated posterior pdf of π(θ |Y) and evaluated against the ground truth tissue excitability using three metrics: dice coefficient (DC), root mean square error (RMSE), and correlation coefficient (CC). As shown in Figure 4, for DC, the mean and mode from the presented method were more accurate than those obtained by regular Bayesian active learning (paired t-test, p < 0.001 for mean and p < 0.05 for mode) but less accurate than those obtained from the two-stage MCMC (paired t-test, p < 0.10 for mean and p < 0.001 for mode). For RMSE, similarly, mean and mode both were more accurate from regular active learning method (paired t-test, p < 0.005 for mean and p < 0.05 for mode). In comparison with the two-stage MCMC, there was no difference for mean and mode with the presented method (paired t-test, insignificant at 20% level of significance). For CC, our presented method showed similar accuracy with the two-stage MCMC and regular active learning method for mean estimation. But for CC from mode estimation, our method showed higher accuracy than the regular method (paired t-test, p < 0.01) but less accuracy than the two-stage MCMC (paired t-test, p < 0.05). Figure 5A provides a visual example of the estimated spatially varying tissue property on the heart, corresponding to the LD posterior pdf shown in the left column of Figure 3A. First, as shown in Figure 5B, the estimated mean provided by the presented method corrected a false positive in the solution from regular Bayesian active learning (row one). The high uncertainty in this region from the regular Bayesian active learning was also corrected by the presented method (row three). Second, as noted in the left column of Figure 3A, the underlying LD posterior pdf is uni-modal, where both the presented method and two-stage MCMC correctly recovered the mode in comparison to regular Bayesian active learning. Similarly, the resulting mode in the HD space of the tissue property was correctly located in position FIGURE 6 | Illustrations of training points (blue dots) selected using variance based on the log-normal process (left), entropy based on the log-normal process (middle), and upper confidence bound (UCB) based on the Gaussian process (right).
in the presented method whereas the mode of regular Bayesian active learning shifted in accordance with low dimensional shift. This shows a correct one-to-one mapping of LD to HD generative process. Finally, as noted earlier, while the two-stage MCMC, in general, delivered higher accuracy, this performance gain was achieved with over 167-fold increase in computation.

Exploration vs. Exploitation Using Log-Normal Process Based Acquisition Functions
To understand the advantage of the presented log-normal process-based acquisition functions, we examined where the active selection of training samples took place in the presented method vs. regular Bayesian active learning. Figure 6 left and middle shows the acquisition of training samples using the variance and entropy of the log-normal process, using, respectively, 100 and 108 sampling points to meet the convergence criteria. The contour plot inside these figures showed the shape of the true bivariate posterior pdf. In comparison, Figure 6 right panel shows training samples selected based on the GP using upper confidence bound (UCB). To converge, it took 129 acquisition steps, which were higher than those used in the presented method. Comparing left and middle panel, it showed that the regular acquisition, while exploited the mode of the posterior mode, explored without focusing on the posterior support. In comparison, the presented acquisition functions effectively both exploited and explored within the posterior support.

Experimental Data and Data Processing
In this section, we increased the difficulty of active posterior estimation by: 1) considering hearts with realistic tissue excitability extracted from contrast-enhanced MRI (CE-MRI) and 2) simulation data of 3D cardiac electrical activity generated by a high-fidelity biophysics model blinded to the AP model used in the active posterior estimation. In comparison to synthetic data considered in section 5.2, these image-derived tissue excitability had the following characteristics that increased its heterogeneity: the presence of 1) both dense infarct core and gray zone, 2) a single or multiple infarcts with complex spatial distribution and irregular boundaries, and 3) both transmural and non-transmural infarcts.
We considered six post-infarction human hearts. The patientspecific ventricular models along with the detailed 3D infarct architectures were delineated from MRI images as detailed in Arevalo et al. (2016). The training of VAE was performed on one of the hearts described in section 5.1, using synthetically generated tissue excitability values as described in that section. Figure 7 summarizes the results of estimated tissue excitability on the six post-infarction hearts. Overall, estimated tissue property, especially the estimated mode, was close to the ground truth. One more source of increased difficulty in this set of experiments, in comparison to those in synthetic data, was the presence of non-transmural scar tissue that did not exist in the training data of the VAE. This difficultly in estimating has been previously reported in literature (Dhamala et al., 2017a). As shown in Figure 7 cases 1-3 and 5 (second and third rows), the estimated mean or mode either missed the region of nontransmural abnormal tissue property or incorrectly estimated it to be transmural (case 3-mode). The associated uncertainty was not captured in the estimated standard deviation (Figure 7 fourth row) either. Another source of difficulty is the presence of diffused heterogeneous abnormal tissue that was not considered in the VAE training data. For instance, in case 4 and case 6, there was a large patchy gray zone mixed within the dense scars. These regions were reflected in the region of estimated abnormal tissue excitability; however, the estimated parameter values were not able to distinguish between the gray zone and dense infarct. In addition to identifiability issues associated with the presented method and the available data, this performance may also arise from the fact that the AP model considered has limited ability in differentiating electrical behavior from gray zone and infarct core (Ramírez et al., 2020).

Experiments on in vivo ECG and Voltage Mapping Data
Finally, we performed active posterior estimation for tissue excitability in real data experiments of three patients who went catheter ablation of ventricular tachycardia due to myocardial infarction (Sapp et al., 2012). The patient-specific geometrical models of the heart and torso were constructed from axial CT images detailed in Wang et al. (2016). In vivo measurements of FIGURE 7 | Results of estimated tissue excitability from the presented method in 3D infarcts delineated from in vivo MRI images. Regions with low excitability (high θ θ θ values) correspond to infarct regions (0.5 = infarct core, 0.3-0.5 = gray zone). The red circles highlight non-transmural scars or gray zone. 120-lead ECG were collected during pacing from known sites of each heart. The surrogate used for evaluating the estimated tissue excitability was in vivo bipolar voltage data collected by catheter mapping. As illustrated in Figure 8, based on the voltage data, the myocardium tissue can be divided into three groups: infarct core (red: bipolar voltage < 0.5 mv), infarct border (green: bipolar voltage 0.5-1.5 mv), and healthy (blue: bipolar voltage > 1.5 mv). Among the three patients, we consider 120-lead ECG data collected from a total of six different pacing sites. 1) Case 1: In this case, we were able to estimate the posterior pdf of tissue excitability by combining ECG data from two different pacing locations. As shown in Figure 8A (first row), this subject had a small infarct in the lateral-basal area of LV. The presented method was able to capture the location of this infarct core, although much more smoothed out in comparison to the voltage data as illustrated in Figure 8B, first row). The estimated pdf also exhibited uncertainty higher than the rest of the myocardium in this location. These results were obtained by 129 active acquisitions of simulations with the presented method.
Interestingly, when estimating the posterior pdf using only data from one pacing location, the mode of the estimated pdf was incorrectly shifted from the actual location of the infarct tissue-and the uncertainty at that location correspondingly became higher compared to that associated with estimation using multiple ECG data ( Figure 8C, first row).
2) Case 2: In this case, we were able to estimate the posterior pdf of tissue excitability by combining ECG data from three different pacing locations. As illustrated in Figure 8A (second row), this subject had a highly heterogeneous infarct in the lateral region of the LV. The presented method, using 153 active acquisitions of simulations, was able to recover the correct location of the infarct, with an attempt to recover the heterogeneity in the tissue excitability ( Figure 8B, second row). The mode solution was also shifted from the target region. The heterogeneity, however, was not captured in fine detail, likely due to the lack of such heterogeneous data in the VAE training. The associated uncertainty of the solution was accordingly high. When reducing the measurement data to only ECG data from one pacing site, the estimated solution is almost similar when we used three pacing sites.
3) Case 3: In this case, we only had access to one-paced ECG data for estimating the posterior pdf of tissue excitability. As illustrated in Figure 8A (third row), this case had a relatively dense scar in inferolateral LV with only one set of measurement data. The presented method was able to locate the infarct using 147 active acquisitions of simulations, with an uncertainty lower than that of the previous two cases (Figure 8C, third row).

LIMITATIONS AND FUTURE WORKS
In this study, we demonstrated the feasibility of Bayesian active learning for fast approximation of posterior pdf involving heavy simulations. Our key innovation was to modify the acquisition functions in regular Bayesian active learning, such as to focus more on approximating the shape of the posterior pdf of interest rather than finding the mode of the pdf when using regular acquisition functions. Following this idea, in this study, we demonstrated the feasibility of guiding acquisition with the variance or entropy of the log-normal process being learned. Future work will continue to explore this idea in other acquisition functions, with a goal to modulate the trade-off between exploitation and exploration over the space of z based on the prior knowledge of its distribution. One possible example is to consider the improvements in the KL divergence between the actual and approximated posterior pdf.
While the parameter θ θ θ was represented in Euclidean space in this study, organ tissue excitability is actually defined over a physical domain in the form of a 3D geometrical mesh. By representing this non-Euclidean data in a Euclidean space, we have ignored the 3D spatial structure of the physical mesh.
A future step would be to construct the generative model in non-Euclidean space by considering the geometrical mesh as a graph (Dhamala et al., 2019). We fixed other parameters values in the electrophysiological model in Equation (1) to estimate θ θ θ , while a better strategy could be varying all the parameters through respective distributions (Niederer et al., 2020). As a feasibility study, we considered a scalar parameter per cardiac mesh node; this simplifies the problem, although the parameter space was still HD since the parameter values change across space. Future studies should consider diffusion tensor D, which requires considering fiber directions that are largely approximated and associated with errors. The lack of real data of organ tissue excitability is the main challenge for training the generative model. A natural next step is to investigate the possibility of using accessible tissue excitability data derived from in vivo and ex vivo optical mapping (Gizzi et al., 2013;Kappadan et al., 2020;Uzelac et al., 2021). In this study, the VAE was trained by synthetic data only, that is simplified in shape, transmurality, and heterogeneity. It thus may have a limited ability to generalize to realistic conditions where tissue abnormality is more complex in these aspects. An important direction of future work is to investigate means to improve the training data for the generative model.
While the VAE provides a probabilistic generative model p β (θ θ θ |z), we only adopted the expectation network of this probabilistic model, E[p β (θ θ θ |z)], as the generative model to achieve the HD-to-LD embedding of the optimization objective. An immediate next step is to investigate the incorporation of the uncertainty in the generative model into both the active learning of π(z|Y) and the estimated pdf π(θ|Y).
Finally, this study focuses on the specific component of tissue excitability estimation within the much bigger pipeline of personalized cardiac modeling. We thus focused on validating the estimated tissue excitability using synthetic and in vivo imaging and mapping data. A next step will be to evaluate the personalized model in predictive tasks, such as predicting the risk (Arevalo et al., 2016) or the optimal treatment target (Trayanova et al., 2018) for lethal ventricular arrhythmia, and investigate how the uncertainty propagates to simulation outputs and may impact clinical decisions.

CONCLUSIONS
In this study, we present a novel framework for fast approximation of the posterior pdf of HD simulation parameters through intelligently selecting training points. This is achieved by casting posterior inference into the setting of Bayesian active learning, integrated with 1) generative modeling to allow active search over HD parameter space and 2) novel acquisition functions to focus on the shape rather than modes of the posterior pdf. Future work will investigate the design of additional acquisition functions, the incorporation of the uncertainty in the generative model, and the extension of the presented methodology to probabilistic estimation in other complex simulation models.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
MZ as corresponding author did experiments, designed, and wrote the whole article. JD helped initializing the idea of this article. PB helped doing experiments. JS, BH, KW, and NT helped providing experimental data. LW gave direction in all sections (idea, experiments, and writing) to finalize this article. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH) under Award No: R15HL140500 and R01HL145590.