Uncertainty Quantification Reveals the Importance of Data Variability and Experimental Design Considerations for in Silico Proarrhythmia Risk Assessment

The Comprehensive in vitro Proarrhythmia Assay (CiPA) is a global initiative intended to improve drug proarrhythmia risk assessment using a new paradigm of mechanistic assays. Under the CiPA paradigm, the relative risk of drug-induced Torsade de Pointes (TdP) is assessed using an in silico model of the human ventricular action potential (AP) that integrates in vitro pharmacology data from multiple ion channels. Thus, modeling predictions of cardiac risk liability will depend critically on the variability in pharmacology data, and uncertainty quantification (UQ) must comprise an essential component of the in silico assay. This study explores UQ methods that may be incorporated into the CiPA framework. Recently, we proposed a promising in silico TdP risk metric (qNet), which is derived from AP simulations and allows separation of a set of CiPA training compounds into Low, Intermediate, and High TdP risk categories. The purpose of this study was to use UQ to evaluate the robustness of TdP risk separation by qNet. Uncertainty in the model parameters used to describe drug binding and ionic current block was estimated using the non-parametric bootstrap method and a Bayesian inference approach. Uncertainty was then propagated through AP simulations to quantify uncertainty in qNet for each drug. UQ revealed lower uncertainty and more accurate TdP risk stratification by qNet when simulations were run at concentrations below 5× the maximum therapeutic exposure (Cmax). However, when drug effects were extrapolated above 10× Cmax, UQ showed that qNet could no longer clearly separate drugs by TdP risk. This was because for most of the pharmacology data, the amount of current block measured was <60%, preventing reliable estimation of IC50-values. The results of this study demonstrate that the accuracy of TdP risk prediction depends both on the intrinsic variability in ion channel pharmacology data as well as on experimental design considerations that preclude an accurate determination of drug IC50-values in vitro. Thus, we demonstrate that UQ provides valuable information about in silico modeling predictions that can inform future proarrhythmic risk evaluation of drugs under the CiPA paradigm.

The Comprehensive in vitro Proarrhythmia Assay (CiPA) is a global initiative intended to improve drug proarrhythmia risk assessment using a new paradigm of mechanistic assays. Under the CiPA paradigm, the relative risk of drug-induced Torsade de Pointes (TdP) is assessed using an in silico model of the human ventricular action potential (AP) that integrates in vitro pharmacology data from multiple ion channels. Thus, modeling predictions of cardiac risk liability will depend critically on the variability in pharmacology data, and uncertainty quantification (UQ) must comprise an essential component of the in silico assay. This study explores UQ methods that may be incorporated into the CiPA framework. Recently, we proposed a promising in silico TdP risk metric (qNet), which is derived from AP simulations and allows separation of a set of CiPA training compounds into Low, Intermediate, and High TdP risk categories. The purpose of this study was to use UQ to evaluate the robustness of TdP risk separation by qNet. Uncertainty in the model parameters used to describe drug binding and ionic current block was estimated using the non-parametric bootstrap method and a Bayesian inference approach. Uncertainty was then propagated through AP simulations to quantify uncertainty in qNet for each drug. UQ revealed lower uncertainty and more accurate TdP risk stratification by qNet when simulations were run at concentrations below 5× the maximum therapeutic exposure (C max ). However, when drug effects were extrapolated above 10× C max , UQ showed that qNet could no longer clearly separate drugs by TdP risk. This was because for most of the pharmacology data, the amount of current block measured was <60%, preventing reliable estimation of IC 50 -values. The results of this study demonstrate that the accuracy of TdP risk prediction depends both on the intrinsic variability in ion channel pharmacology data as well as on experimental design considerations that preclude an accurate determination of drug IC 50 -values in vitro. Thus, we demonstrate that UQ provides valuable information about in silico modeling predictions that can inform future proarrhythmic risk evaluation of drugs under the CiPA paradigm.

INTRODUCTION
Drugs that block cardiac ion channels encoded by the humanether-à-go-go Related Gene (hERG) and consequently prolong the QT interval are associated with increased risk of Torsade de Pointes (TdP), a potentially lethal arrhythmia that caused several drugs to be withdrawn from market . In 2005, the International Council on Harmonisation (ICH) S7B and E14 guidelines were established to address the issue of TdP liability for new drugs. As stated in these guidelines, their intent was to be used as a screening method to identify drugs that would require more intensive electrocardiographic monitoring of patients in late phase (e.g., phase 3) clinical trials. However, hERG block or QT prolongation does not necessarily correlate with TdP risk, and as a result of these guidelines, many novel compounds are screened out of development because of detected hERG block or QT prolongation without further evaluation of actual TdP risk. Additional insight into TdP risk for hERG-blocking and QT-prolonging drugs can be determined by also assessing whether drugs block inward currents such as, L-type calcium or late sodium (Duff et al., 1987;January and Riddle, 1989;Chézalviel-Guilbert et al., 1995;Guo et al., 2007). The Comprehensive in vitro Proarrhythmia Assay (CiPA) is a global initiative to revise the current guidelines with a new set of mechanistic assays that improve the specificity of the proarrhythmia screening process .
The CiPA in silico assay will test new compounds for the potential to cause TdP by incorporating in vitro pharmacology data on multiple ion channels into a mathematical model of the cardiac action potential (AP). The AP model will be used to predict drug effects related to early afterdepolarizations (EADs), which are a known cellular trigger of TdP (Yan et al., 2001). Numerous studies have shown that when outward repolarizing currents such as, I Kr (the current carried by hERG-encoded channels) are blocked in cardiac cells, the resulting imbalance of inward and outward currents prolongs the AP and can, at extreme levels, lead to inward current reactivation and EADs (January and Moscucci, 1992). However, EADs may not occur if a drug also significantly blocks inward currents, leading to a balanced block scenario where the AP is prolonged but inward currents cannot reactivate (Antzelevitch et al., 2004). Because it is difficult to know how much inward vs. outward current block is safe, or how dynamic effects might impact EAD propensity, the purpose of the CiPA in silico model will be to assess the integrated effects of multiple ion channel block on TdP risk. As with any model built on inherently variable experimental data, however, confidence in model predictions will depend on the level of uncertainty in model inputs (here, the drug-specific parameters) and the corresponding uncertainty in model outputs (Pathmanathan et al., 2015;Johnstone et al., 2016b). In order for CiPA to provide useful guidance to the drug development and regulatory process, it will be necessary to incorporate uncertainty quantification (UQ) into modeling predictions (Pathmanathan and Gray, 2013;Mirams et al., 2016).
The CiPA in silico ventricular AP model and a mechanismbased metric for TdP risk stratification have been trained on a designated set of 12 CiPA compounds with known TdP risk levels (High, Intermediate, or Low, see Table 1). These compounds were selected and categorized by a team of expert clinicians, safety pharmacologists, and electrophysiologists based on adverse event data and published reports . The current CiPA AP model was developed through a series of modifications to the O'Hara-Rudy (ORd) human ventricular AP model (O'Hara et al., 2011). Li et al. (2016) first developed a Markov model of the hERG channel that included temperature-sensitive gating, which was subsequently modified to recapitulate I Kr from the original ORd model, with an added pharamacological component (Li et al., 2017). The hERG/I Kr model was then incorporated into the ORd AP model to produce the I Kr -dynamic ORd model. In the CiPAORdv1.0 model, we further optimized the I Kr -dynamic ORd model by scaling ionic current conductances to better reflect changes in AP duration observed in human ventricular myocytes when ionic currents were blocked (referred to as the optimized I Kr -dynamic ORd model in Dutta et al., 2017). With this model, we derived a new in silico biomarker for TdP risk, the qNet metric, which correlated well with in silico cell "distance" to EADs and thus provided a continuous marker for EAD susceptibility. Although we showed that the qNet metric could correctly stratify the 12 CiPA training drugs by known TdP risk, uncertainty in these modeling predictions was not evaluated.
In this study, methods for applying UQ to the CiPA in silico assay are presented. For the 12 CiPA training compounds, we examine the uncertainty in drug-specific kinetics parameters for drug binding and trapping in the I Kr -dynamic model. In addition, we examine uncertainty in dose-response curve IC 50 and Hill coefficients for the remaining six CiPA-selected ionic currents, as this can also be considerable (Elkins et al., 2013). We thereby characterize uncertainty in drug effects on ion channels due to variation in experiments, whatever the cause of this variation may be. We then sample from these probability distributions for the drug effects and run forward simulations to examine the subsequent uncertainty in qNet and TdP risk stratification.

Human Ventricular Action Potential Model
The CiPAORdv1.0 model (the optimized model from Dutta et al., 2017) was used for all simulations in this study, in order to evaluate TdP risk for the set of 12 CiPA training compounds listed in Table 1. Parameter values for the model are listed in Tables S1, S2.

Multiple Ion Channel Pharmacology
Pharmacological effects of the 12 CiPA training compounds on ionic currents were modeled as in Li et al. (2017) and Dutta et al. (2017). The kinetics of hERG block were modeled with the I Kr Markov model from Li et al. (2017), which was fit to voltage clamp data obtained at the U.S. Food and Drug Administration (FDA; parameters listed in Table 2). For six other ionic currents (L-type calcium, I CaL ; late sodium, I NaL ; fast sodium, I Na ; transient outward, I to ; slowly activating delayed rectifier, I Ks ; and inward rectifier, I K1 ), drug effects were represented by a simple pore blocking model in which maximal current conductances were reduced according to the Hill equation. Hill equation parameters (Table 3) were fit to data from Crumb et al. (2016). Some of the data have been updated since publication and are available online (see section Software and Data).

Numerical Methods and Data Analysis
Model equations were written in C and compiled for use with version 3.3 of the R programming language (R Core Team, 2016) and version 1.14 of the deSolve package . Equations were integrated using the lsoda solver with relative and absolute error tolerances of 10 −6 and other solver settings as default. For computationally intensive bootstrap simulations (see section Drug-hERG Binding Kinetics), a relative tolerance of 10 −3 was used. Data analysis was performed in R, and figures were produced with version 2.2.0 of the ggplot2 package (Wickham, 2009).

Simulation Protocol for TdP Risk Evaluation
The CiPAORdv1.0 model was used to simulate APs at a cycle length (CL) of 2 s (stimulus amplitude = −80 µA/µF, duration = 0.5 ms). The model was initialized from control (no drug) steady-state values (Table S3) and paced for 1,000 beats. Drugs were simulated at multiples of their maximum therapeutic concentrations (C max , Table S4), ranging from 1 to 10× C max (1× increments) and from 15 to 25× C max (5× increments). At each concentration, TdP risk was evaluated using the metric qNet, defined as the net charge carried by six major currents (I Kr , I CaL , I NaL , I to , I Ks , and I K1 ) over an entire beat . The qNet metric was computed by integrating the sum of the six currents from the start of the stimulus (t = 0 s) until the end of the beat (t = 2 s) using lsoda (see section Numerical Methods and Data Analysis). Analysis was performed only on the last 250 beats of the pacing protocol to allow drug effects to reach quasi-steady state for simulations with beat-to-beat instability. Beats in which transmembrane potential (V m ) failed to depolarize above 0 mV were excluded from analysis, and simulations in which every beat failed to depolarize were excluded from TdP risk evaluation. The maximum slope during repolarization (dV/dt repol ) was defined as the maximum change in V m (dV/dt) between 30 and 90% repolarization for beats that fully repolarized; as the maximum dV/dt between 30% repolarization and the end of the beat (t = 2 s) when V m repolarized by 30% but not 90%; or as the maximum dV/dt between the AP peak and the end of the beat when V m failed to repolarize by 30%. An EAD was defined to have occurred on any beat in which dV/dt repol was greater than zero. Out of the last 250 beats, the beat with the steepest reactivation of the membrane potential (maximum dV/dt repol ) was used to calculate qNet, whether or not an EAD had occurred.

Drug-hERG Binding Kinetics
In Li et al. (2017), time series measurements of the fractional hERG current in the presence of drug were obtained using a modified Milnes voltage clamp protocol (Milnes et al., 2010;Li et al., 2017). Because of the long duration of the protocol, each cell could only be tested at a single drug concentration, and the drug-hERG binding and trapping parameters (see Table 2) were fit to the fractional current traces measured during a voltage step to 0 mV, averaged across cells by concentration. Specifically, each dataset y consisted of a set of fractional current time series observations x c,i (t) (c = 1, 2, . . . , m, where m is number of the concentrations tested; i = 1, 2, . . . , n c , where n c is the number of cells tested at the cth concentration; and x c,i (t j ) were independent between concentrations). The mean drug response at the cth concentration wasx c (t) = 1 n c n c i=1 x c,i (t) (i.e., the average of fractional current traces across cells), and the overall mean responseȳ = x 1 (t) ,x 2 (t) , . . . ,x m (t) (i.e., the set of average fractional current traces at each concentration) was used to fit the optimal drug-hERG kinetics parameters (θ (ȳ)). Parameters were fitted using the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) (Hansen, 2006), with version 1.0-11 of the cmaes package (Trautmann et al., 2011). Details on the CMA-ES implementation can be found in the Supplemental Methods. Bounds for the dynamic drug-hERG binding parameters used to fit bootstrap samples can be found in Table S5. The non-parametric bootstrap method was used to characterize uncertainty in the fitted parameters (Efron and Tibshirani, 1986). Observations x * c,i (t) were randomly drawn with replacement from x c,i (t) to obtain a bootstrap sample y * b of the same size as the original dataset, with an identical number of observations per concentration. A total of 2,000 bootstrap samples (b = 1, 2, . . . , 2000) were generated using version 1.3-18 of the boot package (Davison and Hinkley, 1997;Canty and Ripley, 2016). The mean responseȳ * b for each bootstrap sample was then computed in the same manner asȳ and used to refit the drug-hERG kinetics parameters (θ(ȳ * b )), yielding a joint sampling distribution of drug-hERG parameters.

Dose-Response Curves
For other ionic currents, uncertainty in dose-response curves was characterized using a Bayesian inference approach. Version 1.3.5 of the FME package was used to fit Hill equation parameters and to characterize uncertainty, using Markov-chain Monte Carlo (MCMC) simulation with the delayed rejection and adaptive Metropolis algorithm . The percentage of ionic current block was assumed to be a normal random variable located at the Hill equation response curve with unknown variance σ 2 . Log-transformed IC 50 -values [pIC 50 = −log 10 (IC 50 /c 0 ), where c 0 = 10 9 nM] were bounded to the range [−1, 19] for fitting and MCMC simulation (bounding IC 50 -values between 10 −10 and 10 10 nM). Hill coefficients (h) were bounded to the range [0, 10]. Optimal IC 50 and Hill coefficient (h)-values were fit using non-linear least squares (see Table 3). The joint probability distribution of IC 50 and h was estimated using MCMC simulation. A uniform prior distribution was used for pIC 50 and h. The error variance σ 2 was considered a nuisance parameter and was sampled as conjugate priors from an inverse gamma distribution during MCMC simulation. The proposal distribution was multivariate normal. A total of 2,000 MCMC samples (pIC 50 , h) were saved for each drugcurrent combination to form a joint sampling distribution of Hill equation parameters (see Supplemental Methods for implementation details).

Uncertainty Propagation
Samples from the joint distribution of drug-hERG parameters and the joint distributions of Hill equation parameters for a  Li et al. (2017), so the amount of block was assumed to always be 0%.
Frontiers in Physiology | www.frontiersin.org particular drug were assumed to be independent and were combined in AP simulations to assess the uncertainty in qNet (see section Simulation Protocol for TdP Risk Evaluation). One sample from each distribution was selected in sequential order (e.g., the first sample from each distribution) to form a set of parameters that defined a single sample from the drug-effect probability distribution. This was repeated until all parameter samples were exhausted, generating a sampling distribution of 2,000 drug-effect samples per drug (referred to as uncertainty inputs), which. Each input was simulated with the CiPAORdv1.0 model to assess variability in AP model outputs (qNet, see section Simulation Protocol for TdP Risk Evaluation). Variability in qNet was quantified with 95% CIs. Sampling distributions of qNet were visualized with violin plots.

Cross Validation
Leave-one-out cross validation (LOOCV) (Hastie et al., 2009) was used to assess the accuracy of TdP risk stratification at each simulated concentration relative to C max . The CiPA Low, Intermediate, and High TdP risk levels ( Table 1) were given numerical category values of 0, 1, and 2, respectively. At each concentration (1−25× C max ), a classifier was trained on all samples from the qNet distributions of all but one of the training drugs. The classifier was based on proportional odds logistic regression using the lrm function from version 4.5-0 of the rms package (Harrell, 2016). The numerical tolerance was set to 10 −10 and the maximum number of iterations was set to 10 6 for fitting. Each sample of the remaining, "left out" drug was then assigned to the category with the highest probability based on logistic regression results. The predicted probability of each category [P(x), where x is 0, 1, or 2] for the "left out" drug was computed as the fraction of samples assigned to that category, and the prediction error for that drug was computed as the mean absolute difference between the assigned and actual TdP category over all samples. This procedure was repeated for all 12 training drugs, and the mean and standard deviation of prediction errors at each concentration were computed to evaluate overall TdP risk stratification performance.

Software and Data
The software and data used in this study are available at https:// github.com/FDA/CiPA.

Uncertainty in Drug-hERG Binding Kinetics
Bootstrapping was performed with voltage clamp data from Li et al. (2017) in order to estimate the joint probability distribution of fitted drug-hERG dynamic binding parameters. The 95% CIs of hERG binding parameters for the 12 CiPA training drugs (Table 1) are listed in Table 2. Parameter fitting results for bepridil are illustrated in Figure 1A. The rate of bepridil unbinding (K u ) had a relatively narrow 95% CI [10 −3.8713 , 10 −3.671 ms −1 ], indicating that this parameter was well-constrained by the experimental data and uncertainty in its value was low. In contrast, the pairwise scatter plot of log 10 (K max ) and log 10 (EC 50 n ) revealed a strong correlation between the two parameters, and their fitted ranges spanned several orders of magnitude. The pairwise scatter plots for other training drugs displayed similar correlations between log 10 (K max ) and log 10 (EC 50 n ) (panel A in Figures S1-S11). The large uncertainty in K max and EC 50 n did not produce a similar degree of variability in the kinetics of hERG block, however. In Figure 1B and panel B of Figures S1-S11, shaded areas indicate the 95% CI of the block predicted by parameters in Figure 1A and panel A of Figures S1-S11. The variability in hERG block was much more limited than the variability in K max or EC 50 n , which was not surprising because Li et al. (2017) showed that for most of the 12 training drugs, there was a near-linear relationship between drug concentration and binding rate, occurring when the fitted EC 50 -value was much greater than the maximum drug concentration tested. For example, the optimal EC 50 -value of bepridil was 10 8.7 nM, and the bootstrapestimated 95% CI was [10 7.0 , 10 9.7 ], but the maximum bepridil concentration tested was 300 nM, or roughly 10 2.5 nM. In such cases, the E max equation defining the sigmoidal dose-response relationship of drug binding [E max = K max * (D n /(D n +EC 50 n ))] was linearly approximated by E max ≈(K max /EC 50 n ) * D n , and the ratio K max /EC 50 n effectively becomes a single identifiable parameter. Thus, the 95% CIs for log 10 (K max /EC 50 n ) were much narrower than the 95% CIs for log 10 (K max ) and log 10 (EC 50 n ) ( Table 2). The E max equation was chosen to model drug binding because of its flexibility in accommodating both linear and sigmoidal dose-response relationships. As a result, for those compounds whose drug binding mode is actually linear, the ratio but not the individual values of the two correlated parameters were identifiable (Li et al., 2017).
In addition, multimodality (the presence of multiple peaks in the sampling distribution) was frequently observed in other hERG kinetics parameters (Figures S1-S11), in particular with V halftrap . In the hERG binding model, V halftrap is a drug trapping parameter that determines the steady-state fraction of openbound (untrapped) to close-bound (trapped) channels. Li et al. (2017) demonstrated that the High-and Low-risk CiPA training drugs could be separated by this single parameter (V halftrap > −65 mV for High-risk drugs, while V halftrap < −85 mV for Lowrisk drugs). The multimodality identified in V halftrap sampling distributions raised the question of whether this trend still holds under uncertainty analysis. As shown in Figure 2, the 95% CIs of V halftrap were quite wide for most drugs, but much of this variability covered ranges where the ratio of open-to close-bound channels (O bound /C bound ) at −80 mV was relatively flat, near 1 for Low-risk drugs (green bars) or near 0 for High-risk drugs (red bars). In the steepest region of the O bound /C bound curve, V halftrap distributions of High-vs. Low-risk drugs were wellseparated (upper credible bounds < −77 mV for Low-risk drugs, lower credible bounds > −75 mV for High-risk drugs). Thus, UQ identified consistently low or high levels of trapping for Low-vs. High-risk drugs, respectively, providing increased confidence in the V halftrap trend identified by Li et al. (2017). Note that with or without UQ, the V halftrap -values of Intermediate-risk drugs (blue bars and points) other than chlorpromazine were generally indistinguishable from Low-risk drugs, and chlorpromazine was indistinguishable from High-risk drugs, indicating that the  degree of drug trapping is not sufficient to stratify compounds into the three CiPA risk levels.

Uncertainty in Dose-Response Curves
Bayesian inference was used to estimate the joint probability distribution of Hill equation parameters characterizing steadystate I Na , I CaL , I NaL , I to , I Ks , and I K1 block by each of the 12 CiPA training drugs. MCMC simulation was not performed for drug-current combinations that did not have defined IC 50values in Li et al. (2017), which were assumed to have 0% block. Parameter fitting results are summarized in Table 3. Some MCMC simulations produced joint sampling distributions with a single well-defined peak, such as, that of ranolazine-I NaL ( Figure 3A). The mean parameter values of this distribution (pIC 50 = 5.0958, h = 0.9594) were close to the optimal fitted values (pIC 50 = 5.1033, h = 0.945), and the 95% CIs [pIC 50 (4.9859, 5.2079), h (0.7247, 1.256)] were relatively narrow, indicating that uncertainty in these parameters was low. Consequently, the variability in dose-response curves defined by these parameters was also low. At any given concentration, uncertainty in ranolazine-I NaL block (i.e., the width of its 95% CI) was <16% (Figure 3B, shaded area), reflecting the variability observed in experiments (circles). Note that uncertainty in ranolazine-I NaL block did not increase at concentrations beyond the highest tested (23 µM) because the well-constrained doseresponse curve allowed for extrapolation beyond experimentally tested concentrations.
For other MCMC simulations such as, dofetilide-I NaL , an inverse relationship of possible IC 50 -and h-values was observed, without a defined peak ( Figure 3C). Furthermore, many MCMC samples reached near the bounds imposed on IC 50 and h during fitting [95% CIs for pIC 50 (−0.754, 7.8227] and h [0.1543, 9.49)]. This was symptomatic of having insufficient experimental data to constrain IC 50 -values, as the maximum measured I NaL block was 12.1% at 3× C max , the highest concentration tested (Figure 3D, circles). Although an optimal fit could be defined using least squares (solid line), confidence in the fitted parameters was low, and uncertainty in predicted block increased abruptly above 3× C max . At 10× and 25× C max , the 95% CIs of predicted block were [0, 82.8%] and [0, 99.8%], respectively, reaching close to the maximum possible range (shaded area). Thus, under circumstances where insufficient current block was achieved in experiments, uncertainty in the dose-response relationship became very high when extrapolating beyond the tested concentrations. Similar findings were obtained with other drug-current combinations ( Table 3 and Figures S12-S62).
The amount of uncertainty in predicted block (measured as the width of the 95% CI) was examined as a function of the mean block achieved at the highest tested concentration (C high ). Table 4 lists the mean block measured in experiments at 1× C high for the 12 CiPA training drugs (some drugs had a different C high for different channels). The resulting uncertainty in the amount of drug block at concentrations above C high is depicted in Figure 4. At 1× C high , uncertainty was <25% for all drug-current combinations, indicating that variability in the experimental observations was low. When uncertainty was quantified at extrapolated concentrations (2×, 3×, and 10× C high ), differences were observed between experiments with low and high amounts of block at 1× C high . When <30% mean block was measured at 1× C high , uncertainty was >25% for most dose-response curves and reached close to 100% in several cases. But when >60% mean block was measured at 1× C high , uncertainty at the extrapolated concentrations was <16%. Thus, UQ results for this dataset suggest that >60% block should be achieved experimentally if dose-response curves are to predict drug effects beyond the tested concentrations. Although >60% block was achieved in hERG experiments with the 12 CiPA training drugs, none of the training drugs were tested at concentrations producing >60% block for all six non-hERG ionic currents (which would be unlikely other than for quinidine, given the selectivity of these compounds). This analysis therefore suggested that drug effects could only be reliably predicted at the highest experimentally tested concentration for which data on all six non-hERG ionic currents were available ( Table 4).

Propagation of Uncertainty to AP Simulations
Uncertainty in drug-hERG kinetics and dose-response curves was propagated to AP simulations to explore its impact on TdP risk stratification for the 12 CiPA training drugs. For each drug, the optimal drug-hERG parameters and Hill equation parameters (referred to as fixed inputs) were used to simulate APs, as in previous studies Li et al., 2017). In addition, a total of 2,000 drug-effect uncertainty samples per drug (referred to as uncertainty inputs) were simulated in order to estimate the distribution of drug effects derived from uncertainty characterization (see section Uncertainty in Drug-hERG Binding Kinetics-Uncertainty in Dose-Response Curves). Individual beats were classified as having normal APs, EADs, or depolarization failure (Figure 5A), and each simulation was classified as having EADs, complete depolarization failure, or normal otherwise (see section Simulation Protocol for TdP Risk Evaluation). As drug concentration increased from 1 to 25× C max in uncertainty-input simulations, repolarization and depolarization abnormalities became more frequent for some training drugs. EADs occurred in quinidine, dofetilide, and ranolazine simulations (Figure 5B), and depolarization failure occurred in quinidine, dofetilide, ranolazine, and verapamil simulations ( Figure 5C). However, the frequency of these events was generally low except in quinidine simulations, which had EADs in >90% of simulations at 3-10× C max and depolarization failure in >50% of simulations at ≥20× C max . While EADs are mechanistically linked to TdP, depolarization failure constitutes a different type of rhythm disturbance; therefore, simulations with depolarization failure were removed from further analysis. The remaining simulations represented the conditional distribution of drug effects, given that depolarization failure did not occur at a particular concentration.

Impact of Uncertainty on TdP Risk Stratification
Although EADs are a mechanistic marker for TdP risk, stratification based on EADs was not possible because they occurred very rarely in simulations, and not at all for many High Risk compounds at free C max . Instead, Dutta et al. (2017) proposed to use the in silico metric qNet (the net charge carried by major AP currents during one paced beat at steady state) as

Concentrations are also expressed as multiples of the maximum therapeutic concentration (× Cmax). Because some ionic current experiments used different test concentrations, verapamil and ranolazine both have two entries in the table.
FIGURE 4 | Uncertainty in dose-response curves at extrapolated drug concentrations. Current block experiments were performed for six ionic currents (see legend) with the 12 CiPA training drugs (72 drug-current combinations total with 19 excluded, see Table 3). Dose-response curves were fitted for each experiment and extrapolated above the highest experimentally tested drug concentration (C high ). Uncertainty in dose-response curves was quantified at 1×, 2×, 3×, and 10× C high as the width of the 95% CI for the predicted percentage block, plotted as a function of the mean experimentally observed block at 1× C high . Vertical dotted line is drawn at 60% observed mean block, denoting an approximate lower limit on the mean block that was observed at 1× C high in experiments for which uncertainty remained low (<16%) at higher concentrations.
an indicator of how far a cell is at a particular drug concentration from producing an EAD. The qNet metric was used in the present study as a marker of TdP risk because it successfully stratified the 12 CiPA training drugs at a range of concentrations in the previous study by Dutta et al. (2017). The calculation of qNet was updated to include simulations in which EADs occurred (see section Simulation Protocol for TdP Risk Evaluation) so that the sampling distributions of qNet would accurately reflect the uncertainty in drug parameters (excluding those that produced depolarization failure). As expected, the values of qNet obtained with uncertainty-input simulations trended according to TdP risk (Figures 6A,B). At a given concentration, median qNetvalues decreased between the Low, Intermediate, and High TdPrisk drugs, indicating that outward currents were diminished and inward currents became increasingly dominant at higher risk levels. Note also that extreme negative values of qNet occurred when EADs were present (Figure 6B), reflecting the higher TdP risk evident in these simulations. Variability in qNet increased as uncertainty in drug effects increased. At 1× C max , the distribution of qNet-values for each drug was relatively narrow, and as a result, only a small amount of overlap was observed between adjacent TdP risk levels ( Figure 6A). At 10× C max , however, the distribution of qNet-values for dofetilide (a High-risk drug) contained several outliers, which encompassed the values for all other drugs except the most negative quinidine values ( Figure 6B). These outliers resulted from the high degree of uncertainty in dose-response curves for dofetilide above the highest concentration tested (3× C max ), particularly with inward currents. As discussed in section Uncertainty in Dose-Response Curves, uncertainty in I NaL block by dofetilide increased dramatically above 3× C max (Figure 3D, shaded area). A similar pattern occurred for I CaL block by dofetilide ( Figure S12), with high uncertainty in predicted block at 10× C max [95% CI (0%, 97.6%)]. Because qNet reflects the FIGURE 6 | Uncertainty in qNet for the 12 CiPA training drugs. Violin plots are shown for qNet distributions at 1× (A) and 10× (B) C max , based on uncertainty-input simulations. Dotted line indicates the control (no drug) value of qNet. (C) qNet at 1−10× C max (1× increments) and 15−25× C max (5× increments). Shaded areas indicate the 95% CIs of qNet obtained from uncertainty-input simulations. Points indicate the highest simulated concentration for which complete experimental data on six non-hERG currents were available. Fixed-input results are shown below (solid lines) or above (dotted lines) this concentration. Likewise, uncertainty-input results are indicated below (dark shaded areas) or above (light shaded areas) this concentration. Simulations with depolarization failure ( Figure 5B) were excluded from the results. For all panels, High TdP-risk drugs are in red, Intermediate-risk drugs are in blue, and Low-risk drugs are in green.
balance of inward currents (I NaL and I CaL ) and outward currents (mainly I Kr ), the effects of I Kr block by dofetilide were offset in simulations with significant block of I NaL or I CaL , resulting in the "safe" outliers for dofetilide at 10× C max with very high qNetvalues. On the other hand, simulations with very little I NaL or I CaL block led to "dangerous" outliers with very low or negative qNet-values.
Poor separation of qNet between TdP risk levels was apparent at higher drug concentrations, due primarily to the increased uncertainty in drug effects. Dutta et al. (2017) showed that with fixed model simulations, perfect separation in qNet occurred for the 12 CiPA training drugs at 1-25× C max . However, our analysis of dose-response uncertainty in section Uncertainty in Dose-Response Curves suggests that qNet may be highly variable above experimentally tested concentrations. In Figure 6C, fixed-input simulation results are shown for concentrations up to (solid lines) and including (point) the maximum simulated concentrations for which complete drug block data on all six non-hERG ionic currents was available; above these concentrations, fixed-input results are plotted as dotted lines. At 1× C max , data were available for all 12 CiPA training drugs. Above 1× C max , however, some data were unavailable for quinidine (>1.7× C max ), mexiletine (>2.4× C max ), dofetilide (>3× C max ), verapamil (>6.2× C max ), and ranolazine (>11.8× C max ; see Table 4). Nevertheless, near 1× C max , the 95% CIs of qNet remained largely separated between TdP risk levels, indicating that uncertainty at these concentrations was low enough to stratify the training drugs (shaded areas). At >4× C max , however, overlap between different risk levels increased due to the higher variability in qNet sampling distributions, particularly for verapamil and dofetilide. However, increased uncertainty in qNet was not the sole factor affecting TdP risk separation. The qNet-values for verapamil and ranolazine (Low-risk drugs) also drifted closer to those of chlorpromazine (Intermediate-risk) at >4× C max , further increasing the overlap between these risk levels, though qNetvalues for fixed-input results remained separate.
The accuracy of TdP risk stratification as a function of concentration was assessed using LOOCV. At each concentration relative to C max , a classifier was trained on qNet uncertainty samples for 11 of the 12 training drugs and then used to predict the probabilities of each TdP risk level for the remaining drug (see section Cross Validation). At 1× C max , the maximum probability always occurred at the correct TdP risk level, but several drugs had non-zero probabilities for the incorrect TdP risk level (Table 5). In contrast, when LOOCV was performed at 1× C max in Dutta et al. (2017), two drugs (terfenadine and chlorpromazine) were misclassified on the basis of fixedinput results (equivalent to a predicted 100% probability of the drug being in the wrong category). As a result, although LOOCV prediction errors were non-zero for more drugs when uncertainty was considered, the overall mean prediction error was lower as compared to fixed-input results (0.09 vs. 0.17). At 10× C max , however, mean prediction error was higher when the classifier was trained on uncertainty-input results rather than fixed-input results (0.23 vs. 0.08) because of increased prediction errors for dofetilide, sotalol, cisapride, and verapamil. This was due to the low level of block achieved experimentally for many non-hERG currents, which led to high uncertainty in qNet when drug effects were extrapolated above the tested concentrations. Thus, uncertainty analysis produced more robust TdP risk predictions near concentrations with experimental data for all currents but less robust predictions at concentrations for which extrapolation of drug effects was unreliable due to insufficient levels of block (<60%) measured experimentally.
LOOCV results for the 12 training drugs at 1-25× C max are summarized in Figure 7A. As concentration increased, prediction errors improved for some drugs and worsened for others. Terfenadine's prediction error was the highest of all drugs at 1× C max (0.4545) but decreased to <0.01 at 4× C max (blue diamonds). On the other hand, prediction errors for chlorpromazine (blue circles), sotalol (red triangles), verapamil (green triangles), cisapride (blue × s), and dofetilide (red squares) all generally increased from 1 to 10× C max . Above 10× C max , prediction errors for dofetilide and ranolazine (green crosses) increased, while prediction errors for sotalol decreased. As a result of these trends, both the mean and the standard deviation of prediction errors were lowest at 1-4× C max (Figure 7A, black points and error bars), near the concentrations for which experimental data on all currents were available for the 12 training drugs.  The TdP risk levels were assigned category values of 2 (High), 1 (Intermediate), and 0 (Low). A classifier was trained on 11 of 12 drugs and then used to predict the category probabilities [P(x), where x is the category value] and to obtain an overall prediction error for the remaining drug (see section Cross Validation). Uncertainty model simulations were used for training and prediction. For comparison, probabilities, and prediction errors from Dutta et al. (2017) are shown in parentheses when they differed from uncertainty results. To explore the impact of different ionic currents on TdP risk stratification, LOOCV was repeated for a set of simulations in which drug effects on a particular ion channel were removed. This analysis was limited to 1-4× C max in order to avoid concentrations at which uncertainty was due primarily to the lack of experimental data. When drug effects on I Na , I to , I Ks , or I K1 were removed, prediction errors were virtually unchanged ( Figure 7B). However, when drug effects on I CaL , I NaL , or I Kr were removed, prediction errors increased dramatically, indicating that TdP risk stratification of the 12 CiPA training compounds depended primarily on the drug effects for these three currents. Because most of the training compounds (other than quinidine) did not block I Na , I to , I Ks , or I K1 substantially at 1−4× C max , their resulting impact on TdP risk stratification was expected to be minimal.

DISCUSSION
Although many potential sources of uncertainty exist within the CiPA paradigm, the primary concern for the in silico component is uncertainty related to in vitro measurements of pharmacological effects on ionic currents. This study presents methods for conducting UQ within the framework of the CiPA in silico assay. Previously, Dutta et al. (2017) showed that the metric qNet, derived from fixed-input AP simulations incorporating multiple ion channel pharmacology, could be used to stratify the CiPA training set of 12 compounds by relative TdP risk. This study examined the impact of uncertainty in drug effects on simulation predictions. Bootstrapping and Bayesian inference were used to estimate the joint probability distributions of drug parameters in order to quantify the variability in mean drug effects. This variability was then propagated to a set of uncertainty-input AP simulations to assess the robustness of risk stratification with qNet. UQ revealed that some drug effects were insufficiently constrained at higher concentrations to be able to stratify TdP risk with high confidence. Near therapeutic concentrations, however, TdP risk stratification was robust to the uncertainty in drug effects. This study illustrates the benefits of applying UQ under the CiPA paradigm, both during model validation and when model-based predictions are used in regulatory decision making.
UQ helped to identify challenges concerning model calibration and parameter identification that will inform future model development. Such issues are frequently encountered in models of cardiac electrophysiology but are not often addressed during model development (Fink and Noble, 2009;Shotwell and Gray, 2016). In the Li et al. (2017) I Kr Markov model, drug-hERG binding kinetics was characterized by six parameters, but one parameter (drug trapping rate, K t ) was fixed at a value of 3.5× 10 −5 ms −1 . UQ revealed that three of the remaining five parameters (K max , EC 50 n , and V halftrap ) could not be precisely estimated based on the available data. Although the current model structure was designed to allow for both linear and sigmoidal drug binding as well as drug trapping, this flexibility comes at the expense of parameter identifiability and presents difficulties for UQ. To address these issues, model recalibration and/or simplification may be warranted, as was done for a model of I Na inactivation in Pathmanathan et al. (2015).
On the other hand, for some drugs, the observed hERG block kinetics could not be accurately captured by the I Kr Markov model. For instance, at 10 nM cisapride, hERG block developed more slowly in the experimental traces than in fitted model, even when uncertainty was considered ( Figure S4B). This suggests that alternative (and possibly more complex) model structures might be needed to characterize certain drugs. Thus, the challenge for CiPA is to define a one-size-fits-all model that is simple enough to be estimable but still accurate enough to predict TdP risk. The current approach attempts to strike an appropriate balance between the two concerns, combining the flexible dynamic representation of I Kr block with a simplified pore-block approach for other currents. The final assessment of the model will depend on its validation with an additional 16 compounds, which will determine its suitability for CiPA .
Many IC 50 -values could not be reliably estimated from the current data, an issue raised previously by Johnstone et al. (2016a). This occurred when fitted IC 50 -values were well above the tested concentrations, resulting in high levels of uncertainty in the upper concentration ranges simulated by Li et al. (2017) and Dutta et al. (2017). The impact of this uncertainty is illustrated in results for the High-risk drug dofetilide, which is known to be a selective hERG blocker. Because its hERG selectivity could not be confirmed above 3× C max with the current dataset (see Figure 3D and Figures S12-S15), uncertainty-input simulations of dofetilide above 10× C max resulted in highly variable qNet-values, including very "safe" values similar to Low-risk drugs ( Figure 6B). Although the impact of dofetilide on non-hERG currents is likely small, such assumptions cannot be made for new compounds, particularly if such currents and higher concentration ranges are deemed relevant for TdP risk prediction. To avoid these assumptions, in silico model predictions should be limited to concentrations less than or equal to the highest tested experimentally, unless the amount of drug block can be reliably extrapolated from data at lower concentrations (generally, if >60% block is achieved experimentally, see Figure 4). Thus, UQ highlights the importance of obtaining the appropriate data for generating reliable model predictions within the CiPA paradigm. For the current training set, TdP risk prediction appeared to depend solely on I CaL , I NaL , and I Kr data (Figure 7B), so this "60% rule" may potentially only need apply to these three currents. However, the importance of I Na , I to , I Ks , and I K1 cannot be discounted entirely because most of the training compounds did not substantially affect these currents. Further sensitivity analysis of qNet and testing with additional compounds may provide insight into the importance of these currents for TdP risk prediction.
Hierarchical UQ approaches may account for some of the discrepancies between observed experimental variability and the estimated variability of model outputs in the present study. For example, at the highest bepridil concentration (300 nM), the kinetics of I Kr block in a few cells was noticeably faster than that of other cells and the fitted bootstrap traces. Although it is unlikely that any single method could capture all observed variability, hierarchical approaches to quantify inter-individual variability may provide a more accurate representation of the true physiological variability than do population-averaged approaches (Pathmanathan et al., 2015). Recently, Johnstone et al. (2016a) used a hierarchical statistical model to assess the inter-experiment variability of drug block data from Crumb et al. (2016). Such an approach could be explored in the future if complete doseresponse data for all ionic currents become available. In the present study, however, the IC 50 of most currents could not be reliably estimated, so a further hierarchical analysis was not warranted. For the Li et al. (2017) I Kr Markov model, hierarchical methods would be more experimentally and computationally challenging. Experimentally, this would require obtaining hERG block data for each cell at multiple concentrations in order to estimate individual dose-dependent kinetics. However, due to stability and time limitations associated with the current experimental protocol, cells were only recorded at a single concentration. The computational demands of estimating hierarchical model parameters for dynamic models would also be very high because of the need to integrate differential equations. Addressing these difficulties may be unnecessary for CiPA, however, if a population-averaged approach to UQ is shown to provide sufficient information for robust TdP risk prediction.
The UQ results presented in this study illustrate the need to evaluate model predictions in the context of uncertainty. Previously, Dutta et al. (2017) demonstrated that qNet could separate the CiPA training drugs by TdP risk better than metrics based on AP or Ca 2+ transient morphology. In addition, the mean LOOCV prediction error of qNet was lower when drugs were simulated at 10× and 20× C max than at 1× C max , suggesting that higher concentrations could provide better risk separation. However, this assessment was based only on fixedinput simulations. When uncertainty inputs were used to classify drugs, mean LOOCV prediction error was lowest at 1-4× C max and worsened as concentration increased above 4× C max ( Figure 7A). In part, the differences in LOOCV results for fixed vs. uncertainty inputs were due to the high uncertainty in qNet for drugs such as, dofetilide and verapamil above 4× C max ( Figure 6C). However, these differences also arose because when uncertainty was low, classification with qNet probability distributions was more robust than with fixed qNet-values, which improved the mean LOOCV prediction error at 1× C max ( Table 5). UQ also provided an indication of the degree to which drugs could be separated, so LOOCV was more sensitive to subtle changes in qNet. Risk stratification of the training drugs at >4× C max may be improved if additional in vitro data are obtained at higher concentrations and incorporated into the model. However, it is important to keep in mind that the CiPAassigned TdP risk levels for the 12 training and 16 validation compounds are not absolute; these relative risks are mainly based on years of clinical evidence and expert opinion rather than a quantitative measure of real-world data. Effort is ongoing within the CiPA framework to develop more objective and quantitative TdP risk categorization systems based on postmarket data, which will help to refine the model and metric for more accurate TdP risk assessment.
This study did not address the issue of model uncertainty related to physiological variability because the focus of CiPA is on drug screening and obtaining an estimate of proarrhythmic risk that can be used to assess overall drug safety, not on predicting risk in specific individuals or subpopulations. However, this is an important topic for many safety pharmacology applications involving mathematical modeling. In pharmacokinetics, nonlinear mixed effects (NLME) models have routinely been applied to quantify intersubject variability (Fitzmaurice et al., 2008). However, methods for quantifying physiological variability in more complex cardiac electrophysiology models are not wellestablished. One approach has been to use a "population" of in silico cardiac cell models, generated by randomly varying model parameters, to explore mechanisms underlying physiological variability and to predict the resulting variability in drug responses, such as, hERG block-induced changes in APD (Sarkar and Sobie, 2011;Britton et al., 2013). The aim of UQ is to estimate model parameters within a statistical framework and then to give probabilistic predictions. Pathmanathan et al. (2015) used data from 10 to 16 cells and NLME modeling to perform a thorough UQ analysis of a single model parameter, steady-state I Na inactivation. But applying similar approaches to whole cell models, which typically have dozens of parameters, would require large amounts of data and, most likely, simpler models, as discussed by Pathmanathan et al. (2015). Nevertheless, such studies on physiological variability can be considered in complement with the results in this study concerning UQ of drug effects, providing insight into how multiple sources of uncertainty may impact variability in drug responses.
One additional issue that was not explored in this study was the effect of the number of experimental repeats on parameter uncertainty. For the manual patch clamp data used in this study, 4-10 repeats were obtained per drug concentration for the hERG experiments, and 3-4 repeats were obtained for non-hERG experiments. Thus, based on the current dataset, 3-4 experimental repeats appeared sufficient to constrain the model parameters for TdP risk prediction. However, data obtained from multiple labs or using automated, high-throughput systems can be much more variable, and more experimental repeats may be needed to accurately estimate the mean drug effect with these types of data (Elkins et al., 2013). These issues may be addressed in the future CiPA in silico validation phase.
In summary, risk stratification of the CiPA training drugs with the currently available data was most reliable near the maximum clinical concentration. This was because most of the in vitro experiments were designed around known therapeutic concentrations that often did not block the major ionic currents, and measurements at significantly higher concentrations were not consistently obtained for all drugs. The lack of experimental data produced a large degree of uncertainty in drug effects, which negatively impacted the ability to distinguish between drugs of different TdP risk at higher concentrations. Hence, our findings suggest that for new compounds, the CiPA in silico assay will require in vitro measurements at much higher drug concentrations that can achieve significant ionic current block if the model is expected to provide TdP risk predictions with high confidence. Whether this will be necessary for all seven ion channels that have been suggested as part of CiPA, however, remains to be determined.

AUTHOR CONTRIBUTIONS
KC designed and carried out the study and wrote the manuscript. SD, GM, KB, and ZL contributed to the study design and analysis. ZL supervised the project. ZL, DS, SD, GM, KB, and TC revised the manuscript. JS, PT, MW, and WW collected the data and provided guidance on interpretation of the data.

FUNDING
This project was supported by an appointment to the Research Participation Program at CDER, administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the US Department of Energy and the FDA. The perspectives presented in this work are those of the authors and do not represent the views of the FDA or its employees. GM gratefully acknowledges personal support from a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and The Royal Society (Grant Number 101222/Z/13/Z).