Evaluation of cardiac pro-arrhythmic risks using the artificial neural network with ToR–ORd in silico model output

Torsades de pointes (TdP) is a type of ventricular arrhythmia that can lead to sudden cardiac death. Drug-induced TdP has been an important concern for researchers and international regulatory boards. The Comprehensive in vitro Proarrhythmia Assay (CiPA) initiative was proposed that integrates in vitro testing and computational models of cardiac ion channels and human cardiomyocyte cells to evaluate the proarrhythmic risk of drugs. The TdP risk classification performance using only a single TdP metric may require some improvements because of information limitations and the instability of generalizing results. This study evaluates the performance of TdP metrics from the in silico simulations of the Tomek–O'Hara Rudy (ToR–ORd) ventricular cell model for classifying the TdP risk of drugs. We utilized these metrics as an input to an artificial neural network (ANN)-based classifier. The ANN model was optimized through hyperparameter tuning using the grid search (GS) method to find the optimal model. The study outcomes show an area under the curve (AUC) value of 0.979 for the high-risk category, 0.791 for the intermediate-risk category, and 0.937 for the low-risk category. Therefore, this study successfully demonstrates the capability of the ToR–ORd ventricular cell model in classifying the TdP risk into three risk categories, providing new insights into TdP risk prediction methods.


Introduction
The development of pharmaceuticals is crucial due to the potential risk of Torsades de Pointes (TdP).For that reason, in the 1990s-2000s, several drugs were withdrawn from the market due to their tendency to induce TdP (Wood and Roden, 2004;Sager et al., 2014).The potential risk associated with TdP occurs because of the pharmacological inhibition of human ether-a-go-go-related gene (hERG) ion channels, which regulate the rapid component of the delayed rectifier current (IKr) and prolong the QT interval during drug administration, and it may potentially be life-threatening to consumers.Responding to these concerns, the International Council for Harmonization (ICH) established the E14 guidelines for the clinical evaluation and the S7B guidelines for the non-clinical assessment of drug cardiotoxicity and ventricular repolarization (Cavero and Crumb, 2005;Crumb et al., 2016).
Nevertheless, these guidelines have limitations, including the need for detailed clinical trials and having a high level of sensitivity with low specificity, which may obstruct the development of therapeutic drugs capable of prolonging the QT interval without inducing TdP (Llopis-Lorente et al., 2020).The Comprehensive in vitro Proarrhythmia Assay (CiPA) initiative was proposed by the Food and Drug Administration (FDA) during a Think Tank meeting in the United States in July 2013.It involves four working groups focusing on in vitro ion channel studies, in silico modeling of ion channel study results, cardiomyocyte studies, and searching for biomarkers other than QTc in human electrocardiography (ECG) (Crumb et al., 2016).
The development of the O'Hara Rudy (ORd) ventricular cell model was observed in disease-free human ventricular cells from a wealth of experimental data obtained from healthy individuals (O' Hara et al., 2011).A previous study introduced and applied the dynamic hERG pharmacological model to the ORd model (IKrdyn ORd model) (Li et al., 2017).Furthermore, a follow-up study improved the ORd model with the dynamic hERG by rescaling some major ionic currents, including IKs, L-type calcium current (ICaL), IKr, sodium current (INa), and inward rectifier current (IK1) (CiPAORdv1.0 model) (Dutta et al., 2017).The ventricular cell model enables the simulation of cellular responses to a range of stimuli or conditions that potentially induce TdP, offering a deeper insight into how alterations at the cellular level can impact the risk of arrhythmias like TdP.This modeling approach allows for the generation of various in silico biomarkers from action potential (AP), calcium concentration (Ca), and net charges (qNet and qInward).
Several studies have applied various TdP metrics from the ORd or CiPAORdv1.0ventricular cell model to classify TdP risks (Mirams et al., 2011;Lancaster and Sobie, 2016;Passini et al., 2017;Li et al., 2019;Yoo et al., 2021;Jeong et al., 2022).In their study, Mirams et al. (2011) utilized the ventricular cell model developed by Grandi et al. (2010).APD 90 was the reference parameter for assessing APD 50 , APD 90 , triangulation, and maximum restitution.In the machine learning process, APD 50 , APD 90 , triangulation of the action potential duration (APD), and maximum restitution of APD were used.Nevertheless, the integration of these parameters resulted in marginally enhanced leave-one-out cross-validation scores that may be attributable to the potential occurrence of overfitting in the model.
The FDA proposed using qNet and qInward, which were calculated through in silico simulations that show promising metrics for categorizing the proarrhythmic risk of drugs (Strauss et al., 2019).A previous study suggested qNet as a single metric from the CiPAORdv1.0ventricular cell model to classify TdP risk into high-and low-risk categories using ordinal logistic regression (Li et al., 2019).In addition, a study proposed the utilization of 13 electrophysiological features (upstroke velocity, peak voltage, APD 50 , APD at −60 mV, APD 90 , resting voltage, AP triangulation, diastolic [Ca2 + ]i, amplitude of Ca transient , peak [Ca2 + ]i, CaD 50 , CaD 90 , and Catri) as appropriate parameters in implementing statistical and machine learning models to perform the binary classification of various drugs into two categories: TdP+ (posing a risk of TdP) or TdP− (lacking TdP risk) (Parikh et al., 2017).
A dataset comprising 86 drugs from the studies conducted by Mirams et al. (2011) and Kramer et al. (2013) has been used as a computational methodology that integrates simulations of drug effects on cardiac dynamics with statistical analyses and machine learning techniques to classify TdP risks.The combination of drug simulation data and statistical analysis employing the support vector machine (SVM) algorithm shows good performance, where the AUC value is 0.86.This approach relies on metrics computed from the action potential duration at 90% repolarization (APD 90 ) and the Ca resting waveform.This approach delivers superior risk prediction capabilities by incorporating insights derived from cardiac cells (Lancaster and Sobie, 2016).
Furthermore, the Tomek-O'Hara Rudy (ToR-ORd) ventricular cell model proposed by Tomek et al. (2019) represents an improvement over the ORd ventricular cell model.The ORd cell model shows that the AP plateau exhibits a higher value than the experimental data employed.The dynamics of AP duration accommodation to heart rate acceleration or sodium behavior demonstrates limited agreement (Tomek et al., 2019).Therefore, the ToR-ORd ventricular cell model revised equations for the ICaL, calciumsensitive chloride current I(Ca)Cl, chloride background current (IClb), INa, IK1, and IKr to replicate AP waveforms (Tomek et al., 2019).The ICaL revision was based on the Goldman-Hodgkin-Katz (GHK) flux equation with the ionic activity coefficient derived from the Davies equation and Debye-Huckel theory, according to the studies by Magyar et al. (2000), and INa was substituted with an alternative formulation derived by Grandi et al. (2010), providing a more precise representation of sodium current behavior in human ventricular myocardial cells (Grandi et al., 2010;Tomek et al., 2019).Moreover, the ToR-ORd ventricular cell model modified INa to incorporate changes resulting from CaMKII phosphorylation, a critical regulatory mechanism in cardiac electrophysiology (Tomek et al., 2019).The ToR-ORd model was developed through a comprehensive calibration and validation strategy, encompassing rigorous calibration criteria (Han et al., 2020).Concurrently, the authors found that calcium concentrations and sodium homeostasis remain constant under conditions simulating high potential, which is consistent with available experimental data.
A recent study validated 12 in silico features, i.e., 6 AP features (APD 50 , dVm dt max , dVm dt max repol Vm peak , APD 90 , APD 50 , and APD tri ), 4 calcium features (Ca peak ; CaD 90 ; CaD 50 ; and CaD tri ), and 2 net charge features (qInward and qNet), derived from the ToR-ORd ventricular cell model affected by drugs (Jeong et al., 2022).The study classified TdP risk (high, intermediate, and low) using ordinal logistic regression (OLR) for each TdP metric.However, using a single TdP metric demonstrated low performance.This is attributed to the complexity of TdP phenomena involving various aspects of cellular electrophysiology, rendering a single feature insufficient to encompass all relevant variations and interactions in TdP risks.
In this study, we assess the diversity of in silico metrics for TdP, encompassing various aspects of cellular electrophysiological changes induced by drug influence.Consequently, the primary objective of this research is to leverage the nine available features from the ToR-ORd ventricular cell model to classify TdP risk.This is achieved by using an optimized artificial neural network (ANN) through the grid search (GS) process for hyperparameter tuning and integrating explainable artificial intelligence (XAI) with SHapley Additive exPlanation (SHAP) values into the analysis.

Research methods
Figure 1 illustrates the comprehensive methodology, which includes several stages.The first step involved generating drug samples using the Markov chain Monte Carlo (MCMC) method.Then, biomarkers were produced from in silico simulation.The biomarkers acted as features for the ANN to predict the TdP risk of drugs.Finally, the ANN model was optimized by GS hyperparameter optimization and feature selection using XAI.

In vitro experimental dataset and MCMC simulation
This study evaluated the drug toxicity risk using 28 drugs from the in vitro dataset, as shown in Table 1 (Li et al., 2019).This dataset applied the inhibition levels of four ion channels (INaL, INa, ICaL, and IKr) for training and testing datasets.For the dose-response analysis, the data that are uploaded by the CiPA group are accessible at https://github.com/FDA/CiPA/tree/Model-Validation-2018/Hill_Fitting/data.We also applied the same Hill fitting method used by Li et al. (2019) to ensure the consistency and validity of our analysis.This technique produced 2,000 Hill curves illustrating the relationship between ion channel blockade and drug concentration described by drug concentrations leading to the half-maximal inhibitory concentration (IC 50 ) values of the ionic channels and the slope of the Hill curve (h or Hill coefficient).Finally, the 2,000 IC 50 values and h samples for each drug were used as inputs for in silico simulations.Proposed torsades de pointes (TdP) risk evaluation method.This study utilized 2,000 Hill samples from in vitro patch-clamp data provided by Li et al. (2019).These samples were subsequently integrated into the ToR-ORd in silico ventricular cell model, given 28 different drugs at 4 maximum concentration levels (C max 1-4), resulting in 2,000 samples for 9 key TdP features.The average sample values were calculated for each sample based on the results obtained at the four distinct drug concentration levels.Then, the dataset was segmented into 2 parts: a training dataset consisting of 24,000 samples (12 drugs × 2,000 samples) used in the artificial neural network (ANN) classifier model and a testing dataset consisting of 32,000 samples (16 drugs × 2,000 samples) for evaluation through 10,000 testing iterations.

Preprocessing and the ToR-ORd in silico ventricular cell model
This study used the in silico ToR-ORd ventricular cell model to simulate cardiomyocyte electrophysiology.The in silico simulator was developed using the C++ programming language as the underlying code supported by libraries such as CVode to solve differential equations.The in silico simulator code is given in Supplementary Material S1.The effects of drugs on myocardial cells were quantified using Eq. 1, which includes the IC 50 parameter that represents the concentration at which the half-maximal effect is caused, the Hill coefficient determining the dose-response curve (H), and drug concentration (D).We used Eq. 1 to calculate the inhibitory factor, which indicates how the drug impacts the in silico ToR-ORd ventricular cell model.Eq. 2 describes the maximum conductance g of ion channel i under the drug effect.Note that g control,i represents the maximum conductance of the ion channel without the drug effect.

Inhibition factor 1
1 In this study, we implemented four different concentrations for each type of drug for in silico simulations.Each drug concentration produced a total of 2,000 samples for each TdP metric, and we utilized a cumulative total of 8,000 samples (2,000 samples for each of the 4 drug concentrations) as the input.One beat with the highest dVm dtmax repol among the last 250 beats was selected for generating TdP metrics (Chang et al., 2017).Through these simulations, the electrophysiological response of human ventricular myocytes was influenced by the drug effects with 1,000 pacing cycles with a duration of 2,000 milliseconds (ms) for each pacing.
We utilize a set of metrics to analyze the impact of drugs on human ventricular cardiac cells.These metrics encompass the concept of net charge, with qNet reflecting the total charge alteration during the simulation, along with qInward, which measures the inward charge during the simulation, as expressed by Eqs 3, 4: Furthermore, we consider calcium-related metrics, such as CaD 90 , indicating when calcium reaches 90% repolarization during the cycle.CaD 50 measures the time when calcium reaches 50% repolarization during the cycle, and Ca resting is used for assessing the calcium level during the relaxation phase.We also consider metrics related to APs, including dVm dt max , which represents the maximum rate of change in the membrane potential during the depolarization phase of the AP cycle.APD 90 indicates the duration of the AP at 90% repolarization, while APD 50 measures the duration of the AP at 50% repolarization.Lastly, the Vm resting metric measures the AP duration during the resting phase.
Using drug samples generated using the MCMC process, we acquired 56,000 samples (2,000 samples × 28 types of drugs).From the 28 drug types released by the CiPA group, as given in Table 1, we selected 12 drugs as the training dataset.Therefore, we utilized 24,000 samples (2,000 samples × 12 drugs).Each sample consists of nine TdP metrics, and it will be used as the input data in the ANN simulation that will be optimized by GS.We implemented all machine learning and model evaluation codes within a Jupyter Notebook using Python programming.The simulation code of our proposed model is given in Supplementary Material S1.

The proposed ANN optimization by grid search
This study divided 56,000 samples generated from in silico simulations into 2 sets to develop and evaluate an ANN model.The first set, comprising 24,000 samples with 2,000 samples for 12 different drugs each, was designated as the training dataset.To ensure the accuracy and robustness of the ANN model, we employed a 10-fold cross-validation method during the training process.In each cross-validation iteration, 21,600 samples were used for training and 2,400 samples for validation, which enabled a comprehensive assessment of the model performance across different subsets of training data.The remaining 32,000 samples, corresponding to 2,000 samples for 16 other drugs each, formed the testing dataset.This ensured a rigorous evaluation of the ANN's predictive capabilities on unseen data.Accordingly, the model is both generalizable and effective in simulating drug effects based on the diverse profiles of the drugs.
We implemented the GS hyperparameter optimization approach in the ANN classifier.This approach aims to identify the optimal parameter configuration capable of achieving optimal performance for a machine learning architecture.A GS systematically tests all possible combinations of the provided parameters.It evaluates the model performance for each hyperparameter combination, and this process requires training and evaluating the model using various parameter combinations.The results of these evaluations then serve as the basis for selecting the parameter combination that yields the best performance.
We propose various hyperparameters, including batch size (32 and 64), optimization (RMSprop and Adam), the number of neurons in the hidden layers (5, 6, and 7), learning rates (0.1, 0.001, and 0.01), and alpha values (0.1, 0.01, 0.001, 0.2, 0.02, and 0.002).Here, regularization techniques lasso (L1) and ridge (L2) (0.01) were utilized in the first hidden layer to control complexity and mitigate overfitting.In each iteration, GS combined each existing parameter value (e.g., the combination of a batch size of 32 with RMSprop optimization, 5 neurons, a learning rate of 0.1, and an alpha value of 0.001).The ANN model was updated and evaluated using multiple parameter combinations.This process was repeated for all possible combinations, resulting in parameter combinations that provide the best performance on the dataset.The results from this ANN model were employed for classifying the TdP risk of drugs, which are high risk, intermediate risk, and lowrisk.The classification process utilized the output of the softmax function to generate risk probabilities.

Explainable AI using the ANN classifier
In XAI, SHAP values are important because they offer a consistent and theoretical way to measure how each feature contributes to prediction accuracy.Applying SHAP values to complex models such as an ANN is recommended since complex models cannot be explained easily using intrinsic explanations (Zhang et al., 2020;Thisovithan et al., 2023).A SHAP value analysis provides an insight into the contribution of each feature member to the collation value of a model.The results of SHAP values can be considered in light of the contribution made by each feature that has a positive or negative impact on the target output.The SHAP value for a specific data point is described in Eqs 5, 6, where X represents the feature value vector (which needs to be explained), S indicates the input feature subset, and the Shapley value can be obtained through the value function (5)

Model evaluation
In order to assess the effectiveness of a classification model, we implemented 10,000 test protocols as outlined in the CiPA initiative by Li et al. (2019), as shown in Figure 2. The dataset encompassed 32,000 samples, comprising 16 drug categories containing 2,000 samples.We generated 10,000 random sample subsets from this dataset covering all 16 drug categories.The model was evaluated by generating 10,000 receiver operating characteristic (ROC) curves for each risk category: high, intermediate, and low.The model performance was then statistically assessed by calculating the area under the ROC curve (AUC), positive and negative likelihood ratios, and the average classification error to measure the model accuracy.
The positive and negative likelihood ratios were determined using Eqs 6, 7. Model sensitivity and specificity are two essential components of these ratios.Sensitivity and specificity were calculated based on Eqs 8, 9, where a true positive (TP) defines the number of positive cases accurately identified by the model; true negative (TN) refers to accurately predicted negative cases; false positive (FP) indicates negative cases erroneously predicted as positive; and false negative (FN) represents positive cases mistakenly classified as negative.
Likelihood ratio positive LR+ ( ) Sensitivity 1 − specificity (7) Likelihood ratio negative LR− ( ) 1 − sensitivity specificity (8) Based on the ROC curve, the AUC indicates the ability of the model to distinguish between positive and negative classes by measuring the relationship between the true positive rate (TPR) and false positive rate (FPR) using Eqs 10, 11.The TPR measures how sensitive the model is to identifying positive cases, whereas the FPR measures how often the model generates false positive predictions in negative situations.The model performance was evaluated using median values and 95% confidence intervals, calculated from the 2.5th to 97.5th percentiles of the test outcomes.Meanwhile, the 95% confidence interval for the average error was determined using the formula mean ±1.96 * SD/√N, where SD is the standard deviation of the error and N indicates the total number of samples (16 drug tests multiplied by 2,000 samples for validation).

Results
This study classified TdP risk with various drugs inducing TdP metrics within the ToR-ORd ventricular in silico cell model.We implemented 10-fold cross-validation using the GS method to train 12 drugs using the ANN classifier.We employed the GS method for automatic hyperparameter tuning to select the optimal parameters for the ANN classifier.The detailed structure of the ANN model, based on the GS simulation results, is shown in Figure 3.According to the GS method, six neurons were used in the first layer and five neurons in the second layer, with an alpha value of 0.1 applied to the activation function for the leaky ReLU for both layers.The model was optimized using Adam as the optimizer and trained for 200 epochs with a learning rate of 0.001.The training process was conducted using a batch size of 32.The outcomes obtained from this ANN model classified the TdP drug risks into three risk categories: high, intermediate, and low.This classification was achieved through the output layer utilizing the softmax function to generate risk probabilities.The model performance was evaluated through a 10-fold cross-validation procedure to validate the robustness of the model.Subsequently, the best-performing model was tested using 16 test datasets and subjected to 10,000 iterations of testing.
Furthermore, we analyzed nine features ( dV dt max ; Vm resting ; APD 90 ; APD 50 ; Ca resting ; CaD 90 ; CaD 50 ; qNet; and qInward) from the training dataset to assess feature contributions to each class.By using the ANN classifier, we performed feature importance analysis  using the SHAP method (Zhang et al., 2020;Yunendah et al., 2023) to evaluate feature contributions to the ANN classifier.Figure 4 shows SHAP values obtained from the trained ANN model to classify TdP risk.SHAP values provide an interpretative method for understanding the influence of each feature on the predictions generated by the model.
In the context of TdP risk induced by various drugs, Figure 4 shows that certain features exhibit higher SHAP values, indicating a dominant role in influencing the model decisions.Conversely, features with lower SHAP values contribute minimally to the model decisions in classifying TdP risk.Based on the SHAP results given in Figure 4, qInward demonstrates the most dominant contribution to both high-and low-risk classes, followed by CaD 50 , CaD 90 , APD 50 , APD 90 , and qNet.Meanwhile, Ca resting , Vm resting , and dV dt max show minimal contributions to these classification.In order to analyze the performance of each feature, we conducted individual feature analyses within the ordinal logistic regression model.As shown in Table 2, qNet, APD 50 , qInward, and CaD 50 achieved high performances for high risk.
Furthermore, we examined the feature importance rankings provided by the SHAP values for the ANN classifier using nine features.Subsequently, this analytical process was expanded with additional experiments involving the reduction of features that did not show significant contributions based on the SHAP value analysis.The top feature groups, analyzed based on their contributions to the model, comprised five, six, seven, and eight top features.Among them, the group with five features consisted of qInward, qNet, CaD 50 , CaD 90 , and APD 50 .Then, the top six feature groups were qInward, qNet, CaD 50 , CaD 90 , APD 50 , and APD 90 .The group of the top seven features consisted of qInward, qNet, CaD 50 , CaD 90 , APD 50 , APD 90 , and Ca resting .In addition, the group of the top eight features encompassed qInward, qNet, CaD 50 , CaD 90 , APD 50 , APD 90 , Ca resting , and Vm resting .This analysis provided an insight into the importance of each feature in the context of TdP risk induced by various drugs.
Table 3 shows the evaluation of the performance of a predictive model in simulations.In the configuration with 5 features, the model demonstrated an effective ability with an AUC of 0.930 for high risk and 0.901 for low risk, indicating a good capacity in identifying lowrisk events.Meanwhile, the positive likelihood ratio (LR+) was 5.99994, and the negative likelihood ratio (LR−) was 0.583333 for high risk.The average classification error in this configuration was the highest, at 26.5%.
With the addition of 6 features, an increase was observed in the AUC to 0.937 for high risk and 0.777 for intermediate risk, along with a decrease in the average classification error to 23.9%.This increase indicates that adding features provides additional benefits to the discriminative performance of the model.LR+ increased to 6.599, indicating improved predictive performance of the model for positive classification, while LR− decreased to 0.523, indicating improvement in the model ability to exclude high risk in negative results.
In the configuration using 7 features, a significant increase was observed in the AUC for high risk to 0.961, indicating excellent model performance.Meanwhile, the AUC for intermediate risk also increased to 0.780 and low risk increased to 0.917.LR+ for high risk significantly increased to 400,000.3, indicating substantial improvement in predicting high-risk positives.Although LR− experienced a slight increase to 0.533, the average classification error decreased to 21.7%, reflecting increased predictive accuracy.
The configuration with eight features showed a slight decrease in the AUC for high and intermediate risks.However, the AUC for low risk increased slightly to 0.927.LR+ remained high at 400,000.9, maintaining substantial predictive strength for positive classification, while LR− showed a remarkable decrease to 0.312, indicating improved performance in excluding high risk in negative results.The average classification error remained stable at 22.5%, indicating consistent predictive accuracy with adding the eighth feature.
In our experiment, we observed that using nine features shows optimal classification performance.Table 3 shows that the model Feature importance ranking based on the SHapley Additive exPlanation (SHAP) values from explainable artificial intelligence (XAI) with the ANN, which was optimized by GS hyperparameter optimization.194 (0.1973-0.1975)Frontiers in Physiology frontiersin.orgvalue analysis.Figure 4 shows qInward as the top-ranked feature, contributing significantly to the performance of the ANN classifier model.qInward has the most substantial effect on classification, particularly in the high-risk category due to alterations in the electronic charge within the INaL and ICaL ion channels of the heart.These changes significantly impact cardiac ion currents and serve as primary triggers for TdP occurrence.In high-risk drugs such as quinidine, bepridil, sotalol, and dofetilide, qInward demonstrates rapid and concentration-dependent increases.Higher drug doses correspond to faster and larger increments in qInward values (Li et al., 2017).Consistent with our findings, prior research has also highlighted the dominant role of qInward variability in convolutional neural network (CNN) classifiers for TdP risk classification (Jeong et al., 2022).A significant disparity in performance is observed between the qNet model proposed by Li et al. (2019) and the qNet model using ToR-ORd in silico model.This discrepancy can be ascribed to striking structural differences in the hERG current between the ORd and ToR-ORd in silico models.Specifically, the ORd in silico model integrates additional components to depict pharmacodynamic effects, while such components are absent in the ToR-ORd in silico model (Li et al., 2017;Tomek et al., 2019).Tomek et al. (2019) reported that the ToR-ORd in silico model remains relevant and effective in simulating human cardiac electrophysiological responses.This model generates data that are well aligned with experimental data, indicating good concordance even without specific optimization.The ToR-ORd in silico model successfully predicted responses to various ion channel blockades, including IKr (E-4031), IKs (HMR-1556), multichannel mexiletine blockade, and ICaL (nisoldipine), which is consistent with experimental observations.Subsequently, the ranking of SHAP values for each feature in the classification contribution of each class is followed by CaD 50 , CaD 90 , APD 50 , APD 90 , Vm resting , Ca resting , and dV dt max .Previous studies, including works by Lancaster and Sobie (2016), Li et al. (2019), Yunendah et al. (2023), and Yoo et al. (2021), utilized the significance of these features in classifying TdP risk.However, our analysis shows that Ca resting , Vm resting , and dV dt max provided minimal contributions, as shown in Figure 4. Consequently, Ca resting , Vm resting , and dV dt max have a limited impact on determining TdP risk in the utilized model.
Based on the SHAP analysis results, we grouped the feature set into five groups.Among those groups, utilizing nine features demonstrated superior performance, as shown in Table 3.Our classification performance evaluation was based on diagnostic accuracy metrics (Li et al., 2019), where "excellent" was achieved if AUC ≥0.9, "good" if AUC ≥0.8, and "minimal acceptance" if AUC ≥0.7.Based on Table 3, we observed an intriguing pattern regarding predictive performance as the number of features used varied.Beginning with five features, our model exhibited adequate performance with a sufficiently strong AUC for both low and high risks, even so with a decreased effectiveness for intermediate risk.The sixth feature group improves model performance, indicating classification stability for intermediate and low risks.However, adding the seventh feature did not consistently improve the performance, especially in the high-risk category, suggesting that some additional features may not provide significant predictive information.Transitioning to eight features did not yield significant performance changes, indicating no substantial improvement in accuracy or AUC.Compared to other feature groups, consistently high AUC performance is observed for the high-risk class and low-risk class.Meanwhile, the intermediate risks show performance stability.Feature reduction does not significantly affect TdP risk classification performance; selecting nine features provides superior and comprehensive performance in our predictive model.
Several limitations in this study may require further investigation.First, while a single ToR-ORd model yields relatively high AUC performance, integrating population models remains crucial.Incorporating diverse ventricular APs in in silico models can better represent the variability of cellular responses to drugs (Tomek et al., 2019).Therefore, applying these findings to a broader population or different drugs necessitates further research.Additionally, drug calibration is necessary to validate arrhythmia prediction models to ensure alignment with actual drug conditions (Han et al., 2020).

Conclusion and future works
This study evaluated nine TdP metrics from the ToR-ORd in silico ventricular cell model in predicting TdP risk.Initially, we conducted training incorporating the training dataset into an ANN to determine optimum hyperparameters manually through GS.We trained the optimized ANN model with 9 metrics from 12 training data.The optimal ANN model was then tested 10,000 times with 16 drugs that had not been previously used.Next, we analyzed the contribution of nine features, dV dt max , Vm resting , APD 90 , APD 50 , Ca resting , CaD 90 , CaD 50 , qNet, and qInward using SHAP value analysis on the ANN model.The SHAP value analysis showed that qInward was the feature that contributed most dominantly to the model.However, after analyzing with single qInward using ordinal logistic regression, qInward provided less satisfactory performance.Furthermore, we conducted an analysis using all nine features and a reduction of features based on the highest and lowest contribution orders from SHAP value results.Based on our findings, reducing features did not significantly impact TdP risk classification performance.The utilization of all nine TdP features with the ANN yielded the highest performance, i.e., an AUC of 0.979 for the high-risk category, 0.791 for the intermediate-risk category, and 0.937 for the low-risk category.These findings contribute to the understanding of utilizing nine features using the ToR-ORd in silico cell model.However, further research should consider the development of more complex in silico models that encompass a larger population with diverse potential actions of human ventricular and drug calibration to contribute to more significant proarrhythmic risk prediction.
writing-review and editing.KL: conceptualization, methodology, supervision, validation, and writing-review and editing.

FIGURE 2
FIGURE 2Schematic diagram of the 10,000 test algorithms based on the the study byJeong et al. (2022).

FIGURE 3 ANN
FIGURE 3 ANN architecture optimized by grid search (GS) hyperparameter optimization.

TABLE 2
Single feature performance evaluation.Model performance evaluation after 10,000 iterations.The features were grouped according to SHAP values, as illustrated in Figure4.The feature with the lowest ranking in the SHAP value analysis was eliminated first.Proarrhythmic risk is classified as high, intermediate, and low.The Area Under the Curve (AUC), Likelihood Ratio Positive (LR+), Likelihood Ratio Negative (LR−), and mean classification error were utilized for model evaluation.