Efficient Numerical Solution of the EMI Model Representing the Extracellular Space (E), Cell Membrane (M) and Intracellular Space (I) of a Collection of Cardiac Cells

The EMI model represents excitable cells in a more accurate manner than traditional homogenized models at the price of increased computational complexity. The increased complexity of solving the EMI model stems from a significant increase in the number of computational nodes and from the form of the linear systems that need to be solved. Here, we will show that the latter problem can be solved by careful use of operator splitting of the spatially coupled equations. By using this method, the linear systems can be broken into sub-problems that are of the classical type of linear, elliptic boundary value problems. Therefore, the vast collection of methods for solving linear, elliptic partial differential equations can be used. We demonstrate that this enables us to solve the systems using shared-memory parallel computers. The computing time scales perfectly with the number of physical cells. For a collection of 512 × 256 cells, we solved linear systems with about 2.5 × 10 8 unknows. Since the computational effort scales linearly with the number of physical cells, we believe that larger computers can be used to simulate millions of excitable cells and thus allow careful analysis of physiological systems of great importance.


INTRODUCTION
Traditionally, numerical simulations of excitable tissue have been performed using homogenized mathematical models. For cardiac tissue, the bidomain model and the associated monodomain model have been very popular and provided important insights into the electrochemical processes underpinning every heartbeat. Introductions to these models can be found in, e.g., Refs. 1 and 2 and numerical methods used to solved the models are presented in numerous papers; see, e.g., Refs. 3-7. In the homogenized models, the extracellular space, the cell membrane and the intracellular space all exist everywhere in the computational domain. This approach has proven to be very useful in studying effects on a relatively large length scale (millimeters), but has obvious limitations when the length scale approaches the size of a cell (micrometers). The main reason for this is that the cell itself is missing in the homogenized models.
As an alternative to the homogenized models, we have recently presented and applied a cell-based model; see, e.g., Refs. 8-10. In this model, the extracellular domain (E), the cell membrane (M) and the intracellular domain (I) are all present in the mathematical model and it is therefore referred to as the EMI model. The EMI model is motivated by earlier work where homogenization has been avoided; see, e.g., Refs. 11-19. The obvious advantage of the EMI approach over the homogenized counterparts is that the cell itself is present in the model. This enables the modeler to change cell membrane properties locally (e.g., varying ion channel densities along the cell membrane) which may be of importance for the conduction velocity; see, e.g., Refs. 9, 20, and 21. Furthermore, the presence of the cell allows for study of the effect of changes to the cell size and shape which may be of great importance; see, e.g., Refs. 22-25. For instance, the T-tubule system can only be represented in an averaged manner in the classical bidomain/monodomain models, but in the EMI model it would be possible to represent the T-tubule system explicitly albeit at the cost of handling complex geometries.
On the other hand, the obvious disadvantage of the EMI approach is the computational complexity of the model. One issue is that in order to represent the geometry of the cell and the intercalated discs to a reasonable degree of accuracy, we need a very fine computational mesh. For the finite difference computations we have conducted, the mesh resolution is usually between 1 and 4 μm. Using a finite element method, the number of nodes may be reduced by applying an adaptive mesh, but the number of grid points remains high. For the bidomain/monodomain models a typical mesh resolution is around 0.25 mm (see, e.g., Ref. 26). For a cube with sides measuring 1 cm, the EMI model (with Δx 4 μm) requires about 1.6 × 10 10 mesh points whereas the bidomain model requires about 64,000 mesh points (with Δx 0.25 mm). Since the volume of the human heart muscle is about 300 cm 3 , we realize that the EMI approach is currently not feasible and can only be used to study relatively small collections of cells. For the human heart, the bidomain model will require about 20 million computational nodes, and that is clearly within reach for even moderately sized computers.
The EMI model is useful for gaining detailed insights into the dynamics close to excitable cells. It may also be used to study small collections of specialized cells like the sinus node, AV node (atrioventricular node), Purkinje junctions, etc. For instance, the volume of the sinus node (see Ref. 27) is about 80 mm 3 and a mesh with Δx 4 μm would result in about 1.25 × 10 9 mesh points which may be achievable. In contrast, the bidomain model of the sinus node would need only about 5,000 nodes.
The mouse heart is frequently used as an experimental model to study cardiac electrophysiology. The size of the sinus node and AV node for this animal is much smaller than for humans; the volume of the sinus node of a mouse is about 1.5 mm × 0.5 mm × 0.3 mm ≈ 0.23 mm 3 and the volume of the AV node is about 1.2 mm × 0.5 mm × 0.4 mm ≈ 0.24 mm 3 , see Ref. 28. With Δx 4 μm, we would need to solve linear systems with about 3.5 × 10 6 and 3.75 × 10 6 computational nodes for the sinus node and the AV node, respectively. Based on the results of the present paper, we claim that it is possible to perform detailed simulations based on the EMI model for the sinus node and AV node of the mouse heart.
Another difficulty associated with the EMI model is the characteristics of the linear system arising from the discretization of the model. The system turns out to be difficult to solve and it does not fall into a standard category of systems where well-developed methods and software can be applied. The method used to solve the system in Refs. 8 and 10 is based on an temporal operator splitting algorithm where the nonlinear part is handled in one step, and then the linear equations arising from E, M and I are spatially coupled in one common linear system. Attempts to split the system into separate equations for E, M and I, and solve these systems in a sequential manner, lead to severe restrictions on the time step (Δt ∼ 10 − 5 ms), rendering the method useless for any practical purposes. Therefore, we need to formulate the model as a coupled system of equations and the challenge becomes how to solve this system. Fast numerical solution of linear, elliptic equations is one of the most developed field of scientific computing (see, e.g., Refs. [29][30][31][32]. Therefore, spatial splitting of the problem into subproblem of this particular type will ensure that we have efficient methods at hand. The main result of this paper is to demonstrate that the EMI model can be split into sub-problems of the classical elliptic type and therefore the overall method becomes efficient and stable. The overall computational efforts scale linearly with the number of cells and, therefore, the number of cells we can simulate is, in principle, determined by the power of the available computing facility.
The remainder of this paper is organized as follows: First, in Section 2, we present the EMI model for a single cell and a finite difference approach for discretizing and solving the linear model equations in a coupled manner. We then describe the spatial operator splitting algorithm for splitting the EMI model into separate equations for the extracellular space, the membrane and the intracellular space, and outline a finite difference discretization for the splitting approach. In Section 2.5, we investigate the accuracy of the spatial splitting method by comparing the solution of the splitting algorithm to the solution of the spatially coupled system of equations. Next, in Section 3, we present the EMI model for connected cells and explain how the spatial splitting approach can be extended to this case. We also describe a finite difference discretization of the coupled system and the spatial splitting approach, and investigate the accuracy of the splitting method for a collection of connected cells. In Section 4, we describe a parallel shared-memory implementation of the spatial splitting algorithm and report performance results of the parallel solver. Finally, in Sections 5 and 6 the results of the paper are discussed and summarized.

THE SINGLE CELL CASE
We begin by describing a spatial splitting method for the EMI model for a single cell surrounded by an extracellular space, as illustrated in Figure 1. This splitting method may also be straightforwardly extended to collections of isolated cells, as illustrated in Figure 2. However, when collections of cells are connected by gap junctions (see Figure 6), the splitting method must be extended to account for the gap junctional coupling between cells. This extension is described in Section 3.

The EMI Model
In the EMI model for a single cell, like illustrated in Figure 1, we consider an intracellular domain, Ω i , surrounded by an extracellular domain, Ω e . The boundary between the intracellular and extracellular domains defines the cell membrane, Γ. The EMI model (see, e.g., Ref. 8) describes the electric potential in these domains and is given by the following system of equations: u e 0 a t zΩ D e , (3) n e · σ e ∇u e 0 a t zΩ N e , (4) n e · σ e ∇u e −n i · σ i ∇u i ≡ I m at Γ, Here, u e is the extracellular potential defined in Ω e , u i is the intracellular potential defined in Ω i , and v is the membrane potential defined at Γ. The potentials are all given in units of mV. The parameters σ e and σ i represent the extracellular and intracellular conductivites, respectively, and are given in units of mS/cm. Furthermore, n e and n i denote the outward pointing unit normal vectors of the extracellular and intracellular spaces, respectively, and C m represents the specific membrane capacitance, given in units of μF/cm 2 . Time is given in units of ms and length is given in units of cm. The current density I ion represents the sum of the ionic current densities through different types of ion channels, pumps and exchangers on the cell membrane, and I m represents the sum I ion and the capacitive current density C m v t . All current densities are given in units of μA/cm 2 . The model for I ion typically involves a set of additional state variables describing the gating mechanisms of the ion channels and ionic concentrations. These variables are defined on Γ and are denoted by s in the EMI system defined above. The dynamics of s are governed by a set of ordinary differential equations described by F(v, s). In the computations of this paper, we let I ion and F(v, s) be given by the Grandi et al. model for human ventricular cardiomyocytes [33].
On the outer boundary of the extracellular space, we apply the homogeneous Dirichlet boundary condition Eq. 3 on a part of the boundary, zΩ D e , and the homogeneous Neumann boundary condition Eq. 4 on the remaining part, zΩ N e . In the computations reported below, we apply Dirichlet boundary conditions on the outer extracellular boundary in the x-and y-directions and Neumann boundary conditions on the extracellular boundary in the z-direction.

The Coupled Discrete Model
In order to illustrate how the linear part of the EMI model Eqs 1-8 can be discretized and solved in a coupled manner, we describe below how to set up a finite difference discretization of the full EMI system. However, because of the possibly nonlinear terms included in F and I ion , we use a temporal operator splitting method to split the EMI system into a linear part and a nonlinear part. The discrete version of this approach is taken from Ref. 8 but repeated here for completeness.

Spatial and Temporal Splitting
The challenge we address in the present paper is to split the spatial coupling of the EMI model such that a solution algorithm can be founded on well-studied equations. In addition, we use temporal splitting of Eq. 7.

Step 1: Membrane Dynamics
In the discrete model, we seek discrete approximations u n,j,q,r i , u n,j,q,r e , v n,j,q,r , s n,j,q,r of the solutions at t t n nΔt, x x j jΔx, y y q qΔy, and z z r rΔz. It is important to notice that the approximations of v and s are only sought on Γ, the approximation of u i in Ω i , and that of u e in Ω e . For each time step t n , we assume that the solutions v n−1 and s n−1 are known for the previous time step (t t n−1 ). In the first step of the temporal FIGURE 2 | Illustration of the EMI model domain for a collection of isolated cells. The spatial splitting approach described in Section 2.2 for a single cell may also be applied for this type of cell collection. operator splitting procedure, we update the solution of v and s by solving for a time step of Δt from the previous solutions v n−1 and s n−1 .
We let the solution of s from this step define the approximation of s at the current time step, s n , and let the solution of v be denoted by v. In the computations reported below, we solve Eqs 9 and 10 by taking m forward Euler steps of size Δt * Δt/m. We found the forward Euler scheme to produce stable solutions at Δt * 0.001 ms with the Grandi et al. model. If a stiffer cell model was used, a first-order generalized Rush-Larsen scheme [34] could be a more fitting choice, as it generally allows for a larger time step than forward Euler [35].

Step 2: Coupled Linear EMI System
In the next step of the temporal operator splitting procedure, the remaining EMI system Eqs 1-7 with I ion 0 is solved using a coupled finite difference discretization. A two-dimensional version of the computational grid used in this discretization is illustrated in Figure 3A. We observe that the computational grid consists of three types of nodes; extracellular nodes marked by ×, intracellular nodes marked by ○, and membrane nodes marked by ⊗.
For the intracellular nodes, there is one unknown, u i , and one governing equation (Eq. 2). This equation is discretized using the finite difference approximation n,j,q,r i + u n,j,q,r−1 i 0.
Likewise, for the majority of the extracellular nodes, there is one unknown, u e , and one governing equation (Eq. 1), which is discretized using Eq. 11 with σ i and u i replaced by σ e and u e , respectively. For the extracellular nodes on zΩ D e , Eq. 11 is replaced by the discrete version of Eq. 3 (u n,j,q,r e 0). Furthermore, for extracellular nodes on zΩ N e (i.e., for the boundary in the z-direction), the finite difference approximation u n,j,q,r+1 e − u n,j,q,r−1 e 2Δx 0 (12) of Eq. 4 is used to find substitutions for either u n,j,q,r+1 e or u n,j,q,r−1 i located outside the computational domain in the extracellular version of Eq. 11. For the nodes located at the membrane, there are three unknowns, u i , u e , v, and three governing equations (Eqs 5-7). For the flux equality equation, Eq. 5, we use a standard first-order finite difference approximation, and for Eq. 6, we use the discrete version v n,j,q,r u n,j,q,r i − u n,j,q,r e . For Eq. 7 with I ion 0, we use an implicit time discretization v n,j,q,r − v j,q,r Δt where I n,j,q,r m is computed using the same finite differences as for Eq. 5 and v j,q,r is the solution of v from Step 1 of the operator splitting procedure. Note that for membrane nodes located at the lines where two membrane planes intersect, we compute two versions of I n,j,q,r m , one for the normal derivative of each of the membrane planes and define I n,j,q,r m as the mean value of these two versions. For corner nodes where three membrane planes intersect, we similarly compute three versions of I n,j,q,r m and define I n,j,q,r m as the mean value of these three versions.
Collecting the equations of the discrete version of the linear part of the EMI model, we end up with a spatially coupled system of equations consisting of one equation for every computational node plus two extra equations for each membrane node.

Robin/Neumann Spatial Splitting of the EMI Model
The coupled discrete version of the EMI model may be used to find accurate solutions of the system (see, e.g., . However, in some cases it would be much more convenient to be able to rewrite the EMI system as a set of more standard problems that can be solved individually using existing well- developed solution methods for such standard problems. Linear systems arising from discretizations of elliptic partial differential equations have been under intense scrutiny for decades and the efforts have resulted in optimal solution strategies; see, e.g., Ref. 30. Our aim is therefore to introduce a spatial operator splitting algorithm for the EMI model that results in subproblems formulated in terms of standard elliptic problems, thus enabling the use of optimal solvers. In this section, we describe a method for constructing such a splitting of the EMI model. In this approach, for each time step, we will solve the equations of the intracellular and extracellular spaces separately as standard Laplace equations with Robin, Neumann or Dirichlet boundary conditions on the boundary of the domains. The algorithm described in this section is summarized in Algorithm 1.

Step 1: Membrane Dynamics
In the first step of the splitting procedure, we solve the nonlinear part of the EMI model just like in the coupled discrete model. In other words, we update the solution of the membrane potential, v, and the remaining membrane state variables, s, by solving the system of ordinary differential equations at Γ for one time step with the solutions from the previous time step v n−1 , s n−1 as initial conditions. We let the solution of s from this step define the approximation of s at the current time step, s n , and let the solution of v be denoted by v.

Step 2: Intracellular System
In the next step of the approach, we find an approximation of the intracellular potential, u n i , by solving Eq. 2 and using Eqs. 6, 7 (with I ion 0 and initial condition v) as boundary conditions for the membrane. In Eq. 6, we let u n e be approximated by u e computed as follows: In the first iteration of Step 2, we let u e u n−1 e , and in the first time step, we assume that at t 0, we have u 0 e 0. For the next iterations, the approximation u e is taken from the solution of the previous iteration of the splitting algorithm. Inserting the approximation of u n e into Eq. 6, we get In Eq. 7, we discretize the time derivative using the implicit approximation Inserting Eq. 16 yields and inserting the definition of I m from Eq. 5, we obtain This defines a Robin boundary condition at Γ for the intracellular domain. Summarizing, Step 2 of the splitting procedure consists of solving to find an approximation of u n i , denoted by u i .

Step 3: Extracellular System
In the third step of the procedure, we update the extracellular potential by solving Eq. 1 with boundary conditions given by Eqs 3-5. Here, n i · σ i ∇u i in Eq. 5 is found from the solution of Step 2.
In other words, Step 3 consists of computing from the solution of u i from Step 2, and then solving n e · σ e ∇u e 0 at zΩ N e , n e · σ e ∇u e I m at Γ, to find an approximation u e of u n e .

Iteration for the Intracellular-Extracellular Coupling
After solving Step 2 and Step 3 once, we obtain a new approximation u e for u n e . This approximation may be used to define a new approximation given by Eq. 16, and Step 2 and Step 3 may be repeated to obtain more accurate estimates of u i and u e for u n i and u n e , respectively. After repeating these steps for a number of iterations, N it , we define

Step 4: Updating the Membrane Potential
In the final step of the procedure, the membrane potential, v, is updated using the solutions of u i and u e from the final iteration of Step 2 and Step 3. We use Eq. 6 and set v n u n i − u n e at Γ.

Discrete Version of the Splitting Approach
An illustration of the computational mesh used in each of the steps of the finite difference discretization of the splitting method is found in Figure 3B. In the discrete version of the splitting approach, we solve Step 1 just like for the coupled version by taking m forward Euler steps of size Δt * Δt/m. In Step 2, we consider the nodes representing the membrane and the purely intracellular points and we have one equation and one unknown, u i , in all these points. For the purely intracellular points, we discretize Eq. 17 using Eq. 11. For the membrane points, we discretize n i · σ i ∇u i in Eq. 18 using a standard first-order finite difference approximation. In Step 3, we discretize Eqs 20-22 in the same manner as for the coupled system, and use standard first-order finite difference approximations of n i · σ i ∇u i and n e · σ e ∇u e in Eqs 19 and 23, respectively. In Step 4, we use the straightforward discrete version of Eq. 26.

Numerical Comparison of Solution Methods
In order to investigate the accuracy of the splitting method introduced above, we set up a simple test case with a single cell surrounded by an extracellular space. The cell is located in the center of the domain, and the size of the entire domain Ω Ω i ∪Ω e is 180 μm × 130 μm × 26 μm. The default parameter values used in the simulation are given in Table 1, and the membrane dynamics are modeled by the epicardial version of the Grandi et al. model for human ventricular cardiomyocytes [33]. In addition, 20% of the cell in the lower x-direction is stimulated by a membrane current of −40 μA/cm 2 . Figures 4 and 5 show a comparison of the solutions of the coupled approach and the splitting approach applied to this simple test problem. The left panel shows the extracellular potential, u e , and the right panel shows the intracellular potential, u i . Furthermore, the upper panel shows the solution of the coupled approach, the middle panel shows the solution of the splitting approach, and the lower panel shows the absolute value of the difference between the two solutions. In Figure 4 we use one iteration (N it 1) in the splitting approach, and in Figure 5, we use five iterations (N it 5). We observe that the coupled and splitting approaches result in very similar solutions for both u e and u i and for both one and five iterations of the splitting approach. However, the difference is reduced when five iterations are used compared to when one iteration is used. For N it 1, the maximum difference between the two solutions is about 10 − 6 mV for u e and 10 − 2 mV for u i . For N it 5, the maximum difference is reduced to about 10 − 14 mV for u e and 10 − 10 mV for u i .
In Table 2, we report the difference between the solutions of the two approaches in the form of the relative, discrete l 2 -norm of the difference. We consider a number of different values of Δt for the splitting approach and compare these solutions to the solution of the coupled approach for Δt 0.001 ms. In the splitting approach, we take five iterations for each time step (N it 5). In the table, we observe that as Δt is reduced for the splitting approach, the difference between the two solutions decreases with a rate roughly equal to Δt.

CONNECTED CELLS
As observed in the previous section, the proposed splitting method for the EMI model for a single cell appears to provide accurate solutions for reasonable time step sizes. However, in cases where cells are connected by gap junctions like illustrated in Figure 6, the splitting approach needs to be extended to account for the coupling between cells through the gap junctions. Again, the aim of the splitting is to obtain standard uncoupled Laplace problems in each domain and thereby allow solution of the originally coupled system by solving decoupled systems for the individual cells in parallel. In this section, we describe such an extension of the splitting method.

The EMI Model for Connected Cells
A two-dimensional version of the domain for the EMI model for two connected cells is illustrated in Figure 6. Two cells, Ω 1 i and Ω 2 i , are surrounded by an extracellular space, Ω e , and the boundary between Ω e and each of the cells Ω 1 i and Ω 2 i consists of the cell membranes denoted by Γ 1 and Γ 2 , respectively. In addition, the boundary between the two intracellular domains defines an intercalated disc, denoted by Γ 1,2 . The EMI model describes the electric potential in such a domain and is given by the equations n e · σ e ∇u e 0 a tzΩ N e , Algorithm 1 | Summary of the splitting algorithm for the EMI model for a single cell.
Initial conditions: v 0 , s 0 , u 0 e . for n 1, . . . , N t : Step 1: Find s n and v by solving a time step Δt from ( Define u e u n−1 e . for j 1, . . . , N it : Step 2: Find u i by solving Step 3: Find u e by solving ∇ · σ e ∇u e 0 i nΩ e , u e 0 a tzΩ D e , n e · σ e ∇u e 0 a tzΩ N e , n e · σ e ∇u e −n i · σ i ∇u i at Γ.
for each cell k and each neighboring cellk. Here, u e , u k i , v k are the extracellular, intracellular and membrane potentials in Ω e , Ω k i and at Γ k , respectively. In addition, w k u k i − uk i denotes the potential difference at an intercalated disc, Γ k,k . Note here that at a given intercalated disc, this potential difference is defined differently for the two adjoining cells. For instance, at the intercalated disc Γ 1,2 illustrated in Figure 6, The current density I k gap (in μA/cm 2 ) represents the ionic current density through the gap junctions at the intercalated disc and is given by where R g is the gap junction resistance (in kΩcm 2 ), and C g in Eq. 37 represents the specific capacitance of the intercalated disc.
Furthermore, I k,k (in μA/cm 2 ) represents the sum of I k gap and the capacitive current over the gap junction, C g w k t . For definitions and units for the remaining variables and parameters, see Section 2.1.

Discrete Coupled Version of the Model for Connected Cells
A two-dimensional version of the computational mesh used in the finite difference discretization of the EMI model for two connected cells is illustrated in Figure 7A. We observe that in this case, there are four different types of nodes; nodes located in the extracellular space, marked by ×, nodes located in the intracellular space, marked by ○, nodes located at the membrane, marked by ⊗, and, finally, nodes located at the intercalated disc, marked by •.
In the coupled discrete version of the EMI model for connected cells, we split the EMI model into a nonlinear part and a linear part using the same temporal operator splitting procedure as for the discrete coupled model for a single cell (see Section 2.2). Moreover, the first nonlinear step, solving the ordinary differential equations describing the membrane dynamics, is solved in exactly the same manner as for the single cell case (see Section 2.2.2).
Furthermore, in the finite difference discretization of the linear part of the EMI model we use the exact same approach to  The only exceptions requiring special attention in the case of connected cells are the extracellular nodes located directly below or above the intercalated disc (see Figure 7A). For these nodes, we enforce a no-flow boundary condition and set the extracellular potential equal to the potential in the purely extracellular nodes located directly above or below the corner nodes.
The nodes located at the intercalated disc are treated in exactly the same manner as the membrane nodes except that I k gap is not set to zero in Eq. 37. We discretize nk i · σ i ∇uk i and n k i · σ i ∇u k i in Eq. 35 using standard first-order finite differences, and use a straightforward discrete version of Eq. 36. Finally, Eq. 37 is discretized in time using an implicit finite difference approximation   The solutions are compared at t 3.5 ms in a simulation of a single cell, and the parameters of the model are specified in Table 1.
Frontiers in Physics | www.frontiersin.org January 2021 | Volume 8 | Article 579461 w k,n − w k,n−1 Δt where a first-order finite difference approximation of −n k i ·σ i ∇u k,n i is used to define I n k,k .

Splitting Approach for the EMI Model for Connected Cells
We now want to split the intracellular subdomains in order to obtain classical elliptic equations defined on small domains. In this splitting approach, the linear part of the EMI model is for each time step split into standard Laplace problems with Robin, Neumann or Dirichlet boundary conditions. These problems can be solved separately in each of the domains Ω k i and Ω e . Steps 1, 3, and 4 of the splitting approach are exactly the same for connected cells as for the single cell case (see Section 2.3), but Step 2 is extended to account for the cell coupling at the intercalated discs, Γ k,k . In this section we therefore only describe the extended version of Step 2 and refer to Section 2.3 for descriptions of the remaining steps of the procedure. In addition, the full splitting algorithm for coupled cells is summarized in Algorithm 2.

Step 2: Intracellular System
In Step 2 of the splitting approach for connected cells, we update the intracellular potential by solving Eq. 28 in each of the intracellular domains, Ω k i . On the membrane, Γ k , we define Robin boundary conditions from Eqs 32 and 33, just like in the single cell case (see Section 2.3.2). What remains is to define a boundary condition for the intercalated disc, Γ k,k , from Eqs 35-37.
We define this boundary condition in an iterative manner, where we for each iteration use an approximation of w k,n , denoted by w k . In the first iteration, we set w k w k,n−1 . Discretizing the time derivative in Eq. 37 using an implicit finite difference and inserting Eq. 38, we obtain I n k,k 1 R g w k,n + C g w k,n − w k,n−1 Δt .
Replacing w k,n by the approximation w k , we get This defines an approximation of the Neumann boundary condition (Eq. 35) for Ω k i . In each iteration of the intracellular system, we therefore solve for each cell k to find temporary approximations of u k,n i , denoted by u k i . 1 Finally, we update w k using Eq. 36 on the form This procedure (Eqs 39-42) is continued for a number of iterations, denoted by M it .

4). For each cell and in each iteration of
Step 2, we consider the nodes representing the intercalated disc, the membrane and the purely intracellular points, and we have one equation and one unknown, u k i , in all these nodes. For the purely intracellular nodes, we discretize Eq. 39 using Eq. 11. For the membrane and intercalated disc nodes, we discretize n i · σ i ∇u k i in Eqs 40 and 41 using a standard first-order finite difference.

Numerical Comparison of Solution Methods
In order to investigate the accuracy of the splitting method for the EMI model for connected cells, we extend the simple test case introduced in Section 2.5 to include a collection of 5 × 5 connected cells. We still use the parameters specified in Table 1, unless otherwise specified. In order to allow for two-dimensional cell connections, each cell is extended with additional cubes of smaller width in the x-and y-directions (see Figure 8). The sizes of these additional cubes are given in Table 3. The distance between the intracellular domain and the extracellular boundary is 12 μm in the x-and y-directions and 4 μm in the z-direction. Furthermore, we stimulate the 2 × 2 cells in the lower left corner by a 1 ms-long stimulus current of −40 μA/cm 2 , and we consider the solution at t 2.5 ms.
Note that in the current implementation of the model, the intracellular and extracellular conductivities, σ i and σ e , are assumed to be isotropic, and the anisotropy of cardiac tissue is represented by the anisotropic cell shape and distribution of gap junctions. Anisotropic conductivities, σ i and σ e , could, however,   Figure 8).

Domain
Size Algorithm 2 | Summary of the splitting algorithm for the EMI model for connected cells.
Initial conditions: v k,0 , s k,0 , w k,0 , u 0 e , for all k. for n 1, . . . , N t : Step 1: For all k, find s k,n and v k at the nodes of the membrane Γ k by solving a time step Δt from (s k,n−1 , v k,n−1 ) of v k Define u e u n−1 e , w k w k,n−1 for all k. for j 1, . . . , N it : Step 2: for m 1, . . . , M it : For every k, find u k i by solving wherek denotes each of the neighboring cells of cell k. Update w k u k i − uk i at Γ k,k for all k andk. end Step 3: Find u e by solving ∇ · σ e ∇u e 0 i nΩ e , u e 0 a tzΩ D e , n e · σ e ∇u e 0 a tzΩ N e , n e · σ e ∇u e −n k i · σ i ∇u k i at Γ k for all k. end Define u n e u e , u k,n i u k i , w k,n w k for all k.
Step 4: Define v k,n u k,n i − u n e at Γ k for all k. end also be straightforwardly incorporated in the model (see . Using the parameters of the simple test case setup, the conduction velocity is approximately 26 cm/s in the longitudinal direction (x-direction) and approximately 6 cm/s in the transverse direction (y-direction).
In Figure 9, we compare the solutions of the coupled and splitting approaches applied to this test problem. In the splitting approach, we use five inner and outer iterations (M it N it 5). We observe that the coupled and splitting approaches result in very similar solutions for both u e and u i . The maximum difference between the two solutions is about 10 − 8 mV for u e and 10 − 6 mV for u i . Table 4 compares the solution of the splitting approach for different values of Δt to the solution of the coupled approach using Δt 0.001 ms. Again, we use five inner and outer iterations (M it N it 5) in the splitting approach. We observe that as the size of Δt is decreased for the splitting approach, the difference between the two solutions decreases with a rate roughly equal to Δt. Note that in this case, Δt 0.1 ms lead to instabilities for the splitting approach, so Δt 0.05 ms is the smallest time step considered.
In Figure 10, we investigate how the absolute value of the difference between the intracellular potentials, computed using the two solution methods, depends on the time step, Δt, the number of iterations used for the coupling between the intracellular and extracellular domains, N it , and the number of iterations used for   The solutions are compared at t 2.5 ms in a simulation of 5 × 5 connected cells, and the parameters of the model are specified in Tables 1 and 3. the coupling between the interior of the cells, M it . In this case, we use the same Δt in the coupled and splitting approaches. In the first row of Figure 10, we observe that when only one inner and outer iteration is used in the splitting approach, the maximum difference between the solutions is in the range of 1-10 mV, and that the difference decreases as the time step is decreased. In the next rows, we observe that the difference between the solutions of the two methods decreases as we increase the number of iterations used in the splitting approach. For Δt 0.05 ms, we see that the maximum difference is decreased from about 8 mV to about 1.5 mV if N it or M it is increased to five. Moreover, if both N it and M it is set to five, the maximum difference decreases to less than 0.01 mV. For Δt 0.01 ms and Δt 0.001 ms, we observe that the difference decreases more extensively when the number of outer iterations, N it , is increased to five than when the number of inner iterations, M it , is increased to five. Furthermore, when both are set to five the maximum difference is about 10 − 6 for Δt 0.01 ms and 10 − 4 for Δt 0.001 ms. In order to get a better impression of the differences between the solutions, the lower panel zooms in on the solution at the action potential peak. In the left panel, we observe that when only one iteration is used in the splitting approach (N it M it 1), there are clearly visible differences between the solutions of the splitting approach and the coupled approach. However, in the next panels, we observe that when the number of iterations is increased, the difference between the solutions decreases, and for N it M it 5, the solutions of the coupled and splitting approaches are completely indistinguishable in the plots. In Figure 12, we examine how the maximum difference in the computed intracellular potentials between the coupled and splitting approaches decreases as the number of iterations used in the splitting approach increases. In the left panel, we fix the number of inner iterations, M it , at one or five and vary the number of outer iterations, N it , between one and eight. We observe that the difference between the solutions of the two methods decreases as the number of outer iterations, N it , increases. In addition, the difference is smaller if the number of inner iterations, M it , is five than if it is one. In the right panel, we fix the number of outer iterations, N it , at one or five and vary the number of inner iterations, M it , between one and eight. In this case we observe that increasing M it above two or three when N it is fixed at one or five, respectively, does not seem to increase the accuracy of the splitting approach.

SHARED-MEMORY PARALLEL IMPLEMENTATION OF THE SPLITTING ALGORITHM
In this section, we will describe a parallel implementation of the splitting algorithm. The parallel implementation targets a hardware platform with a shared-memory system. Such parallel platforms are currently widely available, e.g., a server or desktop computer that has multiple sockets each with tens of  CPU cores. The entire memory is composed of the memory domains that are directly connected to the sockets, allowing each CPU core to access any address in the whole memory space. The shared address space considerably eases the parallelization. However, care in programming is needed to maximize data locality, because accessing non-local parts of the memory is more costly than accessing the local part.
Important details of programming and performance enhancement will be discussed below, separately for the three decoupled numerical components of the splitting algorithm. An underlying principle is to use, whenever possible, existing software libraries for the different components. Such software libraries either use multiple threads to parallelize the computation internally, or are serially invoked inside a loop of parallel iterations.

The Intracellular Subdomains
Recall that the intracellular space consists of the individual excitable cells. Due to the splitting algorithm, the intracellular systems associated with the cells can be solved independently. This immediately brings a cell-level parallelism, because the individual intracellular systems can be assigned to the different CPU cores. With a spatial resolution of Δx Δy Δz 2 μm there are about 5,300 degrees of freedom in the linear system for computing the intracellular potential of each cell. Such a small-size system suits well for a direct solver, see, e.g., Ref. 36, for which the LU factorization is an excellent choice. Moreover, since each intracellular matrix A i that arises from the finite difference discretization remains unchanged throughout an entire EMI simulation, the factorization step is executed only once. The resulting lower and upper triangular matrices, L i and U i , are stored in memory, so that they are used in a forward-backward substitution procedure every time a linear system of the form A i x b associated with an intracellular subdomain needs to be solved.
It is possible to adopt several threads to parallelize each forward-backward substitution step, but the amount of parallelism at this level is limited. Such a parallelization approach is thus not recommended unless the number of available CPU cores greatly exceeds the number of cells.
For computing the LU factorization and performing forwardback substitutions, we have chosen the serial version of the SuperLU library [37]. We assign each of the linear systems A i x b to an OpenMP thread so that multiple threads can work in parallel solving different linear systems. SuperLU was configured to use the included BLAS implementation, CBLAS, for the linear algebra routines. It is important that the BLAS implementation used by SuperLU is "thread safe" (i.e., that it supports being called from multiple threads) and does not attempt any parallelization of its own, as this interferes with the cell-level parallelization strategy employed here.
Apart from the aforementioned advantage that the LU factorization of each A i is only computed once, there is another opportunity for performance enhancement. This is true as long as we are interested in studying excitable cells of the same shape and size. Under such an assumption, the number of unique intracellular matrices is limited. More specifically, if the excitable cells form a rectangular 2D grid, then for each of the four sides, a cell can either be connected to a neighboring cell via a gap junction or not, leading to some slight changes in the cell's geometry and boundary condition. As such, there are at most 2 4 16 unique matrices, and in the examples in this paper where the cells form an M × N rectangular grid with M, N ≥ 3, there are only 9 different matrices (one for each of the four corners, one for each of the four sides, and one for the inner cells). To save memory usage, we can thus group together the linear systems sharing the same matrix A i , and instead solve A i X B, where X and B are matrices that contain, respectively, all the relevant solution vectors and right-hand sides. Furthermore, we only need to compute and store the LU factorization for each of the 9 unique matrices. In order to ensure good data locality, the CPU cores that are assigned to the same intracellular matrix A i should preferably have direct access to the memory domain where L i and U i reside.
One remark is in order here. Although we discuss in this paper a two dimensional collection of cells, it is important to keep in mind that all the cell geometries are in 3D and all cells are surrounded by a 3D extracellular space. In fact, the EMI model for a mesh of cells can only be applied in 3D because in 2D the extracellular space would become disconnected.

The Extracellular Domain
The splitting algorithm requires the solution of a 3D Laplace equation for the entire extracellular domain with Neumann boundary conditions on the cell membranes. With a careful treatment of the boundary conditions, the finite difference discretization produces a symmetric and positive-definite matrix for the linear system associated with the extracellular domain. These properties of the matrix permit the use of a conjugate gradient (CG) iterative solver with an algebraic multigrid (AMG) preconditioner to solve the linear system. Under optimal conditions, the CG+AMG solver should require a constant number of iterations, irrespective of the number of degrees of freedom in the linear system, resulting in a solution time that grows linearly with the size of the linear system [38].
We used the CG and AMG implementations provided by ViennaCL [39] in our simulations, as ViennaCL supports shared-memory parallelization via OpenMP. The termination criterion |Ax − b|/|b| < 10 − 5 was used. The coarsening method inside AMG was set to maximum independent set (MIS) [40], and the interpolation method was set to smoothed aggregation. Additionally, the Jacobi

The Membrane
On each mesh node along the membrane, the ODE system given in Step 1 of Algorithm 2 is solved with a forward Euler scheme using a substepping time step Δt * 1 μs. We also implemented a first-order generalized Rush-Larsen (GRL1) scheme, and found this theoretically more stable scheme to take about twice the computing time of forward Euler. Since the forward Euler scheme produced stable solutions for all the numerical experiments reported in this paper, the GRL1 scheme was not used. Within each time step, the ODE model can be solved independently on each membrane node, hence there is ample parallelism. We used the Gotran code generator [41] to generate C code for solving a single step for both the forward Euler and GRL1 schemes. The auto-generated code was then augmented with OpenMP directives enabling shared-memory parallel execution. When initializing the memory for the ODE-model's state variables and parameters, we perform a so-called "first touch" to ensure that all CPU cores work on the parts of the memory that are located closest to the cores in terms of connectivity. This step is important for achieving proper scaling on modern servers where the system memory is typically split into multiple NUMA (non-uniform memory access) domains.

Hardware Platform Setup
Two hardware testbeds were used to measure the performance of our parallel solver that implements the splitting algorithm. The specifications are listed in Table 5. The first testbed (System A) is a dual-socket server housing two AMD EPYC 7601 32-core CPUs, each configured with 8-channel memory operating at 2666 MT/s. This server has 2 TiB of memory in total. The second testbed (System B) is a quad-socket server with four Intel Xeon Platinum 8,168 24-core CPUs, each configured with 6-channel memory operating at 2,666 MT/s. The total memory size is 384 GiB. Notably, System A has four NUMA domains per socket, 2 whereas System B only has one NUMA domain per socket. Both systems support 2-way simultaneous multi-threading (SMT), presenting two logical processors to the operating system for each physical CPU core. Our hardware test systems are arguably in the high end of the present-day CPU server market and thus allow us to explore the limits of what can be achieved in a shared-memory environment. Simulations with an even larger number of cells, or a finer resolution inside each cell, will eventually require an enhanced parallel simulator capable of utilizing multiple nodes on a distributed-memory system. Speedup due to increasing the number of nodes used can be expected as long as the communication overhead does not exceed the computation time per node.
Our solver is mainly written in the Python programming language, but the most performance-critical parts are executed in C code which we call from Python via the ctypes module. The resulting multi-threaded parallel solver permits rapid prototyping while also being reasonably performant. In the results presented below, we have set environment variables to impose a "thread binding" that fixes the mapping of OpenMP threads to the logical processors such that the same core will work on the same part of the memory for every time step. All logical processors were assigned one thread each, resulting in 128 threads for System A and 192 threads for System B.

Performance Results
The mesh resolution was set to Δx Δy Δz 2 μm, and we use N it M it 2 in the splitting algorithm. N E is the number of unknowns in the linear system for the extracellular domain. It CG is the mean number of preconditioned CG iterations required for the extracellular system. The time spent solving each of the three domains (Extracellular, Membrane and Intracellular) is listed as well as the total across all domains. On the right side, the times are divided by the number of cells to demonstrate the asymptotic scaling.
Tables 6 and 7 show the time usage per time step needed for 16 cells up to 65,536 and 16,348 cells for Systems A and B, respectively (The memory size on System B is not large enough to simulate 256 × 256 cells.) More specifically, the tables report the average time usage per time step for time steps 6-10 in the simulation. As the number of cells grows, we can see the solution time per cell converging toward a constant. The number of AMGpreconditioned CG iterations for the extracellular system grows by about 71% while the number of unknowns increases by more The mesh resolution was set to Δx Δy Δz 2 μm, and we use N it M it 2 in the splitting algorithm. N E is the number of unknowns in the linear system for the extracellular domain. It CG is the mean number of preconditioned CG iterations required for the extracellular system. The time spent solving each of the three domains (Extracellular, Membrane and Intracellular) is listed as well as the total across all domains. On the right side, the times are divided by the number of cells to demonstrate the asymptotic scaling.
than a factor of 2,800. System B outperforms System A for all the cases, which can be attributed to it having 50% higher theoretical memory bandwidth, 50% more cores, and higher clock speed than System A. Additionally, architectural differences between the CPUs could also have played a role.
In Table 8, we consider the case of a 4 μm resolution for System A, and we are in this case able to perform simulations of up to 512 × 256 cells. Note that in order to use a 4 μm resolution, the tissue geometry has been slightly altered. The distance from the cells to the extracellular boundary in the z-direction was extended to 8 μm, Ω O was of size 104 μm × 16 μm × 16 μm, and Ω W , Ω E , Ω S , and Ω N had size 8 μm × 8 μm × 8 μm (see Figure 3).
The mesh resolution was set to Δx Δy Δz 4 μm, and we use N it M it 2 in the splitting algorithm. N E is the number of unknowns in the linear system for the extracellular domain. It CG is the mean number of preconditioned CG iterations required for the extracellular system. The time spent solving each of the three domains (Extracellular, Membrane and Intracellular) is listed as well as the total across all domains. On the right side, the times are divided by the number of cells to demonstrate the asymptotic scaling.

DISCUSSION
Accurate modeling of excitable cells is important in order to understand how collections of such cells work. The EMI model provides a promising framework for such simulations, but it is challenging from a computational point of view. In this paper, we have developed an optimal splitting-based method that scales perfectly with the number of cells in the simulations. The main result of the paper is that we have been able to formulate a solution strategy that relies on intermediate steps that all are covered by standard theory in scientific computing.
We have seen in Table 8 that a collection of 512 × 256 131,072 cells, with a mesh resolution of Δx 4 μm, leads to linear systems with about 2.5×10 8 unknowns. In comparison, we found that the sinus node of the mouse heart would require about 3.5×10 6 unknowns. Since the computing time scales linearly with the number of nodes, we estimate based on Table 8 that the sinus node of the mouse would require about 3 s of computing time per time step. A complete simulation of 300 ms would thus need about 24 hours of computing time (with Δt 0.01 ms). This is a rough estimate of course, but it still illustrates that accurate simulation of physiologically interesting nodes is within reach.
The volume of the mouse ventricles is about 100 mm 3 (see Ref. 42) or about 440 times larger than the mouse sinus node. In order to be able to simulate mouse ventricles within a 24-hour wallclock time frame, we therefore would need a computer that is about 440 times faster than System A above, assuming that linear scaling would hold for even larger systems. This is actually well within today's technological capability, because the world's most powerful supercomputer at the time of writing, named Fugaku (see, e.g., Refs. 43 and 44), consists of 158,976 computers, each more powerful than System A (if we consider the available memory bandwidth). Although theoretically feasible, it remains to be seen whether a full action potential of a complete mouse heart can be simulated based on the EMI model.
The computing time may be reduced by applying adaptive time stepping and the number of nodes may be reduced by using an adaptive finite element method instead of the finite difference method. However, the finite difference scheme is exceptionally well suited for the computer systems A and B and fewer mesh points in the finite element method could therefore result in increased computing time per computational node (see Ref. 8). A possible path of future work can be to develop a distributed-memory parallel implementation of the splitting algorithm, so that the number of cells to be simulated is not limited by the computing speed and/or memory capacity of a single shared-memory platform.
A weakness of the current implementation is that, whereas the finite difference method is well suited for representing very simple geometries, it is less suited for complex geometrical shapes. For instance, representation of real geometries of the heart, and also including the fiber orientation of the heart muscle is necessary in many applications; see, e.g., Refs. 1, 45, and 46. In order to analyze problems with complex geometries using the EMI model, we will need to use the finite element method. Also, representing realistic geometries of individual cells is very challenging using the finite difference method, but will be easier using the finite element method. A finite element method based on the operator splitting scheme derived in the present report is currently under development. This code will clearly offer greater flexibility with respect to geometries, but the linear systems are less structured and therefore harder to solve in an optimal manner.

CONCLUSION
We have introduced a splitting scheme for the linear part of the EMI equations that allows us to use well developed theory for solving linear systems arising from the discretization of linear, elliptic partial differential equations. The resulting method is stable and is optimal in the sense that the CPU efforts scale linearly with the number of computational nodes. The method therefore enables simulations using far larger computing facilities than has been available in the present work.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.