- 1School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
- 2Visiting Professor at Department of Computer Science, University of Oxford, Oxford, United Kingdom
- 3Modelling and Simulation, Institute for Visual and Analytic Computing, Faculty of Informatics and Electrical Engineering, University of Rostock, Rostock, Germany
In this paper, we adapt a two-species agent-based cancer model that describes the interaction between cancer cells and healthy cells on a uniform grid to include the interaction with a third species—namely immune cells. We run six different scenarios to explore the competition between cancer and immune cells and the initial concentration of the immune cells on cancer dynamics. We then use coupled equation learning to construct a population-based reaction model for each scenario. We show how they can be unified into a single surrogate population-based reaction model, whose underlying three coupled ordinary differential equations are much easier to analyse than the original agent-based model. As an example, by finding the single steady state of the cancer concentration, we are able to find a linear relationship between this concentration and the initial concentration of the immune cells. This then enables us to estimate suitable values for the competition and initial concentration to reduce the cancer substantially without performing additional complex and expensive simulations from an agent-based stochastic model.
1 Introduction
In Weerasinghe et al. [1], we developed an agent-based model to explore interactions between cancerous cells and healthy cells in the presence of proteins in the extracellular matrix (ECM). This setting is sometimes called the tumor microenvironment (TME). The model represents a two-dimensional tissue section comprising healthy cells, cancer cells and ECM proteins. The domain consists of an inner and outer region on a uniform 100 × 100 grid. The outer region represents the invasive margin of the tissue. The boundary of the domain represents the surrounding ECM membrane and is fully fenced with no gaps at time zero. We assume that healthy cells are located only in the inner region, and their movement is negligible as adhesion molecules keep these healthy cells in their intended locations [2]. The ECM proteins are randomly placed in the domain and do not move. Initially, the domain represents healthy tissue, and then a single healthy cell turns into a cancerous cell that spreads throughout the tissue due to a series of mutations. Cancer cells cannot occupy a grid position filled by an ECM protein.
The model records the healthy cell and cancer cell densities as a function of time, along with the number of cancer cells that exit the domain due to the breaking down of the surrounding fence. The agent-based model simulates these cell-cell and cell-ECM interactions according to a set of interaction rules underpinned by the eight hallmarks of cancer [3, 4], which allows a link between tumor heterogeneity and cellular response. The model allows a much richer set of interactions than the standard approach via unimolecular and bimolecular reactions that underpin the Law of Mass Action ordinary differential equation. These include, for example, a cell stickiness value, a jump radius, the maximum number of healthy cell divisions, a cell and division age, and a competition rate between healthy and cancer cells, to name a few—see Weerasinghe et al. [1] for more details. An example of cancer progression within the agent-based model was given in Weerasinghe [9], and we present it here—in which healthy cells, cancer cells and ECM proteins (obstacles) are represented as blue, red and black nodes (Figure 1).

Figure 1. Cancer progression in the agent-based model: blue and red nodes represent healthy and cancer cells, respectively. Black nodes represent ECM proteins (obstacles) in the domain. Starting with one cancer cell, the cancer cell population increases, and the healthy cell population decreases with time.
Outputs from the model show that cell-cell and cell-ECM interactions affect cancer cell dynamics and that low initial healthy cell and ECM protein densities promote cancer progression, cell motility and invasion, while high ECM breakdown probabilities of cancer cells on the ECM can also promote cancer invasion.
Of course the processes underlying, for example, multiple myeloma in the bone marrow take place in an unstructured three-dimensional setting with very complex internal structures. So one of our assumptions is that our two-dimensional model taking place on a regular grid with a very simple internal structure based on inner and outer domains is still appropriate for capturing this complexity. This is of course a limitation but at the same time could be considered a benefit when using equation learning. Additionally our agent-based model is quite simple when describing the interactions between cancer cells and immune cells and the movement of cancer cells. Within the bone marrow, there is a complex set of interactions between a variety of proteins that we do not attempt to capture. This again is a limitation but it is also a strength of our approach as there is little information about the protein interaction rates and introducing complexity without such information has negative repercussions when using equation learning.
Even though reaction models following the Law of Mass Action cannot describe the full range of behaviors possible in stochastic ABMs, they form a useful abstraction that is much easier to analyse than an ABM. In this paper, we pursue two objectives. First, we extend the two-species ABM into a three-species model that incorporates treatment strategies involving immune cells (T-cells) (Section 2.1). This is designed to help with the analysis of recently developed clinical protocols that fit into the class of immunotherapy based on the injection of immune cells. We then apply equation learning techniques [5–8] to automatically construct a surrogate reaction system from six different parametrisations of the three-species ABM (Sections 2.2–2.4). These learning techniques determine ODE models from time-series measurements by sparse symbolic regression. In contrast to the stochastic ABM formulation and other kinds of (neural) surrogate models, these are interpretable by humans and can be analyzed mathematically. Finally, we show how to use this simplified representation of the ABM to determine the steady-state behavior of the model without performing costly evaluations, and to make predictions of the efficacy of treatment strategies (Section 3).
2 Materials and methods
2.1 An augmented agent-based model with immune cells
In this paper, we will adapt the two-species agent-based model in Weerasinghe et al. [1] to a three-species model in which the third species represents T-cells (immune cells) that can destroy cancer cells. The key elements of the agent-based cancer model, with regard to a two-species interaction between cancer cells and healthy cells, were described in Weerasinghe et al. [1]. In that paper, careful consideration was given to issues such as the Tumor Microenvironment and the extracellular matrix (ECM). A two-dimensional regular grid was populated with healthy cells, proteins, an initial cancer cell, as well as fences bordering the ECM. The model forms the basis of this three-species model in which we can also consider the interactions with immune cells. We note that healthy cells, immune cells and the fixed number of ECM proteins are static, with only cancer cells moving. Every cancer cell and healthy cell has a stickiness value, and the jump radius of a cancer cell is the number of positions a cell can move at a time. Healthy cells (that have an associated age) have a probability of dividing when mature enough and can divide at most a fixed number of times. A key parameter is the fixed competition value between immune cells and cancer cells, so that when a cancer cell reaches a position that an immune cell occupies, both compete and only the more powerful cell remains, based on the generation of a random number that is compared with the competition value. When cancer cells divide, a daughter cell is placed as close to the mother cell as possible. When cancer cells reach the boundary of the domain, there is a probability of degrading the surrounding fence structure that can enable cancer cells to exit the domain, allowing for the possibility of metastasis. In Chapter 4 of Weerasinghe [9], Weerasinghe considers an initial exploration of the dynamics in which intervention by immune cells represents an immunotherapy treatment. Several authors have developed models, to study the effects of immunotherapy treatments; to those belong ODE-based models [10–12], as well as agent-based models [13–15].
In the agent-based model in Chapter 4, immune cells are placed in the outer region of the domain. The immune cells are immobile. Each immune cell has a competition rate. When a cancer cell or group of cancer cells (due to the stickiness of the cells) reaches a grid position occupied by an immune cell, they compete with one another, and the weaker cell dies. The competition value on [0, 1] of a cancer cell CT is fixed, while the competitiveness rate of an immune cell, CI, can be fixed or be uniformly drawn from the interval [a, 1]. The cancer cell will die if CT < CI. In Weerasinghe [9], values of a = 0, 0.5 and 0.75 are considered. We note that no equation learning of an underlying ordinary differential equation is given in Weerasinghe [9], rather multiple simulations of the agent-based model are performed.
Three different strategies are considered for introducing immune cells into the agent-based model [9]: a one-off strategy, a repeated injection at fixed times, or injection when some percentage (such as 25%) of immune cells are lost. These are denoted as strategies 1, 2, and 3. Immune cells are initially injected when a certain number of cancer cells are detected (such as 500). Figure 2 (Figure 4.3 in Weerasinghe [9]) shows an initial progression of the cancer displaying healthy, cancer, immune cells, and ECM proteins (blue, red, purple, and black). Figure 4.5 in Weerasinghe [9] compares the dynamics for the three strategies when a = 0. Introducing immune cells keeps the healthy cell density at a higher value than without immune cells, and strategy two seems the most effective. Figure 4.14 in Weerasinghe [9] compares the dynamics with a = 0, 0.5, 0.7, and a higher value of a seems to improve the effectiveness of strategies one and three. The immune cell model uses the same parameter values as in the two-species model (so that the initial healthy cell density is 0.324, the ECM protein density is 0.1, and the ECM breakdown probability is 0.5). The additional aspects involve the initial immune cell density and the nature of the competition between the immune cells and the cancer cells.

Figure 2. Cell distribution of the TME when immune cells are introduced to the environment. Blue nodes, red nodes, purple nodes and black nodes represent healthy cells, cancer cells, immune cells, and ECM proteins, respectively.
We will simulate the immune cell agent-based model with just a single injection of the immune cells and a fixed competition value between immune cells and cancer cells (cf. Section 2.4). The reason for adopting the single injection protocol is that this is clinically simpler to implement than other protocols and indeed, in the recently developed CAR-T immunotherapy protocol, a single injection of re-engineered immune cells is performed [16]. This will also simplify the use of equation learning to build a three-dimensional system of ordinary differential equations based on a library of interactions between the three species for a number of scenarios based on different initial conditions for the immune cells and a fixed competition value between immune cells and cancer cells.
2.2 Building a system of ODEs using Equation Learning
In Burrage et al. [6], we adapted the approach of Brunton and Kutz [5, 17] based on their sparse identification of nonlinear dynamics (SINDy) to discover an underlying ODE of our two species agent-based stochastic model [9] describing the interaction of healthy cells (H) and cancer cells (C). An issue with the standard SINDy approach is that the ODE that is constructed is done component by component, and so the ensuing system of ODEs has system components decoupled from one another. However, in Burrage et al. [6], the equation learning is based on a library of chemical reactions describing the evolution of the population counts of species Si (here, H and C). Reactions take the form with k being the rate constant, ri, li∈ℕ the stoichiometric constants, and ν, νi = ri−li the stoichiometric coefficient vector. Under the Law of Mass Action, the rate constants are multiplied by the concentrations of the reactants. With this assumption, a library of functions can be constructed with the stoichiometric vectors coupling the components. The ODE that is built in this way can be interpreted as a set of reactions and respects the interactions inherent in the agent-based model. We built a library of chemical reactions based on C and H with 8 unimolecular reactions and 9 bimolecular reactions. The library of 17 possible reactions is
with associated rate constants k1, ⋯ , k17. For example, the vector number 3 corresponds to the reaction and vector 12 to
The approach to find the non-negative rate constants is described extensively in Burrage et al. [6], but essentially we build data vectors and (computed as means over 500 simulations on a fixed time mesh). We then compute approximations to the derivatives and and solve a least squares problem
where is the library of chemical reactions evaluated at the data vectors and is the vector of non-negative rate constants that is to be found. This can be solved by a non-negative least squares algorithm such as lsqnonneg in MATLAB. Once these rate constants have been found, we can use the Law of Mass Action to write out the ODE system. Now, the Law of Mass Action with m reactions can be described by an ODE
with stoichiometric vectors νj and propensity functions aj(y). There is a well-established theory for discrete stochastic chemical kinetics—as the number of particles becomes large, we arrive at the ordinary differential equation regime described by the Law of Mass Action [18]. This regime arises in simple cases such as unimolecular and bimolecular reactions as well as for more complicated nonlinear reactions based around enzyme-like reactions. However, equation learning is much more difficult in this case and any analysis, such as equilibrium analysis, becomes much too complicated so we avoid this latter strategy.
In the case of the above library, the corresponding ODE after applying Equation 1 is
We note from the library that there is a natural coupling between the two components from reactions 3, 4, 12, 14, and 15.
Within the agent-based model [6], the authors took a time step of 2/135 over a time interval of that implies 1,800 steps. They also chose and . Better matches to the data seemed to be obtained when just sampling from 180 steps. Whether we used all 1,800 data points or sampled every 10 steps, the non-negative least squares algorithm determined that k1, k2, k3, k4, k6, k8, k11, k13 and k16 were all zero, leaving only 8 reactions in the library [6].
It should be noted that this choice of reactions to be placed in the library can lead to issues of linear dependence between certain reactions—for example, C+C → 0 and C+C→C. In this case, lsqnonneg flags a highly ill-conditioned system but still converges to a solution in which C+C → 0 appears in the final set, but C+C→C is removed.
2.3 Equation learning formulation for the augmented ABM
If we now consider the three-dimensional agent-based model with cancer cells, C, healthy cells, H, and immune cells, I, and let all the unimolecular and bimolecular reactions occur (with only two reactants on the left-hand side) then there are 27 unimolecular reactions and 33 bimolecular reactions—so a total of 60 reactions. Different from Burrage et al. [6], we avoid some collinearities without loss of generality by only considering reactions with a stoichiometric change of at most one per species, that is, νi ≤ 1 for i ∈ {C, H, I}. We can write these reactions as below.
Unimolecular (27):
Bimolecular (33):
Further, from the definition of the ABM we can infer that immune cells only interact with cancer cells. Incorporating this prior knowledge, we may work with a reduced library of 49 reactions, where reactions of the form I+I → * and H+I → * are removed (with the * signifying “any” combination of products).
For the three-species ABM, the least squares problem is
where K is a vector of length 60 (49).
We let the approach learn which of the 60 (49) reactions appear in terms of two key parameters—namely Ĩ(0) and the fixed competition parameter between cancer cells and immune cells. Note that Ĩ(0) is considered as a parameter of the model rather than an initial condition, as it is varied as part of the treatment strategies and may even be time-dependent.
2.4 Learning equations for the augmented agent-based model
Solving Equation 2 requires two main ingredients: time-series data on the amount of C, H, and I, as well as the derivatives , , and Ĩ′. In our case, the former is obtained by simulating the three-species ABM for 20 time units and taking the mean concentrations over 100 stochastic replications (cf. Figure 3). The measurement step is set to 0.01 time units, resulting in (mean) time-series data for C, H, and I with 2, 000 measurements each. The derivatives can be determined by numerical differentiation with central differences. However, the solution to Equation 2 is very sensitive to the accuracy of these derivatives, as they form the target vector. We also compare this to a numerical differentiation scheme employing Kalman-filtered derivatives.1

Figure 3. Timeseries data resulting from simulation of different combinations of initial immune cell percentage Ĩ(0) and competition rate CI. Each plot shows the mean over 100 stochastic replications, measured every 0.01 time units, resulting in 2, 000 data points until time 20. This data is used as a basis for learning the reaction model.
Further, as was demonstrated in Burrage et al. [6], the number of measurements in , , and Ĩ can considerably influence the solution of Equation 2. Thus, we also vary the sampling interval, using only every 10th, 20th, 40th, or 50th data point.
Finally, there are typically a lot of different solutions to the inverse problem of determining a model (structure and parameters) from data. Thus, without additional information, it is generally impossible to determine a single “correct” result. For example, as also found in Burrage et al. [6] and discussed in Section 2.2, collinearities pose a problem: The reactions and have the same effect (considering ODE semantics), so choosing among them is impossible without additional knowledge. This also impacts the ability of linear solvers to find an adequate solution, as the equation system (Equation 2) becomes ill-conditioned. Thus, it is helpful to incorporate prior knowledge. To avoid collinearities of reactions as shown above, then for the three species ABM we only consider reactions with stoichiometry one (requiring νi ≤ 1 for the change of each species i), leading to 60 reactions (cf. Section 2.3). We call this the complete library for our problem formulation. Additionally, we can exclude interactions of immune cells with healthy cells and among immune cells themselves, which leads to a constrained library of 49 reactions. The necessity of incorporating constraints and prior knowledge is also highlighted in Santosa and Weitz [8] where the authors demonstrate identifiability issues in equation learning (even before SINDy was introduced) associated with sparsity-enforcing approaches when minimizing ℓ1 or ℓ0 norms.
2.4.1 A single reaction model reflecting ABM parameters
We solve Equation 2 using the nnls optimizer from the Python package scipy [19] for the data obtained from the different combinations for the initial percentage of immune cells (Ĩ(0) ∈ {0.05, 0.2, 0.1}) and the competition value between cancer and immune cells (CI ∈ {0.5, 0.75}). The result of the equation learning is then a set of six models, one per parameter combination in dom(Ĩ(0)) × dom(CI). We hypothesize that these six models share a large part of their reactions, as they are based on data from the same ABM. A final model is then determined by taking the union over the sets of reactions included in all six scenarios. If our hypothesis holds, this union model should contain much less than the 60 (complete library) or 49 (constrained library) reactions in the library. Taking the union instead of the intersection acknowledges the fact that, in some scenarios, the rate constant of a specific reaction might be (very close to) zero, while it is important in many others. Further, there should be a relation between the two varied parameters of the ABM and the rate constants and reactions included in the union model. Exploiting this relation should enable approximating a wide variety of behaviors of the ABM and drawing conclusions on the ABM's properties from analyzing the union model.
A union model's fidelity to the ABM can be tested a priori, without knowledge of this relation, by trying to fit it to the six scenarios again. For this, we again use Equation 2, but limit the library to the reactions contained in the union model. We integrate the calibrated models using LSODA and regard the mean over the root mean squared errors (RMSEs) with respect to the six time series as the union model's accuracy. This measure allows us to compare different values for the hyperparameters involved in equation learning.
2.4.2 Hyperparameter scan
As we outlined above, learning reaction models involves some choices that can strongly influence the resulting model. These include the selection of reactions in the library (complete/constrained), the method used for numerical differentiation of the time series data (central differences on filtered/unfiltered data), and the measurement interval (considering only every n-th point from the data with n ∈ {10, 20, 40, 50}). We determine a union model as outlined above for every one of the above 16 combinations of hyperparameters, selecting the model with the smallest RMSE as the final result. The complete results of this analysis are visualized in Figures 4, 5. The former shows the coefficient values obtained for each reaction in our complete library when varying the sampling frequency n of the data (y-axis) and the filtering for differentiation (x-axis). Each subplot depicts the library of reactions on the x-axis and the ABM parameter configurations on the y-axis. Figure 5 shows the same for the case of the constrained library. Some interesting observations can be made regarding the choice of hyperparameters. When using filtered gradients (right columns in the figures), the magnitude of some coefficients decreases with increasing value of n, such as for the reaction 2H→H in Figure 4. On the other hand, when using unfiltered gradients, this effect cannot be observed. Looking at the behavior over ABM parameter configurations (vertical streaks), we can observe that, indeed, there is a relation between ABM parameter configurations to the coefficients of many reactions. For example, in Figure 4, bottom left, the coefficient of the reaction 2I→I is always higher for CI = 0.5 than for CI = 0.75. Further, it seems to increase when I(0) increases (this can be seen by considering every second cell from bottom to top). Such patterns can be observed for many of the other reactions in both, the complete and constrained, cases.

Figure 4. Coefficients determined for datasets produced by different combinations of the competition rate c and initial immune cell populations. Left column: central gradients. Right column: filtered gradients using kalmangrad. Top to bottom: using every 10th, 20th, 40th, and 50th data point.

Figure 5. Same as Figure 4, but with a library where immune cells (I) are only allowed to interact with cancer cells (C).
3 Results
3.1 Equations for the three-species agent-based model
Table 1 summarizes the results of the hyperparameter scan. With the constrained library, where immune cells are not allowed to interact with healthy cells or themselves, the smallest mean over RMSEs (≈5.45 × 10−3) was obtained when using unfiltered central difference gradients over every 10th data point, leading to 12 reactions. In the unconstrained case, a slightly lower mean over RMSEs (≈5.06 × 10−3) could be obtained using the same hyperparameters, leading to 14 reactions. We choose to continue with the model based on the constrained library, for being more parsimonious and also recognizing our prior knowledge about possible reactions while achieving a very similar accuracy as the unconstrained model. In Figure 5, this combination is shown on the top left. The reactions of our chosen union model are:
With the Law of Mass Action, we obtain the following differential equation system
A coupling of the derivatives can be observed over the coefficients k2, k5, k6, k7, k9, and k10.
Figure 6 shows the difference between the original data and fitting this best model (adjusting k0, …, k11) to the six scenarios. It can be seen that the 12 learned reactions above can very accurately be fitted to the different ABM parametrizations and reproduce the ABM's results. We can now use the coupled equations from Equation 4 as a surrogate to analyse properties of the ABM, such as the equilibrium states.

Figure 6. Accuracy when fitting the learned model Equation 3 to the different ABM scenarios. It is evident, that the model is able to very accurately approximate the ABM. Further, a relationship between the parameters CI, I(0) and the rate constants can be observed.
3.2 Equilibrium analysis of the equation-learning ODEs
In this subsection, we study the equilibrium of the 6 ODEs constructed from the scenario data. We note from Equation 4 that each ODE takes the form (with the 13 coefficients θ1, …, θ5; ϕ1, …, ϕ4; λ1, …, λ4 determined)
We solve the right-hand side of each equation to 0. Studying each in turn, Equation 7 can be written as
or
Now Equations 5, 9 lead to
and with C≠0 yields
Finally Equation 6 is just
so Equations 9–11s give
or
Write Equation 12 as
Substitute Equation 13 into Equation 11, then
or
Writing (and comparing coefficients with Equation 14)
and substituting into Equation 15 gives a fourth degree polynomial in C:
We will solve Equation 16 for C, Equation 13 for H, Equation 8 for I. It can be shown that
Some analysis (not given here) shows that there is only one positive zero for C given by Equation 16 for each of the six scenarios. The results are given in Table 2.
The use of equilibrium analysis is two-fold. First, it allows us to see the long-term stability of cancer cells within the spatial domain, and secondly, it allows us to infer a specific relationship between the equilibrium value of cancer cells and the immune cell injection. This is vital in giving clinicians a clue as to what sort of initial immune cell injection value should be chosen in order to control the cancer proliferation to an appropriate value.
4 Discussion
4.1 Insights from the agent-based model and the learned equations
We adapted the two-species agent-based model describing the interaction between cancer and healthy cells developed in Weerasinghe et al. [1] to a three-species model that included immune cells. The aim of this was to probe the role of immunotherapy in cancers, such as multiple myeloma, based on a simple but effective stochastic spatial model. This was first developed in Weerasinghe [9]. Our model is different from that in that there is a single injection of immune cells into the spatial model, and there is a constant competition parameter between cancer cells and immune cells.
We use equation learning to construct a three-dimensional ODE model, thus removing space and stochasticity from the study. The main focus is then to probe the role of immune cells in dealing with the proliferation of cancer cells.
We have only simulated six scenarios and studied the six learned ODEs, as described in the previous section, in terms of a key characteristic, namely their steady-state behavior. Figure 7 shows that for the two competition values of 0.5 and 0.75, we get more or less a linear relationship between the steady-state concentration of the cancer cells and the initial concentration of the immune cells Ĩ(0). Thus, for a given value of CI we would expect as we increase Ĩ(0) that we would still maintain this linear relationship. This suggests that we do not need to run more expensive simulations of the agent-based model in order to learn additional features of the steady-state behaviors. Furthermore, comparisons of the concentrations of C, H, and I at T = 20 for each of the six learned ODEs and the steady state values suggest that we do not need to run the agent-based model over very long time intervals to find steady-state values.
Figure 7 suggests that for a small value of the initial immune concentration Ĩ(0), there is little effect of the competition value on the cancer cell concentration—recall that a higher value of CI (the competition value between cancer and immune cells), the more effective are the immune cells in suppressing the concentration of the cancer cells. Figure 7 also suggests that as we increase Ĩ(0), the competition value plays an important role in suppressing the cancer cells. Furthermore, Figure 7 also shows that if we want to reduce the cancer cell concentration at steady-state, then engineering the immune cells so that they have a higher competition value and increasing the initial concentration of immune cells both play important roles.
In order to explore further the linear relationship between C and Ĩ(0) for both competition values CI = 0.5, 0.75 we find the linear least squares solution in both cases. These are
We can then ask, for example, what value of Ĩ(0) gives a value of C = 0.2 at equilibrium. It turns out to be
Of course, we can use this approach to study immunotherapy scenarios based on reducing the cancer to manageable levels over certain periods of time.
4.2 Related work
ABMs have become an established approach in cancer research [20, 21]. Modeling individuals and their interactions flexibly supports incorporating detailed biological mechanisms (also spatially and temporally constrained) at the micro level and studying emergent properties at the macro level. Many multiscale models exist that combine cellular behavior with intra-cellular behavior, such as signaling pathways [20, 21].
In a recent review of mechanistic learning in oncology [22], which contrasts knowledge-based and data-driven modeling, various approaches of their combination are listed, including equation learning. It is noted that “despite remarkable success in physics, symbolic regression applications in oncology are still scarce.” One approach is Brummer et al. [23]. It uses a combination of sparse regression (SINDy) and dynamic mode decomposition (DMC) to estimate a system of ODEs from in vitro CAR T-cell glioma therapy data. The starting point is in-vitro data with high temporal resolution instead of an agent-based simulation model. Also, instead of deriving one model (in our case, a union of learned models) to capture all observed dynamics, in Brummer et al. [23], three different models are derived based on the different experimental settings of the in-vitro experiments. Our work thus adds to the currently very small body of literature on applying equation learning in oncology. We have shown that equation learning may not only be used to analyse data but also to simplify the analysis of existing (agent-based) models.
When looking more broadly at the application field of cell biology, the situation is slightly different. Discovering ODEs for biochemical systems from data is discussed in Daniels and Nemenman [24]. The authors automatically discover parsimonious models by systematically fitting models of increasing complexity. Several approaches [6, 25] are based on applying the seminal Sparse Identification of Nonlinear Dynamics (SINDy) [5] to derive ODEs.
Other approaches are based on the specification of biochemical systems as a set of reactions. One way to achieve this is by learning ODEs constrained by a specific coupling [6, 7, 26], as used here. Other approaches include stochastic heuristic search [27] and the chemical reaction neural network (CRNN) [28]. All methods mentioned above only consider the deterministic semantics of reaction models. When stochastic data is available, it can also be used to infer CRNs that allow for a stochastic interpretation, such as [29, 30].
The goal of these approaches is to learn entire simulation models based on data. Ideally, the data stem from in-vitro or in-vivo experiments. However, simulation can be used to generate synthetic data for machine learning in general (thus closing possible gaps in existing data) [31], and for evaluating equation learning in particular, as it allows us to directly compare the learned model with the ground truth with the simulation model generating the data [26].
We show that the approach [6] can generate an ODE model that reproduces the data of the more complex agent-based tumor model. As a next step, the equation model helps us derive properties of the simulation model and reduce the computational costs in experimenting with this original simulation model. The learned equations can be viewed as a surrogate [32] for the real model. However, unlike surrogate models, which are typically black box models such as artificial neural networks [32], the derived equation models can be interpreted by humans. In addition, the entire tooling for analyzing ODEs is available to assess diverse properties and behaviors of the equations approximating the agent-based simulation model, which we showed exemplarily based on a steady-state analysis.
5 Conclusions
We have seen that our approach of learning an ODE system from just a few agent-based modeling scenarios (here, six) and then using the ODE to study a key characteristic of the long-term cancer dynamics is very powerful. In particular, by establishing a linear relationship between cancer concentrations and the initial concentration of the immune cells, this enables us to make a simple analysis on the values of the competition and the initial immune concentration in order to reduce the cancer concentrations to an acceptable size, with a reasonably high value of CI needed in order for Ĩ(0) to be not too large. From a clinical perspective, there may be restrictions on how this might be managed, so other strategies for immune cell injection could be considered based around optimal control strategies, such as bang-bang control [33].
Cancer cell density is an important metric in understanding the progression of a number of cancer-based diseases. In multiple myeloma, for example, mutated cancerous plasma cells can crowd the bone marrow, and significantly hinder the production of red blood cells, white blood cells and platelets. These cancerous cells can also escape the bone marrow, travel through the blood system and degrade many other body components. The agent-based model we have constructed can be considered as a very simple, but still realistic, proxy of the treatment of cancer crowding through immune cell injection in a multiple myeloma setting. Of course the processes underlying multiple myeloma in the bone marrow take place in an unstructured three-dimensional setting with very complex internal structures and interactions between a variety of proteins. As stated in Section 1, the ABM presented here makes some assumptions, namely that its two-dimensional domain, relatively simple interactions between cancer and immune cells, and movement of cancer cells are still able to appropriately capture this complexity. In particular, we do not intend to capture all protein interactions in detail, as there is little information about the interaction rates and introducing complexity without such information can have negative repercussions for equation learning. Regarding the learning of equations and their subsequent mathematical analysis, these assumptions can also be seen as a benefit, as they promote the learning of concise equations and allow drawing informative, high-level conclusions (such as the relation between I0 and cancer cell density in Section 3.2).
Probably the most important abstraction made by the equation learning approach discussed here, is that no spatial information is considered. This is adequate to derive certain insights, but this generally depends on the research questions to be answered by the surrogate. In the future, equation learning might, possibly also considering space, be applied to more complex ABMs, but there is a tradeoff between capturing the full ABM dynamics and sparsity (analysability) of the reactions. Closely approximating the full dynamics of the ABM will result in ODEs with many terms and, depending on the variables of interest, additional ODEs that complicate the analysis. On the other hand, enforcing ODEs with fewer terms will reduce the fidelity of the surrogate to the ABM. By steering this tradeoff according to specific research questions, learning interpretable surrogate equations has the potential to lead to faster and new answers.
Data availability statement
The code and data to reproduce all equation learning results is available at https://doi.org/10.5281/zenodo.15240282.
Author contributions
KB: Conceptualization, Formal analysis, Investigation, Supervision, Visualization, Writing – original draft, Writing – review & editing. PB: Conceptualization, Formal analysis, Investigation, Supervision, Visualization, Writing – original draft, Writing – review & editing. JK: Data curation, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. AU: Funding acquisition, Supervision, Writing – original draft, Writing – review & editing. HW: Data curation, Methodology, Software, Visualization, Writing – original draft.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work is in part supported by the German Research Foundation (DFG) within the Collaborative Research Center (SFB) 1270/2 ELAINE 299150580 and by the ARC Centre of Excellence for Plant Success in Nature and Agriculture (CE200100015).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Gen AI was used in the creation of this manuscript.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Footnotes
References
1. Weerasinghe HN, Burrage PM, Nicolau DV, Jr. Burrage K. Agent-based modelling for the tumor microenvironment (TME). Mathem Biosci Eng. (2024) 21:7621–47. doi: 10.3934/mbe.2024335
2. Janiszewska M, Primi MC, Izard T. Cell adhesion in cancer: beyond the migration of single cells. J Biol Chem. (2020) 295:2495–505. doi: 10.1074/jbc.REV119.007759
3. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. (2000) 100:57–70. doi: 10.1016/S0092-8674(00)81683-9
4. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. (2011) 144:646–74. doi: 10.1016/j.cell.2011.02.013
5. Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Nat Acad Sci. (2016) 113:3932–7. doi: 10.1073/pnas.1517384113
6. Burrage PM, Weerasinghe HN, Burrage K. Using a library of chemical reactions to fit systems of ordinary differential equations to agent-based models: a machine learning approach. Numer Algor. (2024) 96:1–15. doi: 10.1007/s11075-023-01737-0
7. Hoffmann M, Fröner C, Noé F. Reactive SINDy: discovering governing reactions from concentration data. J Chem Phys. (2019) 150:025101. doi: 10.1063/1.5066099
8. Santosa F, Weitz B. An inverse problem in reaction kinetics. J Math Chem. (2011) 49:1507–20. doi: 10.1007/s10910-011-9835-2
9. Weerasinghe HN. Mathematical modelling for the Tumour Microenvironment (TME) under Cellular Stress. PhD Thesis, Queensland University of Technology. (2023).
10. Nani F, Freedman HI. A mathematical model of cancer treatment by immunotherapy. Math Biosci. (2000) 163:159–99. doi: 10.1016/S0025-5564(99)00058-9
11. Konstorum A, Vella AT, Adler AJ, Laubenbacher RC. Addressing current challenges in cancer immunotherapy with mathematical and computational modelling. J R Soc Inter. (2017) 14:20170150. doi: 10.1098/rsif.2017.0150
13. Gong C, Milberg O, Wang B, Vicini P, Narwal R, Roskos L, et al. A computational multiscale agent-based model for simulating spatio-temporal tumour immune response to PD1 and PDL1 inhibition. J R Soc Interf . (2017) 14:20170320. doi: 10.1098/rsif.2017.0320
14. Gong C, Ruiz-Martinez A, Kimko H, Popel AS. A spatial quantitative systems pharmacology platform spQSP-IO for simulations of tumor-immune interactions and effects of checkpoint inhibitor immunotherapy. Cancers. (2021) 13:3751. doi: 10.3390/cancers13153751
15. Eftimie R. Multi-scale aspects of immune responses to solid cancers: mathematical and computational modelling perspectives. In: Modelling and Computational Approaches for Multi-Scale Phenomena in Cancer Research: From Cancer Evolution to Cancer Treatment. World Scientific (2025). p. 61–92. doi: 10.1142/9781800614383_0003
16. Peng L, Sferruzza G, Yang L, Zhou L, Chen S. CAR-T and CA_NK as cellular cancer immunotherapy for solid tumours. Cell Mol Immunol. (2024) 21:1089–108. doi: 10.1038/s41423-024-01207-0
17. Brunton SL, Kutz JN. Data-Driven Science and Engineering, Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press (2022). p. 4. doi: 10.1017/9781009089517
18. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. (1977) 81:2340–61. doi: 10.1021/j100540a008
19. SciPy team and contributors (2023). SciPy. Verison 1101. Available online at: https://scipy.org/ (accessed February 19, 2023).
20. Cogno N, Axenie C, Bauer R, Vavourakis V. Agent-based modeling in cancer biomedicine: applications and tools for calibration and validation. Cancer Biol Ther. (2024) 25:2344600. doi: 10.1080/15384047.2024.2344600
21. West J, Robertson-Tessi M, Anderson AR. Agent-based methods facilitate integrative science in cancer. Trends Cell Biol. (2023) 33:300–11. doi: 10.1016/j.tcb.2022.10.006
22. Metzcar J, Jutzeler CR, Macklin P, Köhn-Luque A, Brüningk SC. A review of mechanistic learning in mathematical oncology. Front Immunol. (2024) 15:1363144. doi: 10.3389/fimmu.2024.1363144
23. Brummer AB, Xella A, Woodall R, Adhikarla V, Cho H, Gutova M, et al. Data driven model discovery and interpretation for CAR T-cell killing using sparse identification and latent variables. Front Immunol. (2023) 14:1115536. doi: 10.3389/fimmu.2023.1115536
24. Daniels BC, Nemenman I. Automated adaptive inference of phenomenological dynamical models. Nat Commun. (2015) 8:6. doi: 10.1038/ncomms9133
25. Mangan NM, Brunton SL, Proctor JL, Kutz JN. Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Trans Molec Biol Multi-Scale Commun. (2016) 2:52–63. doi: 10.1109/TMBMC.2016.2633265
26. Kreikemeyer JN, Burrage K, Uhrmacher AM. Discovering biochemical reaction models by evolving libraries. In: International Conference on Computational Methods in Systems Biology. Springer (2024). p. 117–136. doi: 10.1007/978-3-031-71671-3_10
27. Martinelli J, Grignard J, Soliman S, Ballesta A, Fages F. Reactmine: a statistical search algorithm for inferring chemical reactions from time series data. arXiv preprint arXiv:220903185 (2022).
28. Ji W, Deng S. Autonomous discovery of unknown reaction pathways from data by chemical reaction neural network. J Phys Chem A. (2021) 125:1082–92. doi: 10.1021/acs.jpca.0c09316
29. Klimovskaia A, Ganscha S, Claassen M. Sparse regression based structure learning of stochastic reaction networks from single cell snapshot time series. PLoS Comput Biol. (2016) 12:e1005234. doi: 10.1371/journal.pcbi.1005234
30. Kreikemeyer JN, Andelfinger P, Uhrmacher AM. Towards learning stochastic population models by gradient descent. In: Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. SIGSIM-PADS '24. New York, NY, USA: Association for Computing Machinery (2024). p. 88–92. doi: 10.1145/3615979.3656058
31. von Rueden L, Mayer S, Sifa R, Bauckhage C, Garcke J. Combining machine learning and simulation to a hybrid modelling approach: current and future directions. In: Advances in Intelligent Data Analysis XVIII: 18th International Symposium on Intelligent Data Analysis, IDA 2020, Konstanz, Germany, April 27-29, 2020, Proceedings 18. Springer (2020). p. 548–560. doi: 10.1007/978-3-030-44584-3_43
32. Gherman IM, Abdallah ZS, Pang W, Gorochowski TE, Grierson CS, Marucci L. Bridging the gap between mechanistic biological models and machine learning surrogates. PLoS Comput Biol. (2023) 19:e1010988. doi: 10.1371/journal.pcbi.1010988
Keywords: agent-based modeling, equation learning, SINDy, cancer cell dynamics, immune cell dynamics
Citation: Burrage K, Burrage PM, Kreikemeyer JN, Uhrmacher AM and Weerasinghe HN (2025) Learning surrogate equations for the analysis of an agent-based cancer model. Front. Appl. Math. Stat. 11:1578604. doi: 10.3389/fams.2025.1578604
Received: 18 February 2025; Accepted: 21 April 2025;
Published: 15 May 2025.
Edited by:
Joseph Malinzi, University of Eswatini, EswatiniReviewed by:
Rossitza Marinova, Concordia University of Edmonton, CanadaVictor Ogesa Juma, University of British Columbia, Canada
Said Salim, State University of Zanzibar, Tanzania
Copyright © 2025 Burrage, Burrage, Kreikemeyer, Uhrmacher and Weerasinghe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Pamela M. Burrage, cGFtZWxhLmJ1cnJhZ2VAcXV0LmVkdS5hdQ==