
Edited by: Heiko Enderling, University of Texas MD Anderson Cancer Center, United States

Reviewed by: Jaya Lakshmi Thangaraj, University of California, San Diego, United States

Nahum Puebla-Osorio, University of Texas MD Anderson Cancer Center, United States

Ibrahim Chamseddine, Harvard Medical School, United States

*Correspondence: Sarah C. Brüningk,

†ORCID: John Metzcar,

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Mechanistic learning refers to the synergistic combination of mechanistic mathematical modeling and data-driven machine or deep learning. This emerging field finds increasing applications in (mathematical) oncology. This review aims to capture the current state of the field and to provide a perspective on how mechanistic learning may progress in the oncology domain. We highlight the synergistic potential of mechanistic learning and point out similarities and differences between purely data-driven and mechanistic approaches concerning model complexity, data requirements, outputs generated, and interpretability of the algorithms and their results. Four categories of mechanistic learning (sequential, parallel, extrinsic, intrinsic) are presented with specific examples. We discuss a range of techniques including physics-informed neural networks, surrogate model learning, and digital twins. Example applications address complex problems predominantly from the domain of oncology research, such as longitudinal tumor response predictions or time-to-event modeling. As the field of mechanistic learning advances, we aim for this review and the proposed categorization framework to foster additional collaboration between the data- and knowledge-driven modeling fields. Further collaboration will help address difficult issues in oncology such as limited data availability, requirements of model transparency, and complex input data, all of which are embraced in a mechanistic learning framework.

Data and knowledge both drive the progress of research and are the cornerstones of modeling. Depending on the emphasis, both data-driven (exemplified by machine and deep learning) and knowledge-driven (exemplified by mechanistic mathematical modeling) models generate novel results and insights. Mechanistic learning describes approaches that employ both data and knowledge in a complementary and balanced way.

An increasing understanding of cancer evolution and progression along with growing multi-scale biomedical datasets, ranging from molecular to population level, is driving the research field of mathematical oncology (

Data science may be defined as “

An alternative is to formulate a specific hypothesis of how relevant variables interact between input and output in the form of a mathematical model. Bender defines a mathematical model as an “

The evolving field of mechanistic learning (

By definition, data- and knowledge-driven modeling are complementary perspectives for approaching research questions. Here, we address similarities and differences to understand synergies at the interface of these fluid concepts.

According to Rockne et al. (

It is tempting to suggest that knowledge-driven models are inherently interpretable. Yet, the implementation of chains of relationships can give rise to complex inverse problems. Subsequently,

Knowledge-driven modeling has successfully been applied to investigate different aspects of cancer including somatic cancer evolution and treatment. We refer the interested reader to recent review articles (

A common understanding of data-driven modeling (e.g., machine learning, deep learning, and classical statistics) is the creation of insight from empirical examples (

Purely data-driven models do not readily leverage the community’s understanding of the system under study but instead often employ highly parameterized models. The many degrees of freedom allow flexibility to approximate complex and mechanistically unknown relationships, e.g., deep neural networks act as “universal function approximators” (

Generally, the application focus differs from that of knowledge-driven models. Generalization beyond the observed data space is often challenging (

In summary, data-driven approaches are powerful tools for knowledge generation. In oncology, data-driven approaches have previously contributed substantially to scientific progress and process automation (

General conceptual differences between knowledge-driven vs. data-driven modeling.

| Knowledge-driven modeling | Data-driven modeling |
|---|---|
| The current “knowledge” drives the implementation of an educated guess regarding the studied relationship. | The empirical reality is approximated through a (complex) relationship. |
| Data serves the purpose of validating the implemented estimate of reality. | Empirical observations dictate the extraction of information. |
| Generates novel hypotheses for causal mechanisms. | Isolates relevant inputs from empirical datasets for a given output. |
| Deductive capability: extrapolation to predictions about behaviors not present in the original data. | Inductive capability: interpolation of data with a limited extrapolation horizon. |
| Predicts or describes dynamics of the overall system. | Infers dynamics from the overall system while governing equations and parameters are not exactly known. |
| A small but specific data set is needed for validation. | A large number of parameters (thousands, millions, or more) requires data-intensive training/fitting. |
| Limiting factor(s): quality of assumptions; parameter sensitivity. | Limiting factor(s): quality and quantity of data; model structure, such as choice of features (inputs). |

Some aspects here are taken from Baker et al. (

Given these similarities and differences, it is important to account for possible challenges upon combining approaches. Model bias or conflicting information generated by addressing the same task with differently motivated approaches needs to be carefully considered. At the same time, there exists ample room to harness synergies between knowledge and data-driven modeling under the umbrella of mechanistic learning. Specifically, differences regarding data requirements, model complexity, extrapolation, and application regimes imply that a combination of both approaches may mitigate individual limitations. For example, parameters of a mechanistic mathematical model can be estimated by a deep learning algorithm from complex multi-omics data or knowledge-driven descriptions can be used to constrain the large range of possible solutions of a complex data-driven approach to a meaningful subset. In the following sections, we provide a detailed overview of how these combinations can be achieved and provide real-world application examples to motivate these.

“Mechanistic learning” (

Sequential - Knowledge-based and data-driven modeling are applied sequentially building on the preceding results

Parallel - Modeling and learning are considered parallel alternatives to complement each other for the same objective

Extrinsic - Knowledge-driven and data-driven modeling are combined at a high level, for example through mathematical analysis of data-driven models or as complementary tasks within digital twins

Intrinsic - Biomedical knowledge is built into the learning approach, either in the architecture or the training phase

Examples of mechanistic learning structured in four combinations: Parallel combinations (top left) with examples of surrogate models and neural ordinary differential equations (ODEs). Data- and knowledge-driven models act as alternatives to complement each other for the same objective. Sequential combinations (bottom left) apply data- and knowledge-driven models in sequence to ease the calibration and validation steps. Extrinsic combinations (top right) combine knowledge-driven and data-driven modeling at a higher level. For example, mathematical analysis of data-driven models and their results or as complementary tasks for digital twins. Intrinsic combinations (bottom right), like physics- and biology-informed neural networks include the knowledge-driven models into the data-driven approaches. Knowledge is included in the architecture of a data-driven model or as a regularizer to influence the learned weights.

Whereas sequential and parallel combinations make a deliberate choice of aspects of data- and knowledge-driven models to coalesce, extrinsic and intrinsic combinations actively interlace these. Thus, the complexity with respect to implementation and interpretation grows from sequential to intrinsic combinations. While most implementations readily fit into one of these four classes, we emphasize that we do not consider the combinations as discrete encapsulated instances. Instead, we view all synergistic combinations on a continuous landscape between the two extremes of purely knowledge- and data-driven models (

The mechanistic learning landscape shows room for the combination of data-driven and knowledge-driven modeling. We suggest that purely data-driven or purely knowledge-driven models represent the extremes of a data-knowledge surface with ample room for combinations in different degrees of synergism. Further, in the left-bottom corner with almost no data nor knowledge, any modeling or learning technique is limited.

Sequential approaches harness knowledge- and data-driven aspects as sequential, computationally independent tasks by disentangling the parameter/feature estimation and forecasting steps. They strive to attain mechanistic learning objectives by feeding the outputs of one approach into the other. This could involve utilizing data-driven methods to estimate mechanistic model parameters, or implementing feature selection in a data-driven model guided by mechanistic priors. Although sequential frameworks are straightforward to implement and interpret, their overall demands can increase significantly, since they inherit both the computational requirements and the limitations of the individual approaches (e.g., data requirements, accuracy of prior knowledge).

In medical science, data availability remains a key challenge (

Feature engineering is the process of designing input features from raw data (

Aspects of a mechanistic model can serve as input features to or outputs from machine learning models. This strategy of “mechanistic feature engineering” was used by Benzekry et al. to predict overall survival in metastatic neuroblastoma patients (
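
As a minimal sketch of mechanistic feature engineering, the example below fits a logistic growth model to one hypothetical patient's sparse tumor volume measurements and extracts the fitted growth rate as a low-dimensional feature for a downstream machine learning model. All data, parameter values, and function names are illustrative assumptions, not taken from the cited studies.

```python
import math

def logistic(t, v0, r, k):
    """Closed-form solution of logistic growth dV/dt = r*V*(1 - V/k)."""
    return k / (1 + (k / v0 - 1) * math.exp(-r * t))

def fit_growth_rate(times, volumes, v0, k, r_grid):
    """Estimate the growth rate r by grid search over squared error.
    A proper optimizer would be used in practice; a grid keeps the
    sketch dependency-free."""
    best_r, best_sse = r_grid[0], float("inf")
    for r in r_grid:
        sse = sum((logistic(t, v0, r, k) - v) ** 2
                  for t, v in zip(times, volumes))
        if sse < best_sse:
            best_r, best_sse = r, sse
    return best_r

# Hypothetical sparse follow-up measurements (generated with r ~ 0.3)
times = [0.0, 10.0, 20.0, 30.0]
volumes = [1.0, 6.8, 9.8, 10.1]
r_hat = fit_growth_rate(times, volumes, v0=1.0, k=10.0,
                        r_grid=[i / 100 for i in range(5, 60)])
# r_hat is now a "mechanistic feature" that could enter a survival model
```

The fitted rate summarizes an entire, irregularly sampled trajectory in a single biologically meaningful number, which is exactly what makes such features attractive inputs for survival or classification models.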

A common problem in knowledge-driven modeling for longitudinal predictions is parameter identifiability and fitting given limited data and complex systems of equations. The bottleneck lies in the lack of a detailed understanding of the mechanistic relation between input data and desired output, rather than a purely computational limitation.

Similar to using mechanistic feature engineering for data-driven model inputs, data-driven approaches can also be employed to discover correlations within unstructured, high-dimensional data to provide inputs to knowledge-driven models. Depending on the specific application, a range of methods is possible: imaging data are preprocessed by convolutional architectures, whereas omics data could be processed with network analysis, graph-based, or standard machine learning models. These correlations are then harnessed to predict the parameters of a mechanistic approach. Importantly, each model is implemented and trained/fitted independently, implying a high-level, yet easily interpretable combination. This sequential combination harnesses the ability of data-driven models to extract information, in the form of summarizing parameters, from high-dimensional and heterogeneous data types. Importantly, the data required for such analysis need to meet the criteria of knowledge-driven (e.g., longitudinal information) and data-driven (e.g., sufficient sample size) approaches alike; this may restrict applicability in light of limited data quality or excessive noise. Similarly, limitations such as robustness and prediction performance for the estimated parameters should be considered.

In practice, Perez-Aliacar et al. (

Another sequential construct consists of using machine learning models to predict the residuals of a mechanistic model prediction. Kielland et al. utilized this technique to forecast breast cancer treatment outcomes under combination therapy from gene expression data (
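
The residual-learning construct can be sketched as follows: a mechanistic baseline makes the primary forecast, and a simple learned model corrects its systematic error as a function of a covariate. The exponential-decay baseline, the covariate, and all numbers are illustrative assumptions, not the cited study's actual model.

```python
import math

def mechanistic_prediction(t, v0=10.0, decay=0.1):
    """Simple mechanistic baseline: exponential tumor shrinkage under
    therapy (illustrative rate, not from the cited study)."""
    return v0 * math.exp(-decay * t)

def fit_line(xs, ys):
    """Ordinary least squares for a single covariate; stands in for a
    more flexible machine learning regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical cohort: the observed volume at day 30 deviates from the
# mechanistic forecast in proportion to a molecular covariate x
t_eval = 30.0
xs = [0.0, 1.0, 2.0, 3.0]
observed = [mechanistic_prediction(t_eval) + 0.5 * x for x in xs]

# Learn the residual (observed minus mechanistic) from the covariate
residuals = [y - mechanistic_prediction(t_eval) for y in observed]
a, b = fit_line(xs, residuals)

def combined_prediction(x):
    """Mechanistic forecast plus learned residual correction."""
    return mechanistic_prediction(t_eval) + a + b * x
```

The division of labor is the point: the mechanistic part carries the known dynamics, while the data-driven part only has to capture what the mechanism misses.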

In summary, sequential combinations are attractive due to their clear path toward implementation and interpretation with limitations due to prerequisites on data, mechanistic understanding or uncertainty propagation. While future directions may dive deeper into harnessing more complex input data (e.g. multi-omics, multimodal) for mechanistic model inputs, the technical advancement for sequential combinations remains dictated by the progress in the individual fields.

Parallel combinations blend advantages of purely data- or knowledge-driven models without changing the anticipated evaluation endpoint. These are alternatives for the same task as a purely data- or knowledge-driven approach and hence aspects concerning data requirements, implementation, model robustness, and performance can be compared. This makes them attractive for high-stakes decision scenarios, such as clinical application (e.g. tumor growth prediction).

Many phenomena in oncology can be readily formulated using large systems of equations. However, solving large models comes at a high computational cost. Utilizing methods such as model order reduction aids in optimizing the computational efficiency of the solving process. This approach typically demands substantial mathematical expertise and is not suitable for time- or resource-constrained scenarios such as real-world clinical deployment. Neural networks, as universal function approximators, offer an efficient alternative. In practice, data-driven models are trained on numerical simulation results and approximate a solution to the system of equations. The inference step of the successfully trained model takes a fraction of the computational resources compared to the full mechanistic model (
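
The surrogate idea can be sketched in a few lines: an "expensive" simulator is evaluated offline on a coarse parameter grid, and a cheap interpolant (standing in for a trained neural network) then answers new queries at a fraction of the cost. The simulator, grid, and parameter ranges are illustrative assumptions.

```python
import bisect

def expensive_simulator(r):
    """Stand-in for a costly mechanistic simulation: final tumor volume
    after integrating logistic growth dV/dt = r*V*(1 - V/10) with a
    fine-grained explicit Euler scheme."""
    v, dt = 1.0, 0.001
    for _ in range(int(30 / dt)):
        v += dt * r * v * (1 - v / 10.0)
    return v

# Offline phase: evaluate the simulator on a coarse parameter grid
grid = [i / 20 for i in range(1, 11)]  # growth rates 0.05 .. 0.50
table = [expensive_simulator(r) for r in grid]

def surrogate(r):
    """Cheap piecewise-linear surrogate of the simulator; a neural
    network trained on the simulation results would play this role
    in practice."""
    j = min(max(bisect.bisect_left(grid, r), 1), len(grid) - 1)
    r0, r1 = grid[j - 1], grid[j]
    w = (r - r0) / (r1 - r0)
    return (1 - w) * table[j - 1] + w * table[j]
```

Once the offline table (or trained network) exists, each surrogate call avoids the thousands of integration steps of the full simulation, which is what makes such surrogates attractive for time-constrained clinical settings.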

A related concept is the generation of vast amounts of “synthetic” training data (

For example, Ezhov et al. (

The term “neural ordinary differential equation”, or “neural ODE”, originated from the notion of viewing neural networks as discretized ODEs or considering ODEs to be neural networks with an infinite number of layers (
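
This correspondence can be made concrete with a minimal forward pass: a tiny tanh network defines the vector field, and each explicit Euler step of the solver plays the role of one residual-network layer. The weights here are fixed and illustrative; a real implementation would also backpropagate through the solver (or use the adjoint method) to train them.

```python
import math

def f_theta(v, params):
    """Tiny one-hidden-layer tanh network defining the vector field
    dV/dt = f_theta(V)."""
    w1, b1, w2 = params
    hidden = [math.tanh(w * v + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden))

def odeint_euler(v0, params, t_end, steps=100):
    """Forward pass = explicit Euler discretization of the neural ODE;
    each Euler step corresponds to one residual-network layer."""
    v, dt = v0, t_end / steps
    for _ in range(steps):
        v += dt * f_theta(v, params)
    return v

# Illustrative fixed weights chosen so that f_theta(v) ~ 0.2 * v for
# small v; the integrated state then grows roughly like exp(0.2 * t)
params = ([0.1], [0.0], [2.0])
v_end = odeint_euler(1.0, params, t_end=1.0)
```

Because the learned object is the vector field rather than the trajectory itself, the same trained network can be integrated over arbitrary, irregular observation times, which suits clinical follow-up data.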

Neural ODEs have already been used for a variety of tasks in oncology ranging from genome-wide regulatory dynamics (

While oncology research generates vast amounts of data, extracting and consolidating mechanistic understanding from data is a laborious process reliant on human experts. Symbolic regression allows for automated and data-driven discovery of governing laws expressed as algebraic or differential equations. This method finds a symbolic mathematical expression that accurately matches a dataset of label-feature pairs. Two prominent symbolic regression techniques are genetic programming-based optimization (
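
A minimal sketch of the sparse-regression flavor of equation discovery (in the spirit of sparse identification methods such as SINDy): given states and derivative estimates, regress the derivative onto a library of candidate terms and threshold small coefficients to zero. The data are synthetic, and derivatives are assumed noise-free to keep the sketch simple.

```python
def solve3(a, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    n = 3
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c]
                              for c in range(r + 1, n))) / m[r][r]
    return x

# Synthetic states and (assumed noise-free) derivatives from the true
# law dV/dt = 0.3*V - 0.03*V^2, which the regression should rediscover
vs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
dvs = [0.3 * v - 0.03 * v ** 2 for v in vs]

# Candidate library of terms: [V, V^2, V^3]
lib = [[v, v ** 2, v ** 3] for v in vs]

# Least squares via the normal equations (A^T A) x = A^T y
ata = [[sum(row[i] * row[j] for row in lib) for j in range(3)]
       for i in range(3)]
aty = [sum(row[i] * dv for row, dv in zip(lib, dvs)) for i in range(3)]
coefs = solve3(ata, aty)

# Sparsity step: threshold negligible coefficients to zero
coefs = [c if abs(c) > 5e-3 else 0.0 for c in coefs]
```

The thresholding is what turns a dense regression into an interpretable governing equation: the surviving terms name the mechanism (here logistic growth), rather than merely fitting the data.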

Despite remarkable success in physics (

However, estimating derivatives from high noise and sparse longitudinal measurements, like many from clinical oncology, remains challenging. Several groups have used variational formulations of ODEs and PDEs in the optimization step without relying on estimating derivatives from noisy and sparse data (

Extrinsic combinations make use of both mechanistic and data-driven approaches to address different aspects of the same problem or to post-process the output of a data-driven implementation.

Originating from analogies in manufacturing and engineering, the concept of digital twins (

Typically, for mechanistic digital twins, a mathematical framework describes the dynamics of tumor size, morphology, composition, and other biomarkers (

Data-driven approaches are trained to optimize a performance metric, but performance alone is not driving a model’s application in (clinical) practice. Here, quantification of the uncertainty of model results, model robustness, as well as interpretability to explain why a model arrived at a certain conclusion are equally important (

Addressing many of the questions related to deep learning is only possible using mathematical methods, i.e., challenges in the field of data-driven models are transformed into mathematical conjectures that are subsequently (dis)proven. This approach ensures that the results generated by models are mathematically reliable and transparent, and thus better suited for clinical implementation.

Numerous examples underscore this point and provide motivation for employing intricate architecture designs based on mathematical formulations. A specific instance involves learning a specialized representation that elucidates cancer subtyping from multi-omics inputs, including transcriptomic, proteomic, or metabolomic data (

Data assimilation techniques bridge numerical models and observational data through optimization of starting conditions. Typical examples are Kalman or particle filter methods (
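
A scalar Kalman filter illustrates the assimilation loop in its simplest form: the mechanistic model forecasts the state, and each incoming observation corrects the forecast in proportion to the Kalman gain. The growth rate, noise levels, and measurement sequence are illustrative assumptions.

```python
def kalman_1d(measurements, x0, p0, growth=1.05, q=0.01, r_noise=0.25):
    """Scalar Kalman filter: a mechanistic forecast x <- growth * x is
    corrected ("assimilated") by each incoming measurement z."""
    x, p = x0, p0
    for z in measurements:
        # Predict step: propagate state and uncertainty with the model
        x = growth * x
        p = growth * growth * p + q
        # Update step: blend forecast and observation via the Kalman gain
        gain = p / (p + r_noise)
        x = x + gain * (z - x)
        p = (1 - gain) * p
    return x, p

# Hypothetical tumor-volume readings roughly consistent with 5% growth
readings = [1.05, 1.10, 1.16, 1.22, 1.28]
x_est, p_est = kalman_1d(readings, x0=1.0, p0=1.0)
```

Note that the posterior variance shrinks as observations accumulate; this running uncertainty estimate is precisely what a digital twin needs to decide how much to trust its own forecast between clinical visits.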

This combination incorporates a mechanistic formulation within a machine learning model, either during training, as a contribution to the formulated objective function, or directly within the model architecture.

Mechanism-informed neural networks such as physics-informed neural networks (PINNs) (

Equation-regularization has previously been shown to enhance both the performance and interpretability of data-driven architectures. In the context of oncology, one example includes the modeling of tumor growth dynamics (
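The core of such equation-regularization is a composite loss: a data-misfit term plus an ODE-residual penalty evaluated at collocation points. The sketch below evaluates this loss for a parametric candidate curve (a real PINN would use a neural network as the candidate); the logistic equation, its assumed known rate, and all numbers are illustrative.

```python
import math

def candidate(t, r, a=9.0, k=10.0):
    """Candidate solution u(t); in a real PINN this would be a neural
    network, here a parametric logistic curve for illustration."""
    return k / (1 + a * math.exp(-r * t))

def pinn_loss(r, data, colloc, lam=1.0, r_known=0.3, k=10.0, h=1e-4):
    """Composite loss: data misfit plus the squared ODE residual
    u'(t) - r_known * u * (1 - u/k) at collocation points."""
    mse = sum((candidate(t, r) - y) ** 2 for t, y in data) / len(data)
    phys = 0.0
    for t in colloc:
        # Central finite difference approximates u'(t)
        du = (candidate(t + h, r) - candidate(t - h, r)) / (2 * h)
        u = candidate(t, r)
        phys += (du - r_known * u * (1 - u / k)) ** 2
    return mse + lam * phys / len(colloc)

# One sparse, noisy measurement plus unlabeled collocation points
data = [(10.0, 7.0)]
colloc = [0.0, 5.0, 10.0, 15.0, 20.0]
```

With so little data, many candidates fit the measurement comparably; the physics term breaks the tie by penalizing candidates that violate the governing equation, which is how PINNs regularize sparse clinical time series.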

Rather than optimizing a network architecture through regularization, biology-informed neural networks constrain the model architecture to biological priors from the start. Typically in the context of network analysis, biological priors such as known interactions between genes and/or transcription factors are translated to nodes and edges in a graph (
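
The architectural constraint can be sketched as a masked layer: a binary mask derived from a prior knowledge graph zeroes out all connections the biology does not allow, so a weight can only contribute along a known edge. The three-gene, two-pathway prior here is entirely hypothetical.

```python
import math

# Hypothetical prior knowledge graph: genes 0 and 1 act in pathway A,
# gene 2 acts in pathway B; the mask encodes these allowed edges
mask = [[1, 0],   # gene 0 -> pathway A only
        [1, 0],   # gene 1 -> pathway A only
        [0, 1]]   # gene 2 -> pathway B only

def masked_layer(x, weights, mask):
    """Dense layer whose connectivity is pruned to the biological
    prior: a weight contributes only where the mask allows an edge."""
    n_out = len(mask[0])
    return [math.tanh(sum(x[i] * weights[i][j] * mask[i][j]
                          for i in range(len(x))))
            for j in range(n_out)]
```

Because each hidden node now corresponds to a named pathway and receives input only from its known members, the learned activations stay directly interpretable, unlike those of an unconstrained dense layer.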

Finally, in the context of generative approaches, differential equations have previously been incorporated into (deep) neural networks through variational autoencoders. While current examples were obtained from medical applications other than oncology (

Hierarchical nonlinear models, also referred to as nonlinear mixed effects models, are a widely used framework to analyze longitudinal measurements on a number of individuals, when interest focuses on individual-specific characteristics (
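
A simple two-stage scheme approximates the hierarchical idea: first fit each individual's growth rate, then pool the individual estimates into population-level effects. This is a simplification of full nonlinear mixed-effects estimation (which maximizes a joint likelihood over fixed and random effects); the cohort and rates below are hypothetical.

```python
import math

def fit_rate(times, volumes):
    """Stage 1: per-patient log-linear least squares,
    log V(t) = log V0 + r*t, returning the slope r."""
    logs = [math.log(v) for v in volumes]
    n = len(times)
    mt, ml = sum(times) / n, sum(logs) / n
    return sum((t - mt) * (l - ml) for t, l in zip(times, logs)) / \
           sum((t - mt) ** 2 for t in times)

# Hypothetical cohort: each patient follows exponential growth with an
# individual-specific rate scattered around a population mean of 0.2
true_rates = [0.1, 0.2, 0.3]
times = [0.0, 5.0, 10.0]
cohort = [[math.exp(r * t) for t in times] for r in true_rates]

# Stage 2: pool individual estimates into population-level effects
rates = [fit_rate(times, v) for v in cohort]
pop_mean = sum(rates) / len(rates)
pop_var = sum((r - pop_mean) ** 2 for r in rates) / len(rates)
```

The population mean and variance play the role of the fixed effect and random-effect variance; full mixed-effects software estimates them jointly, which matters most when individual time series are too sparse for reliable stage-1 fits.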

Interestingly, hierarchical models have the potential to benefit from more sophisticated data-driven approaches to integrate high-throughput data, such as omics or imaging (

Recently, machine and deep learning have become ubiquitous given their indisputable potential to learn from data (

Here, we identified opportunities for synergistic combinations and provided a snapshot of the current state of the art for how such combinations are facilitated for oncology applications. We highlighted similarities in the mathematical foundation and implementation structure of optimization processes and pointed out differences with respect to data requirements and the roles of knowledge and data in these approaches. It is important to structure the growing landscape of models at the interface of data- and knowledge-driven implementations. We hence propose systemizing combinations into four general categories: sequential, parallel, extrinsic, and intrinsic. While sequential and parallel combinations are intuitive and easily implemented, intrinsic and extrinsic combinations incorporate a stronger degree of interlacing that requires a deeper understanding of both data science and mathematical theory.

The choice of analysis tool should always account for the quality, size, and type of the available data and knowledge in light of the underlying research question. An intentional combination of machine learning and mechanistic mathematical modeling can then leverage the strengths of both approaches to tackle complex problems, gain deeper insights, and develop more accurate and robust solutions. Mechanistic learning can take on many facets and is foreseen to grow in importance in the context of mathematical oncology, with a particular focus on explainable AI, handling of limited data (e.g., efficient architecture design, data augmentation), and generation of precision oncology solutions.

In this review, we discussed only the core concepts. Given the fluid boundaries between data- and knowledge-driven models and in light of the variety of approaches within each of these domains, an exhaustive listing of all combinations is infeasible. However, several future directions stand out. For instance, hybrid modeling with Bayesian statistics, deep generative approaches, or specific training regimes, including semi-supervised (contrastive) or reinforcement learning, are worth mentioning. Finally, despite the positive outlook for mechanistic learning, certain limitations persist within both separate and combined approaches. In particular, ethical considerations should be addressed; these may arise from data privacy, algorithmic bias, or the clinical implementation of hybrid models.

Finally, with this work we strive to motivate a more active exchange between machine learning and mechanistic mathematical modeling researchers given the many parallels in terms of methodologies and evaluation endpoints, and the powerful results produced by mechanistic learning.

JM: Conceptualization, Formal analysis, Visualization, Writing – original draft, Writing – review & editing. CJ: Conceptualization, Supervision, Writing – review & editing. PM: Conceptualization, Supervision, Writing – review & editing. AK: Conceptualization, Writing – original draft, Writing – review & editing. SB: Conceptualization, Formal analysis, Project administration, Writing – original draft, Writing – review & editing.

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. JM was supported by NSF 1735095 - NRT: Interdisciplinary Training in Complex Networks and Systems. CJ was supported by the Swiss National Science Foundation (Ambizione Grant [PZ00P3_186101]). PM was supported in part by Cancer Moonshot funds from the National Cancer Institute, Leidos Biomedical Research Subcontract 21X126F, and by an Indiana University Luddy Faculty Fellowship. AK-L’s work was funded by the research centers BigInsight (Norges Forskningsråd project number 237718) and Integreat (Norges Forskningsråd project number 332645). SB was supported by the Botnar Research Center for Child Health Postdoctoral Excellence Programme (#PEP-2021-1008). Open access funding by ETH Zurich.

We thank Alexander Zeilmann and Saskia Haupt for many fruitful discussions and helpful contributions without which this manuscript would not have been possible. The collaboration that led to the design of this manuscript was fostered during the 2023 Banff International Research Station (BIRS) Workshop on Computational Modelling of Cancer Biology and Treatments (23w5007) initiated by Prof. M. Craig and Dr. A. Jenner.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.