Machine learning enhanced cell tracking

Quantifying cell biology in space and time requires computational methods to detect cells, measure their properties, and assemble these into meaningful trajectories. In this respect, machine learning (ML) is having a transformational effect on bioimage analysis, now enabling robust cell detection in multidimensional image data. However, the task of cell tracking, or constructing accurate multi-generational lineages from imaging data, remains an open challenge. Most cell tracking algorithms are largely based on our prior knowledge of cell behaviors, and as such, are difficult to generalize to new and unseen cell types or datasets. Here, we propose that ML provides the framework to learn aspects of cell behavior using cell tracking as the task to be learned. We suggest that advances in representation learning, cell tracking datasets, metrics, and methods for constructing and evaluating tracking solutions can all form part of an end-to-end ML-enhanced pipeline. These developments will lead the way to new computational methods that can be used to understand complex, time-evolving biological systems.


Introduction
Understanding how cells self-organize to become tissues and whole organisms is one of the most fundamental questions of biology. Indeed, single cell biology has the potential to illuminate processes from development and regeneration to diseases such as cancer. A prerequisite for quantifying cell biology in space and time is a suite of computational tools that can extract measurements from the myriad sources of experimental data. These include algorithms to detect cells, measure properties such as shape, morphology, or biochemical activity, and link these observations over time into biologically meaningful trajectories. Recent advances in optical imaging methods such as light-sheet microscopy now allow researchers to capture volumetric (3D + t) timelapse image data at high frame rates, with multiple biochemical reporters (Dunsby, 2008;Chen et al., 2014;Kumar et al., 2014;Sapoznik et al., 2020;Yang et al., 2022). As such, we are now in an era where we can generate vast volumes of information-rich experimental imagery more easily than we can extract meaning from the data.
In recent years, machine learning (ML) has had a transformational effect on microscopy data analysis; common image processing tasks such as cell segmentation, image denoising, feature extraction and cell state classification now routinely use a variety of ML-based algorithms (Moen et al., 2019a). ML algorithms can leverage experimental image data to improve robustness and accuracy in the task. However, despite major efforts in developing cell tracking algorithms, extraction of high-fidelity, multi-generational lineages remains a major bottleneck in microscopy image analysis (reviewed in Wolf et al., 2021). Most current approaches use the tracking-by-detection paradigm, i.e., the tracking problem is decomposed into two steps: i) detection of cells and then ii) linking these over time into trajectories.
In practice, many tracking-by-detection algorithms have been designed using heuristics based on our prior knowledge of what cells look like, and simple cellular behavior like cell division. As powerful as this approach is, it is often not flexible enough to deal with new or unseen data, and accurate tracking is not the end goal; rather, quantifying the underlying cell biology is. Perhaps a more appealing idea, and the central thesis here, is that advances in ML can be leveraged to learn models of cell behavior by posing cell tracking as the task to be learned.

Training and validation data
Annotated data are the essential requirement for ML algorithm development, either for training supervised models or evaluating real-world performance. For cell tracking, two types of annotations are required: i) the annotation of individual cells, marking their locations in space and time, and ii) annotations describing how these are linked over time. However, acquisition of manual annotations is laborious and time-consuming, with various studies reporting weeks, months or even years of dedicated time spent annotating a medium-sized dataset suitable for model training (Wolff et al., 2018;Caicedo et al., 2019;Ulicna et al., 2021;Malin-Mayor et al., 2022). Ground truth annotations for 3D + t datasets are even more limited as their annotation complexity increases; they are often sparsely annotated or provide a "gold standard" instead. This means that newly-developed tracking approaches are, by definition, benchmarked and validated exclusively against the few tracks included in the gold standard selection, and the model performance is not extensively measured on the entire dataset where it may exhibit some improvements over known tools. For example, the choice of the lineages included in the gold standard could be task-specific, including favouring long, narrow yet complete trees over broken, but richer and wider lineages capturing the diversity of cellular behavior.
In order to increase the quantity and quality of annotated data, there are an increasing number of efforts to crowd-source annotations (Sullivan et al., 2018;Moen et al., 2019b). These efforts have also highlighted the need for active label cleaning for improved dataset quality (Bernhardt et al., 2022). Nevertheless, owing to these and other efforts, the number of high-quality datasets is increasing with time. Popular repositories include the Cell Tracking Challenge (CTC), with data from several microscopy modalities (Ulman et al., 2017;Martin et al., 2023), and the Multiple Object Tracking (MOT) benchmark data capturing diverse cell types from a range of model organisms (Anjum and Gurari, 2020).

The tracking problem
The tracking data can be represented as a directed acyclic graph (DAG), where the set of cell detections are vertices (V, also known as nodes). The graph is directed and acyclic due to the arrow of time, and is well suited to representing cell division events. Without any prior knowledge, edges (E) are constructed between vertices in successive time points, such that every vertex at time t is connected to every vertex at t + 1, and so forth. In this case, the full graph of all possible solutions is $G_{\mathrm{hypothesis}} = \langle V, E \rangle$. The goal of a tracking algorithm is to identify a subgraph ($G_{\mathrm{solution}} \subset G_{\mathrm{hypothesis}}$) that minimises the tracking error and captures the motion and key events, such as mitosis and apoptosis, of every cell in the system. There are two closely related key challenges: (i) detection linking (§3.1) and (ii) lineage reconstruction (§3.2).
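As a minimal sketch of this construction, the snippet below builds the fully connected hypothesis graph from a list of detections. The `Detection` record and the function name `build_hypothesis_graph` are hypothetical and introduced only for illustration; they do not correspond to any particular tracking package.

```python
from collections import defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    """A single cell detection (hypothetical record, for illustration only)."""
    vertex_id: int
    t: int      # frame index
    x: float
    y: float


def build_hypothesis_graph(detections):
    """Return (V, E) where every vertex at time t links to every vertex at t + 1."""
    by_frame = defaultdict(list)
    for det in detections:
        by_frame[det.t].append(det.vertex_id)

    vertices = [det.vertex_id for det in detections]
    # Edges only point forwards in time, so the graph is directed and acyclic.
    edges = [
        (src, dst)
        for t in sorted(by_frame)
        for src in by_frame[t]
        for dst in by_frame.get(t + 1, [])
    ]
    return vertices, edges


detections = [
    Detection(0, 0, 1.0, 1.0),
    Detection(1, 0, 5.0, 5.0),
    Detection(2, 1, 1.2, 0.9),
    Detection(3, 1, 4.8, 5.3),
    Detection(4, 1, 5.5, 4.6),
]
V, E = build_hypothesis_graph(detections)
print(len(V), len(E))
```

With two detections in the first frame and three in the second, the sketch proposes all six candidate edges; a tracking algorithm would then select a subset of them.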

Vertices and edges: detection linking
The simplest formulation of the tracking-by-detection paradigm uses a greedy assignment strategy. In this case, a cost matrix (C) is constructed for all edges between the vertices at time t and t + 1 (Crocker and Grier, 1996;Jaqaman et al., 2008). Here, C yields a simplified version of $G_{\mathrm{hypothesis}}$; it only considers a successive pair of time points. The goal is to find the optimal set of edges ($G_{\mathrm{solution}}$) linking vertices that minimizes the total cost. This is commonly solved as a Linear Assignment Problem (LAP), using combinatorial optimization algorithms such as the Hungarian algorithm (also known as the Kuhn-Munkres algorithm (Munkres, 1957)) or the variant Jonker-Volgenant algorithm (Jonker and Volgenant, 1987). The time complexity of these algorithms is typically $O(n^3)$, making the naïve assignment a costly operation for large numbers of cells.
A central consideration is how to construct the cost matrix C. The simplest formulation uses the spatial ($L_2$, Euclidean) distance between the two vertices. However, this naïve assumption does not capture the heterogeneity of behavior typical in real data, and can produce errors in dense cell populations. More sophisticated cost functions can be formulated, for example, by using the predicted motion of the cell via a Kalman filter (Kalman, 1960;Bise et al., 2011;Bove et al., 2017;Ulicna et al., 2021;Ershov et al., 2022), incorporating local flow (Malin-Mayor et al., 2022) or visual features (Bove et al., 2017;Ulicna et al., 2021). However, it seems that there is ample room to incorporate additional features in the construction of C. In general, this approach is known as local tracking because, although the algorithm is global in space, it is not so in time. In contrast, global tracking approaches consider the full hypothesis graph (all time points) while identifying the optimal set of edges (discussed further in §5.2).
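The frame-to-frame linking step can be sketched with SciPy's `linear_sum_assignment`, which solves the LAP for a given cost matrix. The centroid coordinates below are hypothetical, and the Euclidean cost is only the simplest possible choice; this is an illustration of the idea, not the implementation used by any of the cited tools.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical 2D centroids at two successive time points (arbitrary units).
pos_t = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 1.0]])
pos_t1 = np.array([[5.2, 4.9], [1.1, 1.3], [8.7, 1.2]])

# Cost matrix C: Euclidean distance between every pair of detections.
C = np.linalg.norm(pos_t[:, None, :] - pos_t1[None, :, :], axis=-1)

# Solve the linear assignment problem (minimises the total cost).
rows, cols = linear_sum_assignment(C)
links = list(zip(rows.tolist(), cols.tolist()))
print(links)
```

Each pair in `links` is (index at time t, index at time t + 1). Note that this one-to-one assignment cannot represent a cell division, which is one reason richer hypotheses are needed for lineage reconstruction.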
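To illustrate how the choice of cost function changes the preferred links, the sketch below compares a purely spatial cost with a motion-aware cost. The constant-velocity extrapolation is a deliberately crude stand-in for the predict step of a Kalman filter, and all positions and names are hypothetical.

```python
import numpy as np


def pairwise_distance(a, b):
    """Euclidean distance between every row of a (n, d) and every row of b (m, d)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


# Two hypothetical cells passing close to one another (arbitrary units).
prev_pos = np.array([[0.0, 0.0], [3.0, 0.5]])    # positions at t - 1
curr_pos = np.array([[1.0, 0.0], [2.0, 0.5]])    # positions at t
candidates = np.array([[1.0, 0.4], [2.0, 0.1]])  # detections at t + 1

# Naive cost: distance from the last known position of each track.
C_naive = pairwise_distance(curr_pos, candidates)

# Motion-aware cost: distance from a constant-velocity prediction.
predicted = curr_pos + (curr_pos - prev_pos)
C_motion = pairwise_distance(predicted, candidates)

print(C_naive.argmin(axis=1))   # cheapest candidate per track, naive cost
print(C_motion.argmin(axis=1))  # cheapest candidate per track, motion cost
```

In this toy example the two costs favour opposite links, because the spatial cost ignores the direction in which each cell was travelling.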

Lineage assembly
In addition to reconstructing cell tracks which follow single-cell trajectories over their lifetime (Figures 1A-C), it is essential to correctly identify cell divisions and the relationships between related cells to precisely reconstruct cell lineages. The lineage is a hierarchical organization of single-cell tracks over time, recording the cell division history over up to several generations. Lineages are usually visualized in the form of lineage trees (Sandler et al., 2015;Kuchen et al., 2020), i.e., planar graphical representations from which the ancestral (mother, grandmother, etc.) as well as generationally-equal (sister, cousins, etc.) relationships can be read (Figure 1D).
Compared to the task of reconstructing single-cell tracks, accurate lineage reconstruction is the most error-prone stage of cell tracking in long movies. The fidelity of lineage assembly depends on the success of two steps: (i) the formation of full (or partial) trajectories, ideally capturing the cell from division to division, and (ii) the organization of those trajectories into parent-to-children assignments. This process critically depends on the fidelity of object detection, and methods to construct a hypothesis graph that can be evaluated to identify branching events such as mitosis.
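The parent-to-children organization can be sketched as a simple mapping from each track to its parent, from which the generation of every leaf (terminal) track follows. The track IDs and helper names below are hypothetical and serve only to make the data structure concrete.

```python
# Hypothetical lineage: each track ID maps to its parent track ID (None = root).
parents = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}


def generation(track_id, parents):
    """Number of divisions separating a track from the root of its tree."""
    depth = 0
    while parents[track_id] is not None:
        track_id = parents[track_id]
        depth += 1
    return depth


def leaves(parents):
    """Tracks that never appear as a parent, i.e. terminal cells."""
    has_children = set(parents.values())
    return sorted(t for t in parents if t not in has_children)


leaf_depths = {leaf: generation(leaf, parents) for leaf in leaves(parents)}
print(leaf_depths)
```

A single wrong or missing parent assignment changes the depth of every leaf downstream of it, which is why lineage errors propagate so readily.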

Measuring performance
Measuring tracking errors is critical; metrics are also essential in the design of an effective ML training loop as part of the objective function.

Common tracking errors
Several common errors are observed in automated tracking pipelines (Figure 1E). Fundamentally, any tracking-by-detection algorithm is limited by the accuracy of object detection; errors arising from under- and over-segmentation or hallucination can lead to false negative and false positive cell detections that dramatically impact the construction of $G_{\mathrm{hypothesis}}$ and therefore the feasibility of potential tracking solutions. For example, missing edges or mitotic detections are common examples of false negatives. A false negative branch occurs when one of the children cells is falsely linked to the parent track, while the other child is initialized as a new track without information about its ancestry. As a result, the tree branches are erroneously prolonged, and the leaf (terminal) cells of the downstream lineage tree path would be determined to reach a lower, and incorrect, generational depth.
On the other hand, it is critical to produce tracks which are not prematurely terminated or truncated, as those could become putative parents (false positive branch), by falsely linking to other tracks arising in their close neighbourhood. In this case, the lineage would incorrectly reach a higher generational depth. Establishing a suite of metrics (See §4.2) is essential for identifying these tracking errors and to enable training new ML models.

FIGURE 1
[…] (Ulicna et al., 2021)) (B) Tracking cells in C. elegans early embryo development (image data from Murray et al. (Murray et al., 2006)). (C) Tracking cells in D. melanogaster embryo development (image data from Amat et al. (Amat et al., 2014)). (D) Example BTRACK (Ulicna et al., 2021) lineage output, using default tracking parameters, on the C. elegans dataset from Murray et al. (Murray et al., 2006). The manually annotated ground truth tree is shown for reference. The propagation of a single tracking error, highlighted as a red arrow is shown, demonstrating the complexity of the tracking and lineaging problem. (E) Examples of typical errors in automated cell tracking. Vertices are denoted as circles, and correct edges are shown as bold black arrows, errors as red arrows and dashed arrows indicate the ground truth where errors have occurred.

Frontiers in Bioinformatics frontiersin.org

Metrics
A few metrics have been proposed for assessing the performance of cell tracking algorithms. Ulman et al. (Ulman et al., 2017) propose a definition of "tracking accuracy" that is based on representations of the ground-truth and predicted cell lineages as DAGs, where accuracy is calculated using a matching measure that assesses the divergence between the ground-truth and predicted graphs (Matula et al., 2015). They also use a "complete tracks" metric that is based on the number of the ground-truth tracks that are correctly tracked, i.e., where a predicted track follows the same cell through all frames of the ground-truth track (Kang et al., 2008).
Further, lineage-specific metrics are required for cell tracking evaluation. For example, Bise et al. (Bise et al., 2011) define mitotic "branching correctness" (MBC) as the proportion of ground-truth cell divisions that are predicted by the tracking model, where a prediction is considered successful if it captures the correct mother-daughter relationship between the cells concerned and, moreover, predicts the timing of the division within a certain tolerance. The MBC and leaf retrieval score (LRS, (Ulicna et al., 2021)) are lineage scale metrics. Importantly, in studies where retrieval of the cell relationships is desired, an LRS of 0.75 (i.e., 3 out of 4 leaf cells are tracked correctly from start to end of imaging) is more intuitive to the user to benchmark the tracking performance than an often-used MOT accuracy metric of 0.97 vs. 0.99. Many of these cell tracking specific metrics are implemented in open-source packages such as Traccuracy.
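The MBC definition above can be made concrete with a short sketch. It assumes that predicted and ground-truth tracks have already been matched so that they share track IDs, which is a non-trivial step omitted here; the `Division` record, the function name and the default tolerance of one frame are hypothetical choices for illustration, not the reference implementation of Bise et al.

```python
from typing import FrozenSet, NamedTuple


class Division(NamedTuple):
    """A cell division event (hypothetical record, for illustration only)."""
    mother: int
    daughters: FrozenSet[int]
    frame: int


def mitotic_branching_correctness(gt, pred, tolerance=1):
    """Fraction of ground-truth divisions recovered with the correct
    mother-daughter relationship and timing within `tolerance` frames."""
    if not gt:
        raise ValueError("no ground-truth divisions to score")
    hits = sum(
        any(
            p.mother == g.mother
            and p.daughters == g.daughters
            and abs(p.frame - g.frame) <= tolerance
            for p in pred
        )
        for g in gt
    )
    return hits / len(gt)


gt = [
    Division(1, frozenset({2, 3}), 10),
    Division(2, frozenset({4, 5}), 25),
    Division(3, frozenset({6, 7}), 27),
    Division(6, frozenset({8, 9}), 40),
]
pred = [
    Division(1, frozenset({2, 3}), 11),   # correct, one frame late
    Division(2, frozenset({4, 5}), 30),   # timing outside tolerance
    Division(3, frozenset({6, 10}), 27),  # wrong daughter
    Division(6, frozenset({8, 9}), 40),   # correct
]
score = mitotic_branching_correctness(gt, pred)
print(score)
```

Here two of the four ground-truth divisions are recovered, giving a score of 0.5.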

Leveraging ML to enhance cell tracking
Naïvely, tracking as few as 10 cells in a movie of 10 frames in length yields a hypothesis graph ($G_{\mathrm{hypothesis}}$) with a total of $10^9$ possible solutions for each cell. With larger datasets, this naïve approach is computationally infeasible. Luckily, there are constraints on the problem, and not all of these hypotheses are physically possible; cells conform to a set of "rules" defining their behavior, such as movement and division. Rather than "hard-code" these rules into an algorithm, we might approach it as a data-driven problem. Here, ML provides a potential framework to learn these and other cell behaviors. Given a dataset and a set of metrics to enable optimization, tracking can be posed as a learnable task. In addition to detection, a putative model needs two components: i) a way to represent cells from the imaging data and ii) a method of constructing the hypothesis graph from which the tracking solution can be identified.
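The figure quoted above can be reproduced with a short calculation, under the assumption that it counts the paths available to one cell in a fully connected hypothesis graph: with 10 candidate detections at each of the 9 frame-to-frame transitions, and ignoring divisions and terminations, there are 10 choices made 9 times. The function name is hypothetical.

```python
def n_candidate_paths(n_cells: int, n_frames: int) -> int:
    """Paths through a fully connected hypothesis graph starting from one cell,
    assuming a constant number of detections per frame and no branching."""
    return n_cells ** (n_frames - 1)


print(n_candidate_paths(10, 10))  # 1000000000
```

The count grows exponentially with the number of frames, which is why pruning implausible hypotheses matters far more than raw compute.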

Learned representations
Many neural network architectures, such as CNNs, operate in a hierarchical fashion, such that high-dimensional information is compressed into a lower-dimensional representation. This makes them a natural tool for the extraction of features representative of the input image data. For cells growing in populations, it is often important to quantitatively describe their immediate neighbourhood (Li et al., 2020;Wang et al., 2020;Fischer et al., 2021;Stirling et al., 2021;Buchner and Valada, 2022). Moreover, single-cell images do not have to be analysed on a per-image basis. Indeed, the advantage of time-lapse imaging is that temporal models of cell behavior can encode (or learn) the transitions of states over time (Held et al., 2010;Bove et al., 2017;Soelistyo et al., 2022;Gallusser et al., 2023). Incorporating features such as local (or collective) motion, neighbourhood embeddings or cell state classification can be used to generate rich representations (Bove et al., 2017;Driscoll et al., 2019;Andrews et al., 2021;Gradeci et al., 2021;De Vries et al., 2022;Ko et al., 2022;Malin-Mayor et al., 2022;Yamamoto et al., 2022;Viana et al., 2023). Increasingly, self-supervised methods (such as variational autoencoders) are being used to learn explainable representations directly from the image data (Zaritsky et al., 2021;Soelistyo et al., 2022;Wu et al., 2022). As such, the characterization of the temporal landscape of morphological states can help to distinguish heterogeneous cell populations (Freckmann et al., 2022) and diverse cell fates (Soelistyo et al., 2022), or incorporate rules of the tracking problem (Bove et al., 2017). These advances suggest that rich representations of cells can be learned directly from the image data.

Discrete optimization
Another major advance in recent years has been the use of discrete optimization methods to identify a globally optimal solution graph. One early and very successful approach was to use the Viterbi algorithm (Magnusson et al., 2015), treating cell behavior as a hidden Markov model. Alternatively, optimization can be posed as a Linear Programming (LP) problem, i.e., that it is defined by a set of linear inequality constraints that define possible solutions to the problem (Figure 2). The goal is to maximize the value of a linear objective function given these constraints. For example, an Integer LP (ILP) problem can be defined:

$$\max_{x} \; \rho^{\top} x \quad \text{s.t.} \quad A x \leq 1, \quad x_i \in \{0, 1\}$$

Effectively the matrix A and vector ρ encode a set of hypotheses about potential edges and hyperedges (edges connecting ≥ 2 vertices, such as a single cell splitting during mitosis) and the likelihood of that hypothesis being correct respectively. Usually this problem is formulated using heuristics, or pre-defined rules, that enable calculation of the likelihood based on the evidence. Since the domain of x is binary, the optimization effectively determines the best set of hypotheses that account for all of the observations, while maximizing the total reward. One of the earliest examples was demonstrated by Al-Kofahi et al. (Al-Kofahi et al., 2006), where cell specific hypotheses, such as track splitting were introduced, and this general strategy has been successfully extended with other hypotheses (Bise et al., 2011;Ulicna et al., 2021;Malin-Mayor et al., 2022).
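A toy instance of such an ILP, loosely following the mitotic example of Figure 2 (vertex 28 at one time point, vertices 29 and 267 at the next), can be sketched with SciPy's `milp` solver. Several things here are assumptions made for illustration: the constraints are taken to be of the form Ax ≤ 1 with binary x, the rewards in `rho` are invented values, the "initialise" hypotheses are hypothetical additions, and A is laid out with one row per vertex and one column per hypothesis (the transpose of the layout described for Figure 2D) so that the product Ax is well defined. This is not the BTRACK implementation.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# One column of A per hypothesis.
hypotheses = [
    "edge 28->29",
    "edge 28->267",
    "hyperedge 28->(29, 267)",  # mitosis
    "terminus 28",
    "initialise 29",
    "initialise 267",
]

# One row of A per vertex: each vertex may be used by at most one
# selected hypothesis.
A = np.array([
    [1, 1, 1, 1, 0, 0],  # vertex 28
    [1, 0, 1, 0, 1, 0],  # vertex 29
    [0, 1, 1, 0, 0, 1],  # vertex 267
])

# Hypothetical rewards for each hypothesis (illustrative values only).
rho = np.array([0.3, 0.2, 0.9, 0.05, 0.1, 0.1])

# milp minimises, so negate rho to maximise rho^T x s.t. A x <= 1, x binary.
result = milp(
    c=-rho,
    constraints=LinearConstraint(A, np.full(3, -np.inf), np.ones(3)),
    integrality=np.ones_like(rho),
    bounds=Bounds(np.zeros(6), np.ones(6)),
)
x = np.round(result.x).astype(int)
selected = [h for h, xi in zip(hypotheses, x) if xi]
print(selected)
```

With these rewards the solver selects the single hyperedge hypothesis, since it explains all three vertices with a higher total reward than any compatible combination of the other hypotheses.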

Learning an end-to-end cell tracking model
It should be noted that ρ and A are themselves constructed from reward functions and heuristics that are user-provided and therefore parameterized based on our prior knowledge. In BTRACK we built a hypothesis engine that constructs A and ρ given heuristics and the data; only relevant hypotheses are proposed to limit the size of the ILP problem. The advantage of this approach is that it is computationally cheap and requires little data (acknowledging the general paucity of data; see §2). The disadvantage is that it leads to exponential complexity (ILP is NP-hard) in cases where the problem is ill-posed. In the case of novel or unseen datasets, if the problem is not solvable in a reasonable amount of time, it likely means that either the assumptions of the model are incorrect or it is poorly parameterized.
As such, a future goal of an ML system is to reduce the search space by proposing these hypotheses; the aim is not to enumerate all hypotheses, but to intelligently suggest a subset that could account for the data. This is analogous to a formal language that describes not only the acceptable states of a system but also, potentially, the rules that generate those states. Several recent studies have shown that self-supervised methods can identify and predict cellular events from image data alone (Soelistyo et al., 2022;Gallusser et al., 2023). A putative ML-enhanced tracking algorithm could leverage these predictions and associated confidences to construct A and ρ respectively. Furthermore, recent advances in the fusion of deep-learning and combinatorial solvers provide a route towards achieving this putative end-to-end tracking pipeline (Vlastelica et al., 2019). One can imagine a generalized cell tracking model that has been trained on existing data, which is then further fine-tuned for new datasets using transfer learning. Not only does this formulation lead to an implicitly interpretable representation (since A is by definition a set of rules), it also enables the exciting prospect of discovering novel cellular dynamics.

FIGURE 2
Tracking, graphs and discrete optimization. (A) A sequence of image volumes showing a single mitotic branching event highlighted with white circles representing the vertices and dashed lines representing the ground truth edges. (B) A simplified directed hypothesis graph of the mitotic event. Each vertex represents a unique cell detection. Edge weights (black and red arrows) and hyperedge weights (blue arrows) are calculated as the posterior probability of linking and branching hypotheses given the evidence, and calculated using BTRACK. (C) There are several hypotheses to account for the appearance of new vertices. In hypotheses 1 & 2, edges link vertex 28 and either 29 or 267 respectively. In hypothesis 3, a hyperedge links vertex 28 to both 29 and 267, representing a mitotic event. (D) A simplified ILP optimization problem using the graphical model, and possible solutions. The A matrix is sparse with non-zero elements colored by hypothesis type (red: edge, blue: hyperedge, grey: terminus). The rows represent individual hypotheses and the columns are the vertex IDs forming part of the hypotheses. The optimal solution (which maximizes $\rho^{\top} x$ s.t. $Ax \leq 1$) is highlighted with an asterisk.

Ultimately, the goal of our research is not to create the perfect cell tracking algorithm; rather, we want to understand cell behavior in complex biological systems. ML provides a potential framework to learn aspects of cell behavior by posing cell tracking as the learnable task.
Although we have thus far considered tracking-by-detection to comprise two separate computational steps, there are efforts in the wider computer vision community to develop end-to-end ML tracking algorithms. In this case, detection and tracking can be posed as a joint learning task, i.e., the model learns to detect objects and track them simultaneously. This has the advantage of coupling model performance to both steps of the tracking-by-detection paradigm. A recent example of this approach is global tracking transformers (Zhou et al., 2022), which couple region proposal networks with transformers (Vaswani et al., 2017) to perform semi-global multi-object tracking. These methods, however, require very large quantities of training data and, in their current formulation, do not consider important hypotheses such as branching events, making adaptation to cell tracking more challenging.
A further requirement in the application of ML for scientific enquiry is model explainability. Deep neural networks are generally considered "black boxes" whose behavior is difficult to explain in human-understandable terms. Owing to their complexity, these models can typically be explained only with reference to simpler approximation models that mimic the general behavioral features of the more complex model (Guidotti et al., 2018;Rudin et al., 2021). Despite this challenge, the promise of explainable deep learning is considerable, particularly in the domain of scientific inquiry. For example, an explainable end-to-end cell tracking pipeline may allow us to investigate the rules that govern cell movement and behavior. By internalizing an external phenomenon (e.g., cell movement), the model would thereby form a computational representation of that phenomenon that human scientists can investigate. This framework has already been applied with some success to the study of cell fate determination in a cell competition context (Soelistyo et al., 2022).
Finally, the open sharing of trained models, metrics and data is essential to drive scientific progress. This has been a growing trend in the machine learning community, with the appearance of publicly available repositories such as Scivision, Bioimage.IO (Ouyang et al., 2022) and HuggingFace (Wolf et al., 2020). To maximize ease-of-use, model repositories ensure that each model is accompanied by a standardized description, which includes input and output formats, pre-trained weights and source of training data. Ouyang et al. (Ouyang et al., 2022) note that in order for ML models to be of use to external communities, such as experimental microscopists, models should be accessible via GUI-equipped software, such as Fiji (Johannes et al., 2012) or Napari (Sofroniew et al., 2022). Indeed, effective visualization of tracking data is also an open challenge (examples include Napari, TrackMate (Tinevez et al., 2017;Ershov et al., 2022) and Mastodon).
The promise of ML to help us understand complex biological systems is considerable. Intelligent systems can extract patterns and insights that evade the notice of human scientists, allowing us to investigate domains previously hindered by our limited ability to distil scientific knowledge from large datasets. As such, advances in ML will ultimately enable the automated discovery of novel cellular dynamics.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions
AL conceived and designed the research. CS and KU performed computational experiments. AL wrote the image processing and cell tracking code. CS, KU, and AL evaluated the results and wrote the paper. All authors contributed to the article and approved the submitted version.

Funding
This work was supported by a BBSRC LIDo AI PhD studentship to CS and KU. AL wishes to acknowledge the Turing Fellowship from the Alan Turing Institute and the support of BBSRC grant BB/S009329/1. AL also wishes to acknowledge the support of UCL's Advanced Research Computing centre, and grants NPA-0000000014 and NP2-0000000022 from the Chan Zuckerberg Initiative DAF, an advised fund of the Silicon Valley Community Foundation.