Edited by: Valeriya Naumova, Simula Research Laboratory, Norway

Reviewed by: Andrew G. Edwards, University of California, Davis, United States; Hermenegild Javier Arevalo, Simula Research Laboratory, Norway

This article was submitted to Computational Physics, a section of the journal Frontiers in Physics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

In this perspective, we examine three key aspects of an end-to-end pipeline for realistic cellular simulations: reconstruction and segmentation of cellular structures; generation of cellular structures; and mesh generation, simulation, and data analysis. We highlight some of the relevant prior work in these distinct but overlapping areas, with a particular emphasis on current use of machine learning technologies, as well as on future opportunities.

Machine learning (ML) approaches, including both traditional and deep learning methods, are revolutionizing biology. Owing to major advances in experimental and computational methodologies, the amount of data available for training is rapidly increasing. The timely convergence of data availability, computational capability, and new algorithms is a boon for biophysical modeling of subcellular and cellular scale processes such as biochemical signal transduction and mechanics [

As biophysical methods have improved, the complexity of our mathematical and computational models is steadily increasing [

The major bottleneck for the widespread use of cell scale simulations with realistic geometries is not the availability of structural data. Indeed, there exist many three-dimensional imaging modalities such as confocal microscopy, multiphoton microscopy, super-resolution fluorescence and electron tomography [

An illustration of the complex pipeline needed to go from imaging data to a segmented mesh, with various opportunities for emerging techniques in machine learning shown throughout the pipeline.

Images generated by the various microscopy modalities must undergo pre-processing to correct for errors such as uneven illumination or background noise [
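
As a minimal illustration of correcting uneven illumination, the sketch below models the illumination field as a linear intensity gradient. It fits a plane a*x + b*y + c to the image by least squares and subtracts it. The planar model and the function name `subtract_planar_background` are assumptions chosen for brevity. Practical pipelines often use flat-field reference images, rolling-ball subtraction, or large-kernel blurs instead, and bright foreground structures will slightly bias this simple fit.

```python
import numpy as np

def subtract_planar_background(image):
    """Fit a plane a*x + b*y + c to all pixels and subtract it.

    Assumption: uneven illumination is approximated by a linear gradient.
    Returns (corrected_image, estimated_background).
    """
    ny, nx = image.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    # Design matrix with one row [x, y, 1] per pixel.
    A = np.column_stack([xx.ravel(), yy.ravel(), np.ones(image.size)])
    coeffs, *_ = np.linalg.lstsq(A, image.ravel().astype(float), rcond=None)
    background = (A @ coeffs).reshape(image.shape)
    return image - background, background
```

For example, an image that is just a tilted ramp is flattened to approximately zero, while a small bright spot added on top of the ramp survives the correction.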

Electron tomography (ET) remains one of the most popular methods of cell imaging for modeling purposes [

Given a noisy 3D reconstruction, how can we segment cellular structures of interest? One approach is to employ manual segmentation tools applied to 3D tomograms such as XVOXTRACE [

Annual cell segmentation challenges are evidence of the demand for automatic segmentation [
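
Automatic segmentations are usually scored against manual annotations with overlap measures. As a sketch, here are two widely used ones, the Dice coefficient and intersection over union (IoU), computed on binary masks. We do not claim that any particular challenge uses exactly these metrics. The function name `dice_and_iou` is illustrative, and the convention of returning 1.0 when both masks are empty is an assumption.

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Dice = 2|P∩T| / (|P| + |T|), IoU = |P∩T| / |P∪T| for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    union = np.logical_or(pred, truth).sum()
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2.0 * intersection / total if total else 1.0
    iou = intersection / union if union else 1.0
    return float(dice), float(iou)
```

For instance, `dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])` gives a Dice score of 2/3 and an IoU of 0.5. The same code works unchanged on 2D or 3D masks.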

More recently, deep learning-based ML algorithms (

Both the difficulty and the cost of generating annotated training data increase exponentially for volumetric (3D) images compared with 2D images, yet volumetric images are the desired inputs for biophysical simulations. Since the U-Net is a 2D architecture [
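
One simple way to use a 2D architecture on volumetric data is to segment each z-slice independently and restack the results. The sketch below does this. `segment_slice` is a hypothetical placeholder (a plain intensity threshold) standing in for a trained 2D network such as a U-Net. The volume is assumed to be stored as a (z, y, x) array.

```python
import numpy as np

def segment_slice(slice_2d, threshold=0.5):
    # Hypothetical stand-in for a trained 2D segmenter (e.g., a U-Net).
    return slice_2d > threshold

def segment_volume_slicewise(volume, segment_fn=segment_slice):
    """Apply a 2D segmenter independently to every z-slice of a (z, y, x) volume."""
    return np.stack([segment_fn(volume[z]) for z in range(volume.shape[0])], axis=0)
```

The main drawback of this design is that each slice is processed without context from its neighbors, so structures that are ambiguous within a single slice may be segmented inconsistently along z. Architectures that operate natively on 3D inputs are designed to address this limitation.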

Excitingly, such algorithms are being made openly accessible and easy to use. For example, iLastik [

Generating well-organized and annotated training data continues to be the major challenge for most ML segmentation methods. Crowdsourcing platforms such as Amazon's Mechanical Turk can help in this context, but they remain limited by the difficulty of training naive users to trace specific structures in images. Alternatively, many ML algorithms leverage transfer learning approaches using pre-trained networks such as VGG-net [
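
The core idea of transfer learning is to reuse a frozen feature extractor and train only a small task-specific head, which needs far fewer labeled examples. The sketch below shows that pattern in NumPy under loudly stated assumptions. The "backbone" here is a fixed random projection followed by ReLU, a hypothetical stand-in for a genuinely pre-trained network such as VGG. The head is a logistic regression trained by plain gradient descent. All names (`W_FROZEN`, `extract_features`, `train_head`, `predict`) are illustrative.

```python
import numpy as np

# Hypothetical frozen backbone: a real pipeline would load a pre-trained
# network (e.g., VGG); a fixed random projection + ReLU stands in here.
_backbone_rng = np.random.default_rng(0)
W_FROZEN = _backbone_rng.normal(size=(64, 16)) / 8.0  # scaled by 1/sqrt(64)

def extract_features(x):
    """Frozen feature extractor; W_FROZEN is never updated below."""
    return np.maximum(x @ W_FROZEN, 0.0)

def train_head(features, labels, lr=0.05, epochs=1000):
    """Fit only a logistic-regression head on frozen features."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels  # gradient of mean cross-entropy w.r.t. the logits
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    return w, b

def predict(features, w, b):
    return (features @ w + b > 0).astype(int)
```

Because only `w` and `b` are learned, the number of trainable parameters is tiny compared with the backbone. This is what makes the approach attractive when annotated cellular images are scarce.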

Developing comprehensive biophysical models involves two main questions: (1) what process is being modeled, and (2) in what geometry is this process being modeled? Answers to the first question are based on experimental observations and the specific biology. Answering the second is significantly more challenging because of the difficulties in (i) obtaining accurate segmentations, (ii) discovering new structures from experiments, and (iii) simultaneously visualizing multiple structures. Synthetically generated geometries, which can probe different arrangements of organelles within cells, could therefore be useful for generating biologically relevant hypotheses.
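
A very simple form of synthetic geometry generation is to randomly place idealized organelles inside an idealized cell, which produces many alternative arrangements. The sketch below uses rejection sampling to place non-overlapping spheres completely inside a spherical cell. The spherical shapes, the function name `place_organelles`, and the retry limit are all simplifying assumptions, not a method taken from the works cited here.

```python
import numpy as np

def place_organelles(cell_radius, organelle_radii, rng, max_tries=10_000):
    """Randomly place non-overlapping spheres fully inside a spherical cell.

    Returns an (n, 3) array of centers. Raises RuntimeError if a sphere
    cannot be placed within max_tries attempts.
    """
    placed = []  # list of (center, radius)
    for r in organelle_radii:
        for _ in range(max_tries):
            # Sample in a cube, then reject centers whose sphere leaves the cell.
            c = rng.uniform(-(cell_radius - r), cell_radius - r, size=3)
            inside = np.linalg.norm(c) + r <= cell_radius
            clear = all(np.linalg.norm(c - c2) >= r + r2 for c2, r2 in placed)
            if inside and clear:
                placed.append((c, r))
                break
        else:
            raise RuntimeError(f"could not place organelle of radius {r}")
    return np.array([c for c, _ in placed])
```

For example, `place_organelles(10.0, [2.0, 1.0, 1.0, 0.5], np.random.default_rng(1))` returns four centers. Re-running with different seeds gives an ensemble of arrangements that could then be meshed and simulated.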

A subset of ML models, called

In recent years, there has been rapid progress in applying deep generative models to natural images, text, and even medical images. Popular classes of deep generative models like Variational Autoencoders [
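
Two ingredients are central to training a Variational Autoencoder. The first is the reparameterization trick, which writes a latent sample as z = mu + sigma * eps with eps ~ N(0, I) so that gradients can flow through mu and sigma. The second is the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior, which appears in the evidence lower bound. The sketch below shows only these two pieces in NumPy. The encoder, decoder, and optimizer are omitted, and parameterizing the variance by its logarithm (`log_var`) is a common convention we assume here.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
```

For example, the KL term is zero when mu = 0 and log_var = 0, and it equals 0.5 for mu = [1, 0] with unit variance.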

In cell biology, much of the work in building generative models of cellular structures has been associated with the open source CellOrganizer [

The challenge going forward will be how best to use generative modeling given the data in hand. This will depend on the question we want to ask of the data. For example, if we are modeling processes associated with cell and nuclear shape, spherical-harmonics-based generative models might be more appropriate than deep-learning-based methods [
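
To illustrate the idea behind harmonic shape models, the sketch below uses their 2D analog. Each star-shaped cell outline is described by a radial function r(theta), which is expanded in a truncated Fourier series, and the coefficients are fit by least squares. A multivariate Gaussian is then fit to the coefficients across a population and sampled to generate new shapes. This is a deliberately simplified stand-in: 3D models expand r(theta, phi) in spherical harmonics instead, and tools such as CellOrganizer implement their own, more elaborate models. All function names and the choice of a Gaussian over coefficients are assumptions.

```python
import numpy as np

def fourier_design(theta, n_harmonics):
    """Columns [1, cos(theta), sin(theta), ..., cos(n*theta), sin(n*theta)]."""
    cols = [np.ones_like(theta)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(k * theta), np.sin(k * theta)]
    return np.column_stack(cols)

def fit_radial_coeffs(theta, radius, n_harmonics=4):
    """Least-squares Fourier coefficients of a star-shaped outline r(theta)."""
    coeffs, *_ = np.linalg.lstsq(fourier_design(theta, n_harmonics), radius, rcond=None)
    return coeffs

def radius_from_coeffs(theta, coeffs):
    """Evaluate r(theta) from a coefficient vector of length 2*n + 1."""
    return fourier_design(theta, (len(coeffs) - 1) // 2) @ coeffs

def sample_shapes(coeff_matrix, n_samples, rng):
    """Fit a Gaussian to per-cell coefficient rows and draw new coefficient vectors."""
    mean = coeff_matrix.mean(axis=0)
    cov = np.cov(coeff_matrix, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)
```

With `n_harmonics=4`, each outline is summarized by nine numbers. A low-dimensional, interpretable representation like this is one reason harmonic models can be preferable to deep generative models when the question concerns overall cell or nuclear shape.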

ML is commonly applied to mesh segmentation and classification; examples include PointNet [
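
A key design idea in PointNet is that a point cloud is an unordered set. The same small network is applied to every point, and the per-point features are combined with a symmetric function (max pooling), so the resulting global descriptor does not depend on point order. The NumPy sketch below shows only this invariance with random, untrained weights. The layer sizes and names are hypothetical, and the input/feature transform networks of the original architecture are omitted.

```python
import numpy as np

_rng = np.random.default_rng(0)
# Hypothetical, untrained weights; a real PointNet learns these from data.
W1 = _rng.normal(size=(3, 32))
W2 = _rng.normal(size=(32, 64))

def global_feature(points):
    """Shared per-point MLP on (x, y, z) coordinates, then max pooling over points."""
    h = np.maximum(points @ W1, 0.0)  # identical weights for every point
    h = np.maximum(h @ W2, 0.0)
    return h.max(axis=0)              # symmetric, order-independent aggregation
```

Shuffling the rows of `points` leaves the output unchanged, which is what allows a network like this to consume mesh vertices or point samples without a canonical ordering.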

An illustration of complexity, size, quality, and local resolution of meshes typically needed for realistic simulation of biophysical systems. Meshes are generated using

Given a high quality and high resolution mesh representation of a structural geometry (see
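
Mesh quality, mentioned above alongside size and local resolution, is often quantified per element. The sketch below computes one common triangle measure, the normalized radius ratio q = 2 r_in / R_circ. It equals 1 for an equilateral triangle and tends to 0 as the triangle degenerates. Using the inradius r_in = A / s and circumradius R_circ = abc / (4A), where A is the area and s the semi-perimeter, this simplifies to q = 8 A^2 / (s a b c). Choosing this particular metric is our assumption; meshing tools offer several alternatives (e.g., angle- or edge-ratio-based measures).

```python
import numpy as np

def triangle_quality(p0, p1, p2):
    """Normalized radius ratio q = 2 * r_in / R_circ for a triangle in 3D.

    Returns 1.0 for an equilateral triangle and 0.0 for a degenerate one.
    """
    a = np.linalg.norm(p1 - p2)
    b = np.linalg.norm(p0 - p2)
    c = np.linalg.norm(p0 - p1)
    s = 0.5 * (a + b + c)                                  # semi-perimeter
    area = 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))
    if area == 0.0:
        return 0.0
    return float(8.0 * area**2 / (s * a * b * c))
```

For example, a right isosceles triangle scores 2(sqrt(2) - 1), about 0.83. Flagging elements below a chosen threshold is a simple way to locate regions that need remeshing before a simulation is run.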

To facilitate population studies, it is important that structural datasets be made publicly available, as they commonly are in neuroscience [

Importantly, by running simulations on distributions of realistic shapes, we can generate experimentally testable hypotheses. This is much harder in -omics datasets, where mechanistic insight is usually obtained via constraint-based modeling [^{2+} [

A major bottleneck in setting up accurate computational simulations of biophysical systems, idealized or otherwise, revolves around the choice of constitutive equations, the estimation of free parameters such as reaction rate constants, diffusion coefficients, and material properties, and the computational algorithms used to solve the resulting governing equations numerically on these domains. While the long history of mathematical modeling in biology sets the stage for constitutive equations, estimating free parameters remains a major challenge. Another major challenge for physically realistic models of signaling is knowing where the various molecules involved are located. Realistic geometries pose the additional challenge of requiring us to first characterize the distribution of shapes and then analyze simulation results across that distribution. Just as ML can be used in adaptive numerical methods to output a good mesh, it can also be used for adaptive nonlinear data fitting to determine biophysical parameters with uncertainty estimates [
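
For reference, the sketch below shows the classical baseline for estimating a parameter together with its uncertainty. It is not an ML method. The example fits a hypothetical first-order decay c(t) = c0 exp(-k t), for instance a species degraded with rate constant k, by Gauss-Newton nonlinear least squares. It then reports asymptotic standard errors from the covariance estimate sigma^2 (J^T J)^(-1), where J is the Jacobian at the solution. The model, the function name, and the requirement for a reasonable initial guess are all assumptions. Undamped Gauss-Newton can diverge from poor starting points, which is one motivation for more adaptive approaches.

```python
import numpy as np

def fit_exponential_decay(t, y, c0_init, k_init, n_iter=50):
    """Gauss-Newton fit of y = c0 * exp(-k t); returns (params, standard_errors)."""

    def residual_and_jacobian(theta):
        c0, k = theta
        e = np.exp(-k * t)
        residual = y - c0 * e
        # Columns: d(model)/d(c0) and d(model)/d(k).
        J = np.column_stack([e, -c0 * t * e])
        return residual, J

    theta = np.array([c0_init, k_init], dtype=float)
    for _ in range(n_iter):
        residual, J = residual_and_jacobian(theta)
        step, *_ = np.linalg.lstsq(J, residual, rcond=None)  # linearized update
        theta = theta + step

    residual, J = residual_and_jacobian(theta)
    sigma2 = residual @ residual / (len(t) - len(theta))  # noise variance estimate
    cov = sigma2 * np.linalg.inv(J.T @ J)
    return theta, np.sqrt(np.diag(cov))
```

For example, with synthetic data generated from c0 = 2 and k = 0.5 plus small Gaussian noise, starting from (2.2, 0.6) recovers both parameters with standard errors on the order of the noise level.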

In this perspective, we have discussed three key aspects of a pipeline for realistic cellular simulations: (i) Reconstruction and segmentation of cellular structure; (ii) Generation of cellular structure; and (iii) Mesh generation, refinement and simulation. While these were discussed separately, neural networks like Pixel2Mesh demonstrate the feasibility of end-to-end pipelines from a single black box [

RV provided expertise on generation and simulation. MR provided expertise on imaging, segmentation, and reconstruction. CL provided expertise on reconstruction, meshing, and simulation. GJ provided expertise on imaging, segmentation, reconstruction, and generation. PR provided expertise on imaging, segmentation, reconstruction, modeling, and simulation. MH provided expertise on meshing, simulation, and analysis. All authors contributed to the writing of the manuscript and provided area-specific expertise.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank Prof. Pietro De Camilli and coworkers for sharing their datasets from Wu et al. [

^{2+}release by action potential configuration in normal and failing murine cardiomyocytes