Dynamics of chromosome organization in a minimal bacterial cell

Computational models of cells cannot be considered complete unless they include the most fundamental process of life, the replication and inheritance of genetic material. By creating a computational framework to model systems of replicating bacterial chromosomes as polymers at 10 bp resolution with Brownian dynamics, we investigate changes in chromosome organization during replication and extend the applicability of an existing whole-cell model (WCM) for a genetically minimal bacterium, JCVI-syn3A, to the entire cell-cycle. To achieve cell-scale chromosome structures that are realistic, we model the chromosome as a self-avoiding homopolymer with bending and torsional stiffnesses that capture the essential mechanical properties of dsDNA in Syn3A. In addition, the conformations of the circular DNA must avoid overlapping with ribosomes identitied in cryo-electron tomograms. While Syn3A lacks the complex regulatory systems known to orchestrate chromosome segregation in other bacteria, its minimized genome retains essential loop-extruding structural maintenance of chromosomes (SMC) protein complexes (SMC-scpAB) and topoisomerases. Through implementing the effects of these proteins in our simulations of replicating chromosomes, we find that they alone are sufficient for simultaneous chromosome segregation across all generations within nested theta structures. This supports previous studies suggesting loop-extrusion serves as a near-universal mechanism for chromosome organization within bacterial and eukaryotic cells. Furthermore, we analyze ribosome diffusion under the influence of the chromosome and calculate in silico chromosome contact maps that capture inter-daughter interactions. Finally, we present a methodology to map the polymer model of the chromosome to a Martini coarse-grained representation to prepare molecular dynamics models of entire Syn3A cells, which serves as an ultimate means of validation for cell states predicted by the WCM.


(S2)
Comparison to experiments on short oligonucleotides show that this formula is valid down to p = 1.4 (Eimer and Pecora, 1991). If we were to model 10 bp segments as cylinders, then L = 3.4 nm, d = 2.0 nm, and p = 1.7.
The discrepancy in approximating the cylinders using the Stokes-Einstein equation for spherical particles with hydrodynamic radius r = L/2 is then γ T γ T cyl.
= log p + v ≈ 1.15, which we consider to be an acceptable approximation for our chromosome-scale modeling.
Diffusion. Given a set of N particle coordinates at each trajectory frame of a replicate simulation, {x i (t)}, the mean-squared displacement (MSD) between times t and t + τ is calculated as Given a set of MSD values for trajectory frames of a replicate simulation, {MSD(τ i )}, the Brownian diffusion constant (Oliveira et al., 2019;Muñoz-Gil et al., 2021) for 3D diffusion is calculated as the least-squares solution to MSD(τ i ) = 6Dτ i .

(S5)
Given a set of MSD values for trajectory frames of a replicate simulation, {MSD(τ i )}, the anomalous diffusion law is where D a is the anomalous diffusion constant and α is the power-law exponent (Oliveira et al., 2019;Muñoz-Gil et al., 2021). These are then calculated as the least-squares solution to log MSD(τ i ) MSD(τ 1 ) = α log τ i τ 1 + log(D a /D a ).

Radial Distribution Functions.
Due to the absence of a clear nucleoid region in cryo-ET of Syn3A (Gilbert et al., 2021), radial distribution functions were used to characterize the crowding effects of DNA on the ribosomes distributed near-uniformly throughout the cytoplasm. Smooth estimates of radial distribution functions were calculated using a spectral Monte Carlo method (Patrone and Rosch, 2017). Consider that the radial distribution function is expanded using a set of smooth orthogonal basis functions, φ j (r), where these functions are orthogonal on the interval, [0, r c ], extending from the test particle at the origin to a cutoff radius, r c , with respect to the weight function, w(r), We can use the orthogonality relation to solve for the spectral coefficients which leads to The quantity N (r)dr is the expected number of particles in a spherical shell of radius r and thickness dr centered about a particle at the origin. The probability density, n(r), of finding a single particle in such a shell is proportional to this and we may rewrite the equation for the spectral coefficients in terms of an expectation value over the probability density n(r) This expectation value can be estimated using Monte Carlo integration of pairwise distances that are sampled from the Brownian dynamics simulations In the case of the radial distribution function of DNA monomers about a ribosome at the origin, the number of pairs in a single trajectory frame is given by a sum of the the number of ribosomes, N ribo , where for j-th ribosome we then sum over the DNA monomers within a distance r c , denoted as N DNA (j, r c ), and where (S17) For notational convenience, we will continue denoting this as a sum over the pairs. Using the change of variable we can define Chebyshev polynomials of the first kind, T n (x), which are given by the recurrence relation and follow the orthogonality relation (Press et al., 2007). We then use these Chebyshev polynomials in the transformed variable, ζ, as our basis functions and their accompanying weight function w(r) = ω ζ(r) , φ k (r) = T k ζ(r) , and n(r) =ñ ζ(r) .

(S21)
This implicitly changes the expectation value as follows where this is now an expectation value over the probability density of separation distances,ñ(ζ), in the transformed variable, ζ, which is defined on the interval [−1, 1]. Defining ζ j = ζ(r j ), we can now estimate this expectation value again as a sum over the same pairs The spectral coefficients for an expansion of the ribosome-DNA radial distribution function in terms of Chebyshev polynomials are estimated as and N DNA (r c ) can estimated as The first factor in the estimator for the coefficients is a normalization for the selected basis functions and the second factor is the ratio of the average local density of DNA within the cutoff radius, ρ DNA (r c ) = N DNA (r c )/V sphere (r c ), relative to the bulk DNA density. After estimating the spectral coefficients, the radial distribution function can be estimated to an arbitrary resolution in spatial distance. We found that the first 100 modes were sufficient to estimate smooth radial distribution functions with a negligible oscillations at the boundaries ( Figure 5C).

Ideal Partitioning.
To determine the ideal partitioning of a sphere into two volumes, we exploit the spherical symmetry and consider the equation of a semi-circle in the x-y plane to calculate the relative volumes, V l and V r , using the leads to the total volume given by Using the change of variable we rewrite this in terms of a unit-sphere, and then consider α as the fraction of the distance between poles of the sphere at ξ = −1 and ξ = 1 to solve for two volume terms for the respective daughters in terms of α where V T is the total volume of the sphere. Now we must solve for α such that

(S31)
Assuming a uniform density of monomers ρ, the mass of one of these volumes is m (l/r) (α) = ρV (l/r) (α), and moments of these about y/z-axis are given by The centroids of these two volumes are located at .

(S35)
The distance between the centers of mass of daughters with N l and N r monomers when ideally partitioned is then .

(S36)
In silico Contact Calculations. In this section we will present how we solve the forward problem of generating chromosome contact maps from our mechanistic model of Syn3A's chromosome. A single unreplicated chromosome is comprised of N monomers and its configurational state is given by the set of N coordinate vectors {x i }. For the purposes of calculating chromosome contacts at bp per locus resolutions coarser than the monomer size (10 bp), the chromosome is coarse-grained by dividing it into M loci comprised of regions of n j bonded monomers in a contiguous series. The configurational state of the j-th loci is given by the set of n j coordinate vectors, {x j i }. The pairwise distance between monomer i of locus j and monomer i of locus j is denoted Defining Q j,j as the number of pairwise distances between loci j and j there are two cases.
For both cases, we may equivalently write the set of loci(j)-loci(j ) distances using a single index ranging from 1 to Q j,j as Although we use loci of uniform size in this study, it is possible to use non-uniform sizes, and at 10 bp per monomer the loci could be comprised of regions partitioned by restriction enzyme cut sites (Lieberman-Aiden et al., 2009;Crémazy et al., 2018). A number of transfer functions connecting loci/particle/monomer distances D j,j to the relative contact frequency F j,j have been used in past studies (Meluzzi and Arya, 2012;Serra et al., 2015;Le Treut et al., 2018;Abbas et al., 2019;MacKay and Kusalik, 2020). However, in our case the loci(j)-loci(j ) distances measured are the set of distances between their constituent monomers and we cannot use an iterative approach to refine a parameterization of the transfer function (Zhang et al., 2013). We instead quantify the contact probability through a simplified model of the crosslinking process (Hoffman et al., 2015). We make the following assumptions about the crosslinker: i) crosslinker (formaldehyde) has diffused uniformly throughout the cell, ii) proteins are distributed uniformly throughout the cytoplasm, iii) the crosslinker acts in a manner independent of the DNA sequence, iv) crosslinking occurs pairwise between monomers and is independent of other crosslinking events, v) crosslinking probability p cl (r) is a monotonically decreasing function of spatial distance. We tested multiple candidate functions with a common set of length-scales used to parameterize them for the crosslinking probability (Supplementary Figure S3) and chose the hyperbolic tangent function ( Figure S3C) as it provided the most robust results for the power-law fits of contact frequency versus genomic distance. Given these assumptions and a set of loci(j)-loci(j ) distances, R j,j , the total probability of crosslinking between loci, P j,j , is the probability of the union of all possible crosslinking events P j,j R j,j ∝ Q j,j k=1 p cl (r j,j k ).

(S40)
If this procedure is repeated for a set of N replicates replicates each containing N timesteps timesteps, the total crosslinking probabilities can be calculated an ensemble-and/or time-average over these sets of configurational states. When calculating the in silico contact maps we implicitly assume that the successful crosslinking would be followed by further processing steps (Crémazy et al., 2018) with 100% efficiency and therefore directly equate the loci(j)-loci(j ) crosslinking probabilities, P j,j , with the loci(j)-loci(j ) relative contact frequencies, F j,j . In a final processing step, all contact maps are then normalized (Cournac et al., 2012) to be doubly-stochastic matrices using the matrix-balancing method of Knight and Ruiz (Knight and Ruiz, 2012), where after normalization rows and columns each sum to one j F j,j = 1 and j F j,j = 1.

(S41)
To improve visual clarity and contrast, for all contact maps plotted using a log-scale within this work, after all zeros are replaced with the minimum non-zero value, the lower-limit of the colorbar is set to the 10 th percentile of contact frequencies observed in the map.

Martini Visualization
Visualization of an entire Syn3A cell in the Martini representation was performed using Visual Molecular Dynamics (VMD) (Humphrey et al., 1996). Working with a structure of nearly 100 million atoms required non-standard VMD practices. Here, we will only discuss the non-standard practices used to generate the render of the Martini representation ( Figure 9). We will divide these into three tasks: (1) Loading a large structure -After loading the Gromacs structure file (.gro) into VMD, we first exported the structure as a ".js" file, a custom file format that is optimized for VMD and enables rapid loading of coordinates in the future.
(2) Processing atom selections -The atom selection for each representation in VMD is processed individually. Furthermore, each term within a boolean expansion in the "atom selection" window is evaluated sequentially, which may potentially lead to billions of iterations. Ideally, one should collapse all selections sharing a common visual representation into a single selection and use regular expressions (resname "A|B|C" instead of resname A B C) within the selection to reduce loop executions. To differentiate atoms of different types combined in the same selection, we then set the representation to color by "resname" and adjust the resname colors in the "Graphics/Colors" menu. If selections involve geometric calculations (ex. membrane cutaway in Figure 9), these selections can be made persistent to avoid recalculation by assigning them to an unused property such as "segname".
(3) Choosing visual representations -Due to the size of the structure, the OpenGL window will lag as changes are made to the visualization, this is true even for simple tasks such as rotating the structure. Choosing visual representations that do not require additional calculations and reduce the number of polygons drawn in the OpenGL window helps to minimize this behavior. We used the "points" representation for all selections while editing the visualization. Further improvements can be gained by hiding all selections that are not being actively modified. Once all selections have been modified to the user's satisfaction, change each selection from the "points" to the final visual representation before using the "File/Render" function. We chose to use the "TachyonL-OptiX (interactive, GPU-accelerated)" renderer so that we could fine-tune the orientation of the structure for the final render.  Humphrey et al. (1996)

Supplementary Figures
Step 2) Increase Resolution: Interpolate Segments of Reduced-Length Step 1) Grow Chain: Insert Spherocylinder Segments of Length L do while N < Nfinal Step 1) Grow Chain Step 2

Contact Frequency
Example Chromosome (1 of 250) Figure S1. A) Schematic of algorithm using midpoint-displacements to generate initial conditions of chromosomes organized as fractal globules. B) Initial configurations of single unreplicated chromosome in small Syn3A cell and ribosome distribution from (Gilbert et al., 2021). C) Initial configurations of single unreplicated chromosome in large Syn3A cell and ribosome distribution from (Gilbert et al., 2021). Figure S2. Spherical 250 nm radius cell containing 1000 ribosomes (pink) filled with two 543 kbp chromosomes (blue and green) organized as fractal globules. On the left and right one of the chromosomes are alternatively made transparent to better highlight the well-defined interface between the two chromosomes that nearly bisects the spherical shape and shows a low degree of mixing. The two chromosomes were relaxed from their initial state by minimizing their energy while obeying the energy function.   Figure 10A). (continued from previous figure)˜A The system is simulated for 12.0E+6 total timesteps and replication in 2,000 monomer steps begins after the initial 2.0E+6 timesteps. Representative snapshots at the corresponding times are displayed above the graph.   Figure SV1. Video of a circular dsDNA polymer in a trefoil knot being pulled taut with and without the action of topoisomerases. Equal and opposite forces are applied to the Ori (red) and Ter (orange) for both cases. In the case with topoisomerases, the soft topoisomerase potentials are switched on for a brief interval beginning at the final third of the video (≈10s video time), strand-passages then occur and unknot the polymer. Figure SV2. Video of a full Syn3A chromosome (54,338 monomers) undergoing simultaneous replication and disentanglment of daughter chromosomes while under the influence of loop-extruding SMC protein complexes and topoisomerases allowing strand-passage. The replicating chromosome is colored in accordance with the scheme used throughout the paper (Figure 2A), where the newly replicated right daughter is colored with yellow/green. Forks are magenta, Oris are red, and Ter s are orange. The 500 ribosomes in the system are represented as transparent spheres. The video begins with a single unreplicated chromosome that has not previously been subject to looping. The video proceeds to show replication of 20,000 monomers (200 kbp) in 2,000 monomer (20 kbp) steps with segregation occuring simultaneously.