Predictive Modeling of a Batch Filter Mating Process

Quantitative characterizations of horizontal gene transfer are needed to accurately describe gene transfer processes in natural and engineered systems. A number of approaches to the quantitative description of plasmid conjugation have appeared in the literature. In this study, we seek to extend that work, motivated by the question of whether a mathematical model can accurately predict growth and conjugation dynamics in a batch process. We used flow cytometry to make time-point observations of a filter-associated mating between two E. coli strains, and fit ordinary differential equation models to the data. A model comparison analysis identified the model formulation that is best supported by the data. Identifiability analysis revealed that the parameters were estimated with acceptable accuracy. The predictive power of the model was assessed by comparison with test data that demanded extrapolation from the training experiments. This study represents the first attempt to assess the quality of model predictions for plasmid conjugation. Our successful application of this approach lays a foundation for predictive modeling that can be used both in the study of natural plasmid transmission and in model-based design of engineering approaches that employ conjugation, such as plasmid-mediated bioaugmentation.


INTRODUCTION
Horizontal gene transfer by plasmid conjugation can result in rapid change in bacterial populations, as exemplified by the increased prevalence of plasmid-borne antibiotic resistance genes in response to the widespread use of antibiotics post-World War II (Davies and Davies, 2010). This capacity for rapid spread of genetic elements could be harnessed as a tool, for instance in the design of cellular computing platforms (Goñi-Moreno et al., 2013) or the modification of environmental bacterial populations through plasmid-mediated bioaugmentation (Top et al., 2002).
Efforts to quantitatively characterize conjugation processes have focused on describing the transfer rate (or "fertility") of specific plasmids in specific conditions. Early descriptions of transfer rates consisted of reports of the ratio of plasmid-donating to plasmid-receiving cells (Watanabe, 1963;Curtiss et al., 1969) or of threshold populations below which conjugation was not observed (Anderson, 1975). A systematic approach to describing transfer rates was presented by Levin et al. (1979), who developed a population-based ordinary differential equation (ODE) model to capture the temporal dynamics of conjugation in suspended batch cultures. Follow-up work ) presented a simple "end-point method" to estimate a conjugation rate from a single time-point measurement.
Subsequent projects made use of this end-point method for estimating conjugation efficiency in both suspended and attached cultures (Gordon, 1992;Duncan et al., 1995;Normander et al., 1998;Licht et al., 1999;Lilley and Bailey, 2002). The ordinary differential equation model of Levin et al. (1979) employs massaction kinetics, which are based on an assumption of spatial homogeneity. The utility of this model for describing conjugation in spatially heterogeneous conditions was defended by Simonsen (1990), whose analysis suggested that the mass-action based model accurately captures the dynamics of attached cultures inoculated from well-mixed suspensions provided the cell density is sufficiently high.
The model developed by Levin et al. (1979) was calibrated by just two parameters: a growth rate (shared by all subpopulations) and a mass-action conjugation rate. Over the years, a number of model variants have been explored, including the following. Simonsen et al. (1990) considered allowing for (i) transition to statonary phase, (ii) segregative loss, and (iii) transitory derepression of conjugation from newly formed transconjugants (zygotic induction), but found that, for the systems they were considering, none of these features significantly improved the accuracy of their end-point estimate of transmission rate. Lundquist and Levin (1986) considered the specific effects of transitory derepression in the context of plasmid maintenance. Freter et al. (1983) developed a model to describe the behavior of cultures in continuous flow reactors (as a proxy for the mammalian gut). Clewlow et al. (1990) fit a model that incorporated Verhulst logistic growth limitations in an attempt to describe populations growing in soil microcosms. Massoudieh et al. (2010) used a partial differential equation approach to address bacterial activity on granular porous media. A stochastic differential equation approach was described by Philipsen et al. (2010).
Recently, attempts to characterize conjugation dynamics have focused on capturing the spatio-temporal dynamics of the process using individual-based models (Krone et al., 2007;Merkey et al., 2011;García and Rodríguez-Patón, 2015;Goñi-Moreno and Amos, 2015). Using simulations of an individualbased model, Zhong et al. (2012) analyzed the potential accuracy of the ODE-based end-point method (and other non-spatial metrics) for estimating transfer efficiency. They found that, as expected, spatial effects can play a major role in conjugation, but that, for sufficiently dense populations, surface-associated conjugation dynamics can be reasonably well described by nonspatial models.
As highlighted by Sørensen et al. (2005), mathematical modeling approaches are crucial for characterizing and ultimately manipulating horizontal gene transfer in situ. As described above, mathematical characterizations of conjugation processes have focused on quantifying the rate of plasmid transfer. In this study, we pursued the more general goal of accurately characterizing all significant aspects of a proof-ofprinciple batch conjugation process involving two E. coli strains. This project was carried out toward the long-term goal of using such models in the engineering of systems involving plasmid transfer. Consequently, our primary aim was to confirm whether an ODE model of the process could be accurately calibrated and thus provide confident predictions of system behavior. We used model comparison and uncertainty analysis to arrive at a model formulation that is well-supported by our data, and tested our final calibration against experiments that demand extrapolation from the training data. (Similar model assessment techniques were employed in Philipsen et al. (2010), but the goal in that study was to accurately estimate the noise model.) Our approach is facilitated by the use of flow cytometry to assay culture subpopulations. As confirmed by del Campo et al. (2012), cytometric measurements agree with the more traditional colony counting approach, and provide improved precision without loss of accuracy.

Conjugation Assays
Donor (CSH26) and recipient (DH5α) strains were inoculated in 5 ml of LB supplemented with appropriate antibiotic (DH5α in Cm, CSH26 in Tet) and grown overnight at 37 • C at 250 rpm. To set up the experiment, 1 ml each of donor and recipient overnight cultures were centrifuged at 13,000 rpm for 1 min, decanted and resuspended in 1 ml of fresh LB broth (non-selective). Then, 400 µl of each resuspension was inoculated in 9.6 ml of fresh LB and grown in a 250 rpm shaking incubator at 37 • C to an optical density (OD 600 ) of 0.5 (Genesys 20 Visible Spectrophotometer, Thermo Fisher Scientific).
Filter matings were carried out in 47 mm diameter petri plates (Fisher Scientific) containing 1.5% LB agar upon which were laid 47 mm diameter 0.4 µm polycarbonate filters (Fisher Scientific). Eight filter mating experiments were carried out, numbered 1 through 8 below. In each case, donor and recipient suspensions were diluted to achieve desired cell densities prior to loading.
For experiments #1-4 the donor and recipient cultures were loaded nearly simultaneously. To begin, 200 µl of recipient cell suspension were manually spread evenly on each filter. The filters were then dried for 1 min before loading and spreading 200 µl of donor cell suspension. Plates were then kept at 37 • C until the desired time-point (specified below). Experiments #5-8 followed a staggered loading protocol; the procedure was identical to experiments #1-4 except that the second suspension (donor or recipient as specified in the caption of Figure 3) was loaded 120 min after the first suspension was loaded.

Flow Cytometric Analysis
Analysis was carried out on an Amnis Imagestream MkII flow cytometer (EMD Millipore). Data acquisition was implemented with the Amnis Inspire software package (EMD millipore). All samples were gated against gradient root mean square in the brightfield channel (labeled as gate R0), accepting images with scores between 50 and 80. Fluorescence was excited by a 488 nm solid state laser at 45 mW intensity. Green fluorescence was detected by a 533/55 nm band pass filter. Red fluorescence was detected at 610/30 nm. As a negative control, separate CSH26 and DH5α cultures (expressing GFP+/RFP-and GFP-/RFP+, respectively) were analyzed. For each, 10,000 R0 samples were collected; the results were used to construct a compensation matrix with the Amnis IDEAS software package (EMD Millipore). Gates for the three subpopulations of interest were then defined by setting thresholds for compensated nonfluorescence in the red and green channels, as shown in Figure 1. Cells were taken to be RFP-(gate R2) if the red fluorescence intensity was below 256.63. RFP+ cells were considered GFP-if the green fluorescence intensity was below 1,036.2 (gate R1). At each experimental time point, for each replicate, 20,000 R0 events were collected and sorted into the three subpopulations based on the threshold gating.

Estimating Population Sizes
For each measurement, the flow cytometry results indicate the population distribution among three subpopulations: donors (GFP+/RFP−), recipients (GFP−/RFP+), and transconjugants (GFP+/RFP+). Together with the OD 600 readings, these give relative measures of population size which were used to calibrate and analyse the model. To estimate corresponding absolute population sizes, we made use of the rule-of-thumb scaling for E. coli of 1.0 OD 600 to 8 × 10 8 cells per ml in LB broth (www.genomics.agilent.com/biocalculators/calcODBacterial.jsp, see also Milo and Phillips, 2015). To do so, we determined a scaling between OD 600 in PBS and OD 600 in LB. Using cultures from preliminary filter mating experiments with OD 600 in PBS ranging from 0.073 to 1.20, we resuspended and measured OD 600 in LB. Thirty-six samples (data not shown) were fit, resulting in the relation (OD 600 [LB]) = 0.869 × (OD 600 [PBS]) -0.0057, with R 2 of 0.995. Finally, to estimate populations on the filters, we assayed the efficiency of cell recovery by loading filters as described above and then comparing the OD 600 of the recovered population with the original suspension; these tests (data not shown) indicated a recovery efficiency of about 50%. Consequently, to estimate the cell counts on the filter, we multiplied our cell counts in suspension by a factor of two after accounting for the corresponding suspension volume to yield a cell count per ml. Finally, we divided by the filter area (17.35 cm 2 ) to arrive at a measure of cell density (cells/cm 2 ).

Simulation and Calibration
Models were simulated with the ode23s numerical integration routine in MATLAB (Mathworks). As described above, triplicate observations were made of the three culture subpopulations over a range of time points. For each model variant considered, and for each parameterization p, model predictions were compared against the training data (experiments #1-4) via the weighted sum of squares function: where y i obs (t k ) is the mean of the replicate measurements of subpopulation i at time t k , σ i (t k ) is the standard deviation of those measurements, and y i sim (p, t k ) is the model prediction of subpopulation i at time t k . Here i runs over the four training experiments (#1-4) and the three subpopulations (donors, recipients, transconjugants), while t k runs over the time-points: 0,40,80,120,160,200,240,280,320, 360 min (from initial loading of the filter).
For each model variant, the SSE was minimized by application of global optimization routines (simulated annealing, MATLAB function simannealbnd, and interior point algorithm, MATLAB function fmincon). Multiple starting points were employed to improve the chance of finding the global minimum. Parameter values were bound to the range [0, 500]. The initial values of the donor and recipient populations were treated as free parameters, to avoid over-weighting the measurements at these time-points. Transconjugant populations were initialized to zero.

Model Comparison
We used the Akaike Information Criterion (AIC) (Akaike, 1974;Burnham et al., 2011) to assess the suitability of each model. AIC describes a trade-off between the quality of the fit and the number of degrees of freedom granted by the model formulation. The corrected form of AIC was used to guard against overfitting: where m is the number of model parameters (kinetic parameters plus eight free initial conditions: donors and recipients in each of four experiments), n = 116 is the number of observations (four experiments, ten time-points, three populations, less the four initial transconjugant populations assumed to be zero), and SSE is the minimal weighted sum of squared errors (Equation 1). Presuming the errors are independent and normally distributed, Single-population controls were used to establish green and red fluorescence thresholds for identifying plasmid-bearing cells (regions R2 and R1, respectively). During the mating experiments, a population of RFP+/GFP+ transconjugants was established (region R3). The data is compensated (see Methods) and is plotted on a bi-exponential (logicle) scale (Tung et al., 2007), linear in the range [−1000, 1000].
the AICc provides a relative measure of the strength of evidence for each model; smaller AICc values correspond to increased model suitability.

Uncertainty Analysis
Uncertainty analysis was applied to gauge confidence in the bestfit estimates of the kinetic parameters. The initial conditions were not included in this analysis, as model-based predictions will not rely on estimates of initial conditions.

Sensitivity coefficients
Local absolute sensitivity coefficients were defined as: where y i sim (p, t k ) is the i-th model output at time-point t k and p j is the j-th parameter. These derivatives were approximated by finite differences of 1% in p j . Absolute sensitivity coefficients were scaled to generate dimensionless relative sensitivity measures: The overall sensitivity measure: where i runs over all subpopulations and experiments, and t k runs over all time-points, describes the degree to which each model parameter p j influences the model outputs.

Identifiability scores
To account for correlation among the parameters, the orthogonalization approach of Yao et al. (2003) was applied to arrive at practical measures of identifiability, as follows. A sensitivity coefficient matrix was constructed by arranging the relative sensitivity coefficients for each parameter column-wise: 1,1 (t 1 ) · · ·S 1,n p (t 1 ) . . . . . . . . .
where there are n p parameters, n o observables (running over experiments and subpopulations), and n T time-points. The column with the largest 2-norm (square root of sum of squared entries) is labeled X 1 . The corresponding parameter is judged the most identifiable; its identifiability score is the 2-norm of the corresponding column. Each column ofS is then projected onto X 1 , and the residuals are collected in matrix R 2 : The column of R 2 with the largest 2-norm corresponds to the next most identifiable parameter. The matrix X 1 is then concatenated with the column ofS that corresponds to that parameter, to form matrix X 2 . The residuals of the projection ofS onto X 2 are then determined (organized into matrix R 3 ), and the third most identifiable parameter is identified. This process is iterated to provide an identifiability score for each parameter (i.e., the 2-norm of the corresponding column vector of matrix R j ).

Confidence intervals
Two complementary approaches were applied. Both begin with construction of a matrix S as in Equation (5) but using the absolute (unscaled) sensitivity coefficients S i,j (Equation 3) as entries. Assuming the experimental errors are independent and normally distributed, the least squares error in Equation (1) can be used to provide a lower bound on the radius of the 95% confidence interval for parameter p j of Ashyraliyev et al. (2009): wherep is the parameter estimate (i.e., the minimizer of SSE(p)), m is the number of parameters, n = 108 is the number of observations (four experiments, three subpopulations, nine noninitial time-points), F 0.05 (m, n − m) is the value at 0.95 of the inverse of the cumulative distribution function for the Fisher's distribution with m and n − m degrees of freedom, and (·) jj is the jj-th matrix element. These results are reported as relative estimates ( p j /p j × 100%) in the following section.
A complementary approach to estimating confidence intervals, following (Emery and Nenarokomov, 1998;Gadkar et al., 2005), involves constructing the Fisher Information Matrix: where W is the inverse of the measurement covariance matrix. Then assuming the measurement errors are independent and normally distributed, a lower bound on the radius of the 95% confidence interval for parameter p j is given by: Given the samples sizes available in this study (triplicate observations), we could not estimate the full measurement covariance matrix. The measurement variances (diagonal entries in W) were calculated for each observation; the covariances (offdiagonal terms) were set to zero. Again, the relative estimate is reported.

Model Development and Model Comparison
Model development began with the three-state model of Levin et al. (1979) that describes dynamics of a recipient population (R), a donor population (D), and a transconjugant population (T). The model dynamics depend on the following assumptions: (1) the conjugation rate is jointly proportional to the abundance of plasmid-bearing and plasmid-free cells, (2) plasmid loss by segregation is negligible, (3) newly formed transconjugants can donate plasmids immediately, (4) the original donors and the transconjugants transmit plasmids at the same rate, (5) all subpopulations grow at the same exponential rate. The model takes the form: where ψ is the growth rate and γ is the conjugation rate. This model captures the essential features of conjugation dynamics, but a more complex model might be needed for accurate prediction of mating dynamics in specific conditions; in particular, we wanted to allow for distinct growth rates and growth profiles for all subpopulations. Consequently, we considered model extensions that incorporate (i) growth lag, as in Baranyi et al. (1993), (ii) transition to stationary phase, as in Simonsen et al. (1990), and (iii) distinct transmission and growth kinetics for each subpopulation. The most general model formulation we considered involves the three subpopulations (measured in cells/cm 2 ) and the fractional abundance of a limiting resource C (dimensionless measure, with initial value one): Frontiers in Microbiology | www.frontiersin.org where, for each index i = D, R, T, the parameter e i is a measure of resource depletion (cell −1 cm 2 ), and the growth rates ψ i and plasmid transfer rates γ j (j = D, T) are given by: with ψ i,max (min −1 ) and γ j,max (min −1 cell −1 cm 2 ) the maximal growth and transfer rates, respectively, K G,i and K T,j the resource levels at which growth and transfer are half maximal (dimensionless Monod constants), K L,i the time at which the growth rate is half maximal (min), and the Hill coefficient n i characterizing the abruptness of the end of lag phase (dimensionless). This fully general model involves nineteen kinetic parameters, compared to just two for the original model of Levin et al. (1979) (Equation 9). We expected that a suitable model could be found between these two extremes. To identify the most suitable formulation, we applied the Akaike information criterion, as described in the Methods section, to a range of model variants. Each model variant was fit against the training data as described in the Methods section (116 triplicate data points collected from experiments #1-4). For each experiment, the initial abundance of recipient and donor subpopulations were treated as free variables, to avoid overweighting the measurements at these time-points. Thus, each model fit addressed these eight free variables in addition to the free kinetic parameters. The model comparison results are shown in Table 1; the model variants are described below and in the Table caption.
The full model (Equation 10), labeled as model variant 0 (V0), resulted, of course, in the best fit (lowest SSE), but at the cost of a high degree of parameterization, as indicated by a high AICc value. An exhaustive comparison of all model formulations (which, through combinatorial combination would number in the thousands) was not feasible. Instead, we carried out a strategic comparison of a targeted set of model variants, as follows.
Our first steps toward model reduction were motivated by the parameter fits. Estimates of n i were close to one, while those of K G,i and K T,j were large relative to one. Eliminating these parameters led to models V1 and V2, with a modest reduction in quality of fit but a significant improvement in AICc score. Next, our expectations of system behavior led us to consider model V3, in which the resource-depletion rates of recipients and transconjugants are constrained to be equal, and V4, in which all three resource-depletion rates are equal. Likewise, models V5 and V6 incorporate assumptions of shared growth lag. Next, model V7 was reached by constraining the transmission rates of donors and transconjugants to be equal (reflecting an absence of transitory derepression); this simplification was motivated by similarity in the values of the corresponding parameter estimates.  max . V13 is the model in Simonsen et al. (1990). V14 is Equation (9). V15 is V7 with ψ T,max = 0.
As expected, each reduction in model complexity corresponds to a reduction in the quality of fit (i.e., larger SSE). The AICc values in Table 1 indicate that, for some simplifications, the loss of accuracy was compensated by reduction in degrees of freedom; for others, it was not. In particular, model V7 has the lowest AICc value of the first eight variants.
Models V8 through V12 represent further reductions of model V7: constraining all resource-depletion rates to be identical (V8); constraining all lag times to be zero (V9); constraining pairs of growth rates to be identical (V10, V11, V12). In each case the AICc indicates that the loss in accuracy is not compensated by the reduction in model complexity.
At this point, we did not expect to achieve an improved AICc value by further model reduction. Nevertheless, for completeness, we carried out comparisons with the model of Simonsen et al. (1990) (V13, with four kinetic parameterscommon growth, transmission, resource-depletion rates and Monod constant) and the model of Levin et al. (1979) (V14, with two kinetic parameters, Equation 9). The corresponding AICc values indicate that these models are less suitable than model V7.
A final variation came after a preliminary identifiability analysis indicated that, contrary to our expectations, the transconjugant growth rate could not be confidently distinguished from zero. (Indeed the estimated value of this parameter was orders of magnitude smaller than the estimated growth rates for the other two populations.) Model V15 resulted from fixing the transconjugant growth rate to zero in model V7. The most suitable model formulation (V15) is: with six kinetic parameters: ψ R,max , ψ D,max , K L , γ max , e R and e D . The model fits against experiments #1-4 are shown in Figure 2 (the full dataset is provided in Supplementary File S1); best-fit kinetic parameter values are reported in the next section.

Uncertainty Analysis
Uncertainty analysis was carried out on model V15 (equation 11) as described in the Methods section. The initial conditions were not considered free parameters in this analysis, because we are not concerned with accurately estimating the initial conditions when using the model to predict behavior. To determine the sensitivity coefficients, simulations were run from the best-fit initial conditions (reported in the caption of Figure 2). The results of the uncertainty analysis are presented in Table 2. Figure 3 shows the results of using model V15 (Equation 11) to predict the results of the staggered loading experiments #5-8, in which one of the two subpopulations was loaded 120 min after the first (the full dataset is provided in Supplementary File S1). For the simulations, the initial donor and recipient populations were set to the observed values. The transconjugant population was set at zero up to time 120.

Model Development
By comparing model variants by the Akaike information criterion (rather than relying on expectations of system behavior) we arrived at an unbiased selection of the model formulation that is best supported by the experimental results. AIC-based   model comparison is technically only justified when the errors are independent and normally distributed. That condition is only approximately satisfied in this case. (The errors, not shown, are symmetric, but are more heavy-tailed then a normal distribution.) Nevertheless, the AIC provides a useful guide to model comparison. Notably, it led to a model for which (i) there is no transitory derepression of the conjugative machinery, and (ii) the newly-formed transconjugants exhibit negligible growth over the experimental time-frame. These findings represent hypotheses regarding the plasmid-recipient pair under investigation. The Sørensen lab has carried out a number of studies addressing conjugation of pKJK10's parental plasmid pKJK5 (e.g., Bahl et al., 2007a,b), but the specific dynamic features represented by the model have not been previously investigated. The estimated growth kinetics in Table 2 must be interpreted carefully. Considered individually, the estimates correspond to unreasonable maximal doubling times of only 18 min for donors, 12 min for recipients, and a long lag phase-half maximal growth reached after 145 min. These best-fit estimates cannot be interpreted in isolation; they provide accurate descriptions of behavior in the specific context of a rapidly-depleting limiting resource. (In all simulations the growth rates were always well below their maximal values). Alternative experimental conditions (e.g., fed-batch) could allow the lag and maximal growth rates to be independently assessed. (The model variants that do not include lag or resource limitation provide some indication of what might be expected; e.g., for model V14, Equation 9) the common doubling time is estimated to be about 50 min).

Model Assessment
The overall sensitivity measure in Table 2 is a relative score indicating the degree to which the observations are influenced by each parameter; small scores indicate that the available data may be insufficient to accurately constrain parameter estimates. The corresponding identifiability scores discount the sensitivity measures by accounting for correlation between parameteric effects. Low scores indicate low potential for confidently identifying parameter values. In Yao et al. (2003), an identifiability cut-off value of 0.04 was recommended. All model parameters are identifiable by that criterion. (When model V7 was analyzed, the transconjugant growth rate was not above this identifiability threshold, and so it was eliminated to yield model V15.) The last two columns of Table 2 contain estimates of 95% confidence intervals on the parameter estimates. These results, like the AIC-based model comparison, depend on an assumption that the errors are independent and normally distributed, which holds only approximately. The FIM-based estimate is derived under an assumption that the model itself is accurate (so that experimental error is the only source of variance). The complementary nonlinear regression approach incorporates model mismatch into its estimate. (A complementary upper bound calculation (Ashyraliyev et al., 2009) was not informative as it yielded estimates uniformally larger than ±100%.) While these results clearly indicate there is room for improvement, they suggest that there is strong potential for accurate estimation of the system kinetics with sufficient data. Finally, Figure 3 demonstrates that the model can provide reasonably accurate predictions of system behavior. To reiterate, the data from the test experiments (#5-8) was not used for model fitting or model development. Moreover, the protocol followed for experiments #5-8 was dynamically distinct from the training experiments (#1-4). The training data (Figure 2) was collected from simultaneous loading experiments. The testing data (Figure 3) was collected from staggered loading experiments. Figure 3 thus demonstrates the model's ability to extrapolate to distinct dynamic regimes. The fits are not perfect, the worst prediction being the donor population in experiment #6 (perhaps as a result of the donor-recipient ratio at loading being too far beyond those of the training data). Nevertheless, the overall quality of the predictions speaks to the potential of the chosen model variant to predict system behavior.

Alternative Modeling Frameworks
Of course, any ODE-based model cannot be expected to provide highly accurate descriptions of surface-associated culture behavior, as spatial heterogenity is bound to have an effect on system dynamics. While the success of our approach indicates that such spatial features are not crucial to predicting behavior in the high-density mixed-distribution cultures observed in this study, spatial aspects will no doubt dominate in more heterogeneous environments or when subpopulations are not evenly distributed. Spatially explicit models offer appropriate frameworks in these cases: partial differential equation (PDE) models offer valuable descriptions of continuous spatial distributions, e.g., Massoudieh et al. (2010), while individualbased models provide increased resolution by describing the population in terms of individual cells (Kreft et al., 2013). Simple individual-based models represent cells as occupying positions in a regular lattice, e.g., Krone et al. (2007) and García and Rodríguez-Patón (2015), while more complex models, such as the DiSCUS model developed by Goñi-Moreno and Amos (2015) capture cell morphology. Related individual-based modeling frameworks, presented in Lardon et al. (2011) and Rudge et al. (2012), focus on biofilm formation but do not currently describe gene transfer. Importantly, these modeling frameworks have the capacity to capture inevitable variability within populations. Continued model developments will facilitate investigations of natural microbial populations (Marino et al., 2014;Widder et al., 2016) and the design of synthetic microbial ecologies (Zomorrodi and Segrè, 2016).

Conclusion
In this study we expanded on previous modeling efforts by assessing the quality of parameter estimates and the accuracy of extrapolative predictions. This project represents a step toward a modeling framework that can be used for confident prediction of the behavior of engineered microbial communities, such as those employed in the conjugation-based cellular computing systems described in Goñi-Moreno et al. (2013). In particular, plasmid-mediated bioaugmentation holds promise in a range of applications, including bioremediation (Top et al., 2002) and suppression of antibiotic resistance (Baquero et al., 2011;Gooding-Townsend et al., 2015) (see also Yosef et al., 2015). The success of such efforts will depend upon the development of techniques for accurate model-based design in cellular bioengineering.