Using PyBioNetFit to leverage qualitative and quantitative data in biological model parameterization and uncertainty quantification

Miller, Ely F.; Mallela, Abhishek; Neumann, Jacob; Lin, Yen Ting; Hlavacek, William S.; Posner, Richard G.

doi:10.3389/fimmu.2026.1663008

ORIGINAL RESEARCH article

Front. Immunol., 29 April 2026

Sec. Systems Immunology

Volume 17 - 2026 | https://doi.org/10.3389/fimmu.2026.1663008

Using PyBioNetFit to leverage qualitative and quantitative data in biological model parameterization and uncertainty quantification

1. Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, United States
2. Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, United States
3. Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, United States
4. Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, United States
5. Information Sciences Group, Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, United States

Abstract

Data generated in studies of cellular regulatory systems are often qualitative. For example, measurements of signaling readouts in the presence and absence of mutations may reveal a rank ordering of responses across conditions but not the precise extents of mutation-induced differences. Qualitative data are often ignored by mathematical modelers or are considered in an ad hoc manner, as in the study of Kocieniewski and Lipniacki (2013) [Phys Biol 10: 035006], which was focused on the roles of MEK isoforms in ERK activation. In this earlier study, model parameter values were tuned manually to obtain consistency with a combination of qualitative and quantitative data. This approach is not reproducible, nor does it provide insights into parametric or prediction uncertainties. Here, starting from the same data and the same ordinary differential equation (ODE) model structure, we generate formalized statements of qualitative observations, making these observations more reusable, and we improve the model parameterization procedure by applying a systematic and automated approach enabled by the software package PyBioNetFit. We also demonstrate uncertainty quantification (UQ), which was absent in the original study. Our results show that PyBioNetFit enables qualitative data to be leveraged, together with quantitative data, in parameterization of systems biology models and facilitates UQ. These capabilities are important for reliable estimation of model parameters and model analyses in studies of cellular regulatory systems and reproducibility.

1 Introduction

In systems biology modeling, practitioners routinely disregard purely qualitative data—such as ordinal categorizations (e.g., low/medium/high), binary outcomes of case-control comparisons (e.g., up/down relative to a reference), and simple threshold crossing indicators (e.g., yes/no)—as well as semi-quantitative data, such as densitometric readouts of Western blots, which typically lack full quantification due to potential signal saturation and absence of calibration experiments. Bias against use of such data perhaps arises from the rich abundance of quantitative data (e.g., sequential measurements of state variables having high precision and fine time resolution) in fields such as physics, in which biological modelers are commonly trained. Nevertheless, qualitative/semi-quantitative data have been leveraged in influential biological modeling studies (1), such as those of Tyson and co-workers focused on modeling yeast cell-cycle control (2–6). In these studies, qualitative/narrative descriptions of yeast mutant phenotypes were used to constrain parameter estimates. Other related studies include those of Kinney and co-workers focused on using noisy semi-quantitative readouts of massively parallel reporter assays to inform the modeling of regulation of gene expression (7–10).

In recent years, there has been increasing appreciation of the need to leverage all types of available data in biological modeling, as evidenced by the development and/or application of various approaches for using qualitative/semi-quantitative data in model parameterization, optionally in combination with quantitative data. Demonstrated approaches include likelihood-free inference conditioned on binary data (11), data-transformation methods (e.g., optimal scaling) (12–16), information-theoretic approaches (9, 17), constrained heuristic optimization in which qualitative observations are formalized as constraints (18, 19), and (multinomial) probit regression (20). Software tools developed to cope with qualitative/semi-quantitative data include PyBioNetFit (20, 21), pyPESTO (15, 16, 22), and MAVE-NN (17).

Here, in a new demonstration of the ability of PyBioNetFit to leverage both quantitative and qualitative data, we redo the parameterization of an ordinary differential equation (ODE) model developed by Kocieniewski and Lipniacki (23) to study the roles of MEK isoforms in ERK activation. Originally, this model was parameterized by tedious trial-and-error manual tuning of parameter values for consistency with a mix of published qualitative and quantitative data (24, 25). The qualitative data included orderings of low-resolution readouts measured for parental and perturbed/mutant cell lines. In our re-analyses of these data, we define qualitative data unambiguously using the Biological Property Specification Language (BPSL) (21), which increases the reusability of these data, and we leverage both the qualitative and quantitative data using various features of PyBioNetFit, which yields a systematic and automated approach to parameter estimation and uncertainty quantification (UQ). This approach increases reproducibility. Furthermore, we clarify that the parameter estimation method of Mitra et al. (19), which is implemented in PyBioNetFit (21) and which we leveraged in our study, can be viewed as a likelihood-based inference procedure. Through metaheuristic optimization, we obtain maximum likelihood estimates for parameter values, and for UQ, we perform profile likelihood analysis (26). In addition, through application of an adaptive Markov chain Monte Carlo (MCMC) sampling algorithm (27) implemented in PyBioNetFit (28), we obtain samples representing a Bayesian parameter posterior, which allows us to quantify both parametric and prediction uncertainties. Prediction uncertainties are quantified by posterior predictive distributions. Thus, we demonstrate UQ in two ways, each of which considers both quantitative and qualitative data. The absence of UQ was a notable limitation of the original study of Kocieniewski and Lipniacki (23), a limitation rooted in the ad hoc labor-intensive parameterization approach of that study.

The results presented here provide a new illustration of how PyBioNetFit facilitates efficient and reproducible use of qualitative data (together with quantitative data) in parameterization of systems biology models. The results also provide an illustration of PyBioNetFit capabilities useful for UQ, which is essential for a variety of reasons, from assessment of model credibility to model improvement. This report should be useful to researchers who wish to leverage qualitative data in parameterization of biological models. For the convenience of readers unfamiliar with specialized terminology (e.g., BNGL, BPSL, UQ), we provide a glossary of key terms in the Supplementary Material.

2 Materials and methods

2.1 Model for parental cell line and model variants for mutant cell lines

Kocieniewski and Lipniacki (23) defined their model for normal, or wild type (WT), MEK1 signaling using the conventions of the BioNetGen language (BNGL) (29) and made the model available online in the form of a BNGL file (a plain-text file with a.bngl filename extension) within a ZIP archive (https://iopscience.iop.org/article/10.1088/1478-3975/10/3/035006/data). Annotation at the top of this BNGL file describes how to modify the model to account for MEK1 knockout (KO) and three MEK1 mutations: N78G, which ablates MEK1 dimerization with itself and MEK2; T292A, which ablates ERK-mediated inhibitory phosphorylation of MEK1 and concomitant negative feedback; and T292D, a phosphomimetic mutation. From this starting point, we created separate BNGL files for parental and mutant cell lines labeled WT, N78G, T292A, and T292D. The resulting BNGL files, MEK1_WT.bngl, MEK1_KO.bngl, MEK1_N78G.bngl, MEK1_T292A.bngl, and MEK1_T292D.bngl, are freely available online (https://github.com/lanl/PyBNF/tree/master/examples/Miller2025_MEK_Isoforms) and can be processed by PyBioNetFit (21), version 1.1.9. In PyBioNetFit workflows, models can be defined using either BNGL or Systems Biology Markup Language (SBML) (30). BioNetGen (31) can convert a BNGL file to an SBML file. SBML files are plain-text files with.xml filename extensions.

2.2 Data used in model parameterization

Kocieniewski and Lipniacki (23) carefully identified the data that they used in model parameterization. We collected these same data from the primary sources, namely the reports of Catalanotti et al. (24)Kamioka et al. (25). Each qualitative observation from the case-control comparisons of Catanotti et al. (24)—e.g., up/down relative to a reference at a particular time after initiation of signaling—was formalized using the Biological Property Specification Language (BPSL) (21). BPSL statements, which can be interpreted by PyBioNetFit, were then collected in PROP files (plain-text files with.prop filename extensions). See BPSL examples and breakdowns in the box below, and all BPSL statements used in our study are available in the Supplementary Tables 1–5. Quantitative observations, time-series data, from the study of Kamioka et al. (25) were collected/tabulated in an EXP file (a plain-text file with a.exp filename extension). It should be noted that, in an EXP file, “nan” indicates a missing measurement. The resulting EXP and PROP files are freely available online (https://github.com/lanl/PyBNF/tree/master/examples/Miller2025_MEK_Isoforms) and can be processed by PyBioNetFit (21), version 1.1.9. It should be noted that each EXP and PROP file is meant to be used with a particular model variant, as indicated by the labels WT, N78G, T292A, and T292D. For example, during model parameterization, the data collected in the WT-labeled EXP and PROP files (WT.exp and WT.prop) are compared against the corresponding outputs of the WT model (defined in the MEK1_WT.bngl file). The relevant model outputs are identified in accordance with established conventions (21); for example, in an EXP file, times are indicated by row labels and each column header has the name of an observable or function defined in a BNGL file. We note that the only non-empty EXP file maps to the WT model, because quantitative, time-series data was only available for normal signaling (i.e., signaling unperturbed by MEK1 knockout or mutation). Please see Table 1 below.

Table 1

BPSL statement	Prose statement
WT.MEK_pRDS at time=300 < N78G.MEK_pRDS at time=300	The copy number of RAF-phosphorylated (pRDS) normal/wildtype (WT) MEK (in parental cells) is less than the copy number of RAF-phosphorylated MEK with mutation N78G (in a derived cell line) at 300 s after stimulation of RAS/RAF/MEK/ERK signaling.
N78G.pERK1_2_wt at time=300>T292D.pERK1_2_wt at time=300	The copy number of MEK1/2-double phosphorylated ERK with mutation N78G at 300 seconds is less than MEK-double phosphorylated ERK with mutation T292D at 300 seconds after simulation of RAS/RAF/MEK/ERK signaling.

Examples of qualitative observations and their formal BPSL representations.

2.3 Maximum likelihood estimation and likelihood profiling

Across the five model variants under consideration (WT, KO, N78G, T292A, and T292D), there are 46 total parameters (Table 2). We took 28 of these to be adjustable and set the other 18 at fixed values specified by Kocieniewski and Lipniacki (23). Six of these parameters () correspond to the simulating the initial concentrations of EGFR, Sos1, and EGFR/ligand subcomplexes. In preliminary analyses, we found that estimating these parameters directly added unnecessary computational complexity without altering model behavior. Instead, we determined their peak concentrations from early simulation runs and fixed those values before initiating the cascade. This approach simplified parameterization while preserving biological realism. The remaining 12 fixed parameters define the initial conditions of the resting-state cell and mutants. These were held at the original values from Kocieniewski and Lipniacki (23) to maintain comparability with their published results.

Table 2

Parameter	Parameter units	Parameter description
	Not Provided	c1 value after stimulation with ligand
		EGFR receptor dimerization dueligand
		EGFR subunits transphosphorylation in EGFR dimer
		degradation of ligand bound dimer complexes
		association of phosphorylated receptor EGFR subunits with SOS1
		disassociation of EGFR receptor subunits from SOS1
		MEK1 homodimer formation
		MEK1 dimer dissociation
		MEK2 homodimer formation
		MEK2 homodimer dissociation
		MEK1 and MEK2 heterodimer formation
		MEK1 and MEK2 heterodimer dissociation
		activation of Ras by EGFR SOS1 complex (exchange of GDP for GTP)
		inactivation of RAS (hydrolysis of bound GTP to GDP)
		activation of RAF by RAS-GTP
		inactivation of RAF
		phosphorylation of MEK1 and MEK2 on the activation sites by RAF
		dephosphorylation of MEK1 and MEK2 on the activation sites
		phosphorylation of ERK on the activation sites by MEK1
		phosphorylation of ERK on the activation sites by MEK2
		dephosphorylation of ERK on the activation sites
		feedback phosphorylation of SOS1 by active ERK
		dephosphorylation of the SOS1 feedback site
		feedback phosphorylation of MEK1 by ERK
		dephosphorylation of the MEK1 feedback site (Thr292)
		PHP phosphatse binding to Thr292p of MEK1
		dissociation of PHP phosphatase from Thr292p of MEK1
		dephosphorylation of MEK1 and MEK2 on the activation sites
		EGFR receptor subunit constitutive production
		EGFR receptor subunit constitutive degredation
		Sos1 constitutive production
		Sos1 constitutive degredation
	N/A	formation of EGFR receptor subunit-ligand complex Starting Signal
		c₁ value after simulation with ligand
*	N/A	MEK1 fraction of total MEKs content
*	N/A	MEK2 MEK1 kinase activity ratio
*	Copy Number	initial EGFR level
*	Copy Number	initial Sos1 level
*	Copy Number	initial RAS level
*	Copy Number	initial RAF level
*	Copy Number	initial MEK1 and MEK2 total level
*	Copy Number	initial MEK1 level
*	Copy Number	initial MEK1-Thr292p level
*	Copy Number	initial MEK2 level
*	Copy Number	initial combined ERK1 and ERK2
*	Copy Number	level

Original parameter value estimates of the MEK isoform model determined by Kocieniewski and Lipniacki (2013).

All original 46 parameters are shown, including a brief description of their function within the MEK isoform model. Parameters marked with a single asterisk (*) denote parameters that were fixed during PyBioNetFit’s automated parameterization of the model.

Furthermore, all 28 adjustable parameters chosen to be used in our study were initially centered on Kocieniewski and Lipniacki’s (23) parameter values and given bounds of one or two orders of magnitude (10x - 100x) in either direction of the initial value (Table 3). Some parameter bounds were further adjusted if the algorithm began pushing the limit of a given bound. To ensure biological meaningfulness, we imposed physical constraints where applicable (e.g., rate constants bounded below by zero and above by the diffusion limit; fractional parameters bounded between 0 and 1). We made a conscious effort making these constraint choices to help maintain realism while supporting numerical stability in both optimization and Bayesian inference. All parameter bounds can be seen in Table 3. In addition, we introduced 3 adjustable scaling factors, which are useful for comparing WT model outputs to serial measurements. Consequently, in parameter estimation, we considered a total of 31 adjustable parameters, which we will denote as . To obtain point estimates, , for the 31 adjustable parameters , we minimized an objective function , the form of which was previously described by Mitra et al. (19). The objective function accounts for all qualitative data (a total of 90 BPSL statements) and all quantitative data (a total of 18 serial measurements). We constrained to lie within a feasible region of parameter space, which we will denote as . The feasible region was defined by box constraints (lower and upper bounds) applied to each adjustable parameter. In summary, we found.

Table 3

	Optimization bounds [low, high]	Bayesian UQ bounds [low, high]
		N/A
		N/A
		N/A

		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A
		N/A

		N/A
		N/A
		N/A
		N/A
		N/A
*
*
*
	N/A

Best-fit parameter estimates.

This table lists the 31 parameters selected for model calibration using PyBioNetFit. Each row indicates the name of a parameter, the best-fit value obtained from a global fit, and parameter bounds used in optimization and Bayesian UQ (if applicable). Parameters marked with a single asterisk (*) are scaling factors added specifically for the WT model to allow model outputs to be compared to published measurements reported in arbitrary units (AU). Parameters with a double asterisk (**) are special hyper sampling parameters used only in Bayesian UQ simulations needed for PyBioNetFit’s Adaptive MCMC algorithm and not originally included in the model by Kocieniewski and Lipniacki (2013).

Thus, is the product of a global fit. As explained below, the objective function is related to a likelihood, and moreover, minimizing the objective function maximizes this likelihood. Thus, our point estimates are maximum likelihood estimates (MLEs).

The fitting problem setup, including identification of each adjustable parameter and the corresponding box constraints, and workflow were defined with a PyBioNetFit configuration, or CONF, file. This file (a plain-text file with a.conf filename extension) is freely available online (https://github.com/lanl/PyBNF/tree/master/examples/Miller2025_MEK_Isoforms/MEK_isoform_optimization_DE). In parameter estimation, the objective function was minimized using PyBioNetFit’s implementation of a parallelized differential evolution (DE) algorithm (21). PyBioNetFit-enabled fitting runs were executed within an institutional high-performance computing environment, on the Monsoon cluster at Northern Arizona University. We used 25 of 28 CPUs on a single node within the cluster and the average wall-clock time for optimization simulations were approximately 2 minutes and conducted 2,825 objective function evaluations. The model of CPUs used are Intel(R) Xeon(R) CPU E5–2680 v4 CPUs. Multiple runs were performed, with each run starting at a randomly chosen point within the feasible region of parameter space . The CONF file includes algorithmic parameter settings (e.g., population size) and maps EXP and PROP files to model variants.

Profile likelihood analysis (26) was performed as described by Mitra et al. (19). Computation of a profile involves repeatedly solving the parameterization problem described above, but with one selected parameter in held fixed at a specified value, which is varied across the profile. This analysis is available in the Supplementary Material as Supplementary Figure 4. We found that 14 of the 28 parameters selected for maximum likelihood estimation were identifiable, while the remaining parameters showed little sensitivity to perturbation from their original values.

When included in maximum likelihood estimation (MLE), unidentifiable parameters can reduce the identifiability of sensitive parameters if they are not fixed during parameterization. To assess whether estimating all 28 parameters affected PyBioNetFit’s parameterization of the MEK isoform models, we performed two fits. In the first, MLEs were estimated for all 28 parameters. In the second, only the 14 identifiable parameters were estimated, while the remaining were fixed to the original values from Kocieniewski and Lipniacki (23). Using PyBioNetFit’s differential evolution algorithm and sum of squares objective function, we found that fixing the non-sensitive parameters did not significantly alter the overall objective score or the estimates of identifiable parameters.

2.4 Bayesian inference and uncertainty quantification

In Bayesian inference and uncertainty quantification, for simplicity, we considered only two adjustable model parameters: , a rate constant characterizing degradation of ligand-bound epidermal growth factor receptor (EGFR) dimers, and , a rate constant characterizing phosphatase activity responsible for reversal of ERK-mediated negative-feedback phosphorylation of SOS1. Both parameters were deemed practically identifiable by profile likelihood. The adjustable parameters also included three scaling factors, as in fitting. Finally, the adjustable parameters included a noise model parameter, . In Bayesian inference, we used a likelihood function described by Mitra and Hlavacek (20), which has as a hyperparameter, and a proper uniform prior defined by box constraints on each of the six adjustable parameters (see below).

Bayesian inference was enabled by the adaptive Markov chain Monte Carlo (MCMC) sampler implemented in PyBioNetFit (28), version 1.1.9. The sampling problem setup and workflow were defined with a PyBioNetFit configuration, or CONF file. This file (a plain-text file with a.conf filename extension) is freely available online (https://github.com/lanl/PyBNF/tree/master/examples/Miller2025_MEK_Isoforms/MEK_isoform_aMCMC). The CONF file specifies the box constraints that define the prior. Each sampling job included 25,000 burn-in iterations, 25,000 adaptation iterations (used to tune the covariance matrix of the proposal kernel), and 250,000 production iterations. We generated five independent chains. Each chain was initialized at a random point within the prior. All five chains converged to the same posterior distribution. Convergence was evaluated by inspecting diagnostic trace plots and pairs plots. We also calculated convergence metrics (32) using the rstan R package (Table 4). These metrics indicated convergence according to the guidance of Vehtari et al. (32). PyBioNetFit-enabled sampling jobs were executed within an institutional high-performance computing environment, on the Monsoon cluster at Northern Arizona University. In the Monsoon cluster at Northern Arizona University, we used 25 CPUs of a single 28 CPU node containing Intel(R) Xeon(R) CPU E5–2680 v4 CPUs, and the approximate wall-clock time for our Bayesian inference simulations were 5.15 days, having conducted 1.25 million objective function evaluations.

2.5 Derivation of likelihood function

In this study, we consider a set of model variants and a dataset . The dataset consists of quantitative observations, , and m qualitative observations, . As a simplification, we assume that the observations are independent.

The quantitative observations are relative intensity measurements, which have been scaled such that for . There are no replicate measurements, and each corresponds to a unique measurement condition, . For each experimental readout , there is a corresponding model output , which depends on condition and adjustable parameters, . The model output is generated by the model variant in matched to condition , which is always the WT model (because time-series data are not available for any of the mutant cells).

Each qualitative observation () is the binary outcome of a comparison of two semi-quantitative measurements and made for two different cell lines (e.g., parental and mutant cells) or at two different time points within the same cellular background. By convention, we take to indicate and to indicate . The measurements and are made at conditions and , which are identical except for the difference in cell type or time. For each pair of measurements , there is a pair of model outputs . The output is generated by the model variant in matched to condition , which encompasses cell type or time, and similarly, the output is generated by the model variant matched to . Moreover, for each , we have a corresponding model prediction , where is the Heaviside function.

In maximum likelihood estimation, we want to find , where the likelihood is the probability density over the data given model structure and parameter settings. In inference, we view as a function of the adjustable parameters and take to express how probable the data are for given parameter settings. If observations are independent, decomposes into the product , where is the likelihood of the quantitative data and is the likelihood of the qualitative data.

Let us use to denote a continuous random variable representing the distribution of possible quantitative measurement outcomes of an experiment performed at condition . We take the observation to be a realization of , and we take the model output to be a prediction of the expected measurement outcome at condition . Formally, we take to model . Let us assume normally distributed measurement noise: . It then follows that.

If the quantitative data are taken to be independent, we find.

We can identify as . In the absence of information about variances, let us assume that for . Under this assumption of homoscedasticity (i.e., equal variances), we find (Equations 4, 6–10).

Furthermore,

Note that is maximized if we minimize . The expression for is equivalent to Equation 2 in Mitra et al. (19).

Let us use , to denote a discrete random variable representing the distribution of possible outcomes from a comparison of two semi-quantitative experimental readouts and made at conditions and . There are only two possible outcomes, which are mutually exclusive: (indicating that ) or (indicating that ). We take to be a realization of . We will assume the following noise model: . Equivalently,

Note that is the expectation . Recall that the model prediction of is . As in standard logistic regression, we will assume that the parameter is a sigmoid function of the difference :

where is a scale parameter. With this approach, we are assuming that the difference explains the probability . If , then lies between 0.5 and 1 and is more likely than . Conversely, if , then lies between 0 and 0.5 and is less likely than . From the above considerations, we find.

If the data are independent, we find.

We can identify as . After simplifications, we find.

Furthermore, under an assumption that for , we find.

where . The th term in the sum can be viewed as a static penalty function with weight . The penalty is if the explanatory variable is inconsistent with the observation and 0 otherwise. In the absence of information indicating how categorizations vary over a range of values for the explanatory variable, the weights can be set heuristically as described by Mitra et al. (19). The above expression for is equivalent to Equation 3 in Mitra et al. (19).

In Equation 1, is equal to , where is given by Equation 5 and is given by Equation 11.

3 Results

Kocieniewski and Lipniacki (23) developed a collection of related ordinary differential equation (ODE) models for ERK activation dynamics. These models capture and explain MEK isoform-specific effects in cell lines in which MEK1 is normally expressed (WT), knocked out (KO), and mutated at single amino-acid residues (N78G, T292A, and T292D). The models reproduce experimentally characterized signaling behavior, including various qualitative system properties; however, their parameter values were determined through an ad hoc manual trial-and-error procedure, precluding straightforward reuse of the qualitative data (up/down assays relative to a reference), reproducibility of the model parameterization approach, and any formal assessment of parametric and prediction uncertainty.

To demonstrate a better approach to model parameterization, we formalized qualitative system properties as Biological Property Specification Language (BPSL) statements (21) (Supplementary Tables 1–5). Then, following the inference approach of Mitra et al. (19), which is elaborated above, we applied, in a single global optimization, a parallelized metaheuristic optimization method implemented in PyBioNetFit (21) to find maximum likelihood estimates (MLEs) for 28 model parameters and 3 scaling factors that relate model outputs to relative measurements (Table 3). The quality of fit to quantitative time-series data (relative measurements) from Kamioka et al. (25) is illustrated in Figure 1. The quality of fit to qualitative data—up/down assays relative to a reference—from Catalanotti et al. (24) is illustrated in Figures 2, 3.

Figure 1

Figure 2

Figure 3

In Figure 1, the left panel (Figure 1A) shows the original fit of Kocieniewski and Lipniacki (23) to time-series data. The right panel (Figure 1B) shows the comparable fit found in this study. Our automated parameterization achieves comparable fits across all time courses, with slightly improved agreement for SOS1 and ERK phosphorylation. This improvement is supported quantitatively by the RMSE values reported in Supplementary Tables 8, 9, which show overall lower errors for our MLE parameter estimates relative to the original values from Kocieniewski and Lipniacki (23). Note that the model outputs underlying the relative quantities plotted in Figure 1 are the absolute cellular abundances of phosphorylated EGFR, SOS1, and ERK. These quantities are each multiplied by an adjustable scaling factor to align with the relative measurements reported by Kamioka et al. (25). Finally, objective function score outputs from PyBioNetFit’s sum of squares (sos) objective function are available in Supplementary Table 7. PyBioNetFit’s (Figure 1B) final objective function score during optimization is 24.0, where Kocieniewski and Lipniacki’s (23) final objective function score is 40.0.

The curves in Figures 2, 3 show calibrated model-predicted phosphorylated MEK and ERK, respectively, as a function of time in different cell lines, as indicated by color and pattern. At the time points corresponding to dotted vertical lines, readouts in pairs of distinct cell lines were compared, resulting in up or down scoring (24). The empirical outcomes of these comparisons are indicated graphically above the dotted vertical lines. As in Figure 1, the panels at left (panels A, C, E, and G in both Figures 2, 3) show the simulation results of Kocieniewski and Lipniacki (23), and the panels at right (panels B, D, F, and H in both Figures 2, 3) show the simulation results of this study. As can be seen, we obtained comparable consistency with empirical up/down scoring. Indeed, the calibrated model-predicted time-courses from this study and the earlier study are remarkably similar, despite differences in parameter estimates. To further show that PyBioNetFit’s parameterization of the MEK isoform models maintained comparable consistency to Kocieniewski and Lipniacki’s (23) parameterization of the models, both the original parameterization and PyBioNetFit’s parameterization satisfied 84/90 BPSL constraints determined from distinct cell line readouts from Catalanotti et al. (24).

We note that Figures 2, 3 only consider comparisons of readouts in different cell lines. Additional qualitative data were considered in fitting. These data were generated from comparisons of readouts at different time points within the same cellular background. Supplementary Tables 1–5 provide a full listing of the qualitative observations, formalized as BPSL statements, used in fitting.

Given the discrepancy between the parameter estimates found here and those of the original study, questions of parametric uncertainty naturally arise. To illustrate that our approach allows for uncertainty quantification (UQ), we generated profile likelihood plots for two rate constants in the models, and (Figure 4). These plots show that both parameters are practically identifiable and that estimation of the value for is more constrained by the available data than is estimation of the value for . In other words, the width of any horizontal line segment interior to the profile of Figure 4A (solid blue curve) is shorter than the corresponding line segment in Figure 4B. Note that the dotted red vertical line in each panel marks the best-fit estimate.

Figure 4

To further demonstrate the capabilities of PyBioNetFit, we executed a Bayesian approach to inference and UQ, which leveraged the same datasets as those considered in maximum likelihood estimation. However, for the sake of simplicity, we focused on a smaller set of adjustable parameters. Through Markov chain Monte Carlo (MCMC) sampling, we were able to obtain samples converging on and representing the parameter posterior distribution. Sampling convergence metrics are reported in Table 4. Additional convergence diagnostics, likelihood and parameter trace plots and pairs plots, are shown in Supplementary Figures 1–3. Marginal posterior distributions for the rate constants and are shown in Figure 5. By propagating parametric uncertainty through simulations, we obtained posterior predictive distributions characterizing uncertainty in model predictions (Figures 6, 7). Supplementary Table 6 quantifies consistency with qualitative observations.

Table 4

Parameter

Sampling convergence metrics.

For the rate constants and , the table reports the bulk and tail Effective Sample Size (ESS), as well as the quantity , in which is a convergence diagnostic derived from the potential scale reduction factor and computed using the rstan R package. evaluates the effective number of independent samples for the central bulk of the posterior distribution, whereas reflects the stability of sampling in the distribution tails. High ESS values across both metrics indicate efficient sampling with low autocorrelation. The values are close to zero, suggesting convergence of the chains and well-mixed posterior estimates.

Figure 5

Figure 6

Figure 7

4 Discussion

In this study, we demonstrated that PyBioNetFit (21) can replace manual trial-and-error parameter tuning, as in the study of Kocieniewski and Lipniacki (23), with a reproducible, automated pipeline that integrates both quantitative and qualitative data. We showed how to formalize up/down results of case-control comparisons as Biological Property Specification Language (BPSL) statements, and we showed how to leverage BPSL statements in concert with conventional time-series data in not only model parameterization but also rigorous uncertainty quantification (UQ). We elaborated on the approach of Mitra et al. (19), showing that this approach is grounded in likelihood-based inference. We also demonstrated both frequentist and Bayesian approaches to rigorous uncertainty quantification (UQ), which was missing in the original modeling study of Kocieniewski and Lipniacki (23).

An important aspect of our re-parameterization of Kocieniewski and Lipniacki’s (23) MEK isoform models is that different parameter sets can yield comparably good fits to the available data. This phenomenon arises when multiple combinations of parameters produce near-optimal objective function values. Lack of organization in models like these can be closely related to practical non-identifiability, in which parameter estimates cannot be uniquely determined given the data. In our profile likelihood analysis of all 28 fitted parameters, we found that approximately half were practically identifiable, while the remainder showed broad, flat profiles, indicating limited sensitivity to perturbation. The full results are shown in Supplementary Figure 4, which plots each parameter against its objective function profile, maximum likelihood estimate, and deviation from the original baseline from Kocieniewski and Lipniacki (23). Although this lack of uniqueness highlights the need for additional experimental data to fully constrain the system, it is important to note that model predictions remained consistent across different parameter sets. Thus, despite non-identifiable parameters, the model exhibits robustness, supporting the reliability of the conclusions drawn from our PyBioNetFit-enabled analysis.

Our results illustrate several advantages of a PyBioNetFit-enabled workflow, including potential for straightforward reusability of qualitative observations, once formalized as BPSL statements, and reproducibility of complex workflows. PyBioNetFit job setup files consist of easily shared plain-text files capturing data, models, and algorithms in standardized formats. Our results also reinforce the value of qualitative data. These data can be used in model parameterization with UQ, in multiple ways. Looking ahead, this study serves as a template for enhancing model parameterization pipelines in contexts where qualitative data are available to complement (limited) quantitative data.

4.1 Limitations and future directions

While our study demonstrates the utility of PyBioNetFit for automated parameterization with both qualitative and quantitative data, several limitations should be acknowledged. First, scalability remains a practical challenge. Although PyBioNetFit performs well for models of moderate size, applying it to models with hundreds or thousands of parameters, especially those that are high-dimensional, will become too computationally expensive. Future work may benefit from hybrid optimization strategies that combine global search with local refinement to improve efficiency.

Secondly, while the Biological Property Specification Language (BPSL) is well suited to encoding rank orderings, inequalities at fixed times, and cross-condition comparisons, it does not currently support constraints on more complex temporal features, such as peak times. Extending BPSL to represent such properties, as well as improving its ability to incorporate metadata (e.g., experimental conditions, cell types), would expand its applicability.

Third, BPSL has potential beyond parameter estimation. For example, BPSL statements could be leveraged in model-checking workflows to systematically document a model’s consistency with qualitative biological knowledge. In addition, advances in natural language processing, particularly large language models, could enable automated extraction of BPSL statements from full-text articles, further broadening their utility in parameterization and uncertainty quantification workflows.

Taken together, these considerations highlight important directions for future research. Our demonstration emphasizes that while current approaches are powerful, there is significant scope for enhancing both scalability and expressivity, thereby enabling broader adoption of hybrid data workflows in systems biology.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Author contributions

EM: Writing – review & editing, Visualization, Formal analysis, Writing – original draft, Validation, Software, Investigation. AM: Writing – review & editing, Formal analysis, Writing – original draft. JN: Methodology, Writing – review & editing, Writing – original draft, Software. YL: Software, Formal analysis, Supervision, Writing – review & editing, Methodology, Writing – original draft, Investigation. WH: Resources, Formal analysis, Project administration, Methodology, Writing – review & editing, Software, Writing – original draft, Funding acquisition, Conceptualization. RP: Writing – review & editing, Project administration, Supervision, Software, Writing – original draft, Conceptualization, Funding acquisition, Resources, Formal analysis.

Funding

The author(s) declared that financial support was received for this work and/or its publication. We acknowledge support from the National Institute of General Medical Sciences of the National Institutes of Health through grant R01GM111510, and support from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health through grant R01AI153617. We also received support from the Center for Nonlinear Studies and the Laboratory-Directed Research and Development Program at Los Alamos National Laboratory. This study benefitted from Northern Arizona University’s Monsoon computer cluster, which is supported by Arizona’s Technology and Research Initiative Fund, and Los Alamos National Laboratory’s Darwin computer cluster, which is supported by the Computational Systems and Software Environment subprogram of the Advanced Simulation and Computing program at Los Alamos National Laboratory.

Acknowledgments

We thank Dr. T. Lipniacki for helpful discussions.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer IM declared a shared affiliation with the author JN to the handling editor at time of review.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2026.1663008/full#supplementary-material

References

1
MitraEDHlavacekWS. Parameter estimation and uncertainty quantification for systems biology models. Curr Opin Syst Biol. (2019) 18:9–18. doi: 10.1016/j.coisb.2019.10.006. PMID:
2
ChenKCCsikasz-NagyAGyorffyBValJNovakBTysonJJ. Kinetic analysis of a molecular model of the budding yeast cell cycle. Mol Biol Cell. (2000) 11:369–91. doi: 10.1091/mbc.11.1.369. PMID:
3
ChenKCCalzoneLCsikasz-NagyACrossFRNovakBTysonJJ. Integrative analysis of cell cycle control in budding yeast. Mol Biol Cell. (2004) 15:3841–62. doi: 10.1091/mbc.e03-11-0794. PMID:
4
KraikivskiPChenKCLaomettachitTMuraliTMTysonJJ. From START to FINISH: computational analysis of cell cycle control in budding yeast. NPJ Syst Biol Appl. (2015) 1:1–9. doi: 10.1038/npjsba.2015.16. PMID:
5
BarikDBallDAPeccoudJTysonJJ. A stochastic model of the yeast cell cycle reveals roles for feedback regulation in limiting cellular variability. PloS Comput Biol. (2016) 12:e1005230. doi: 10.1371/journal.pcbi.1005230. PMID:
6
TysonJJLaomettachitTKraikivskiP. Modeling the dynamic behavior of biochemical regulatory networks. J Theor Biol. (2019) 462:514–27. doi: 10.1016/j.jtbi.2018.11.034. PMID:
7
KinneyJBTkacikGCallanCG. Precise physical models of protein-DNA interaction from high-throughput data. Proc Natl Acad Sci USA. (2007) 104:501–6. doi: 10.1073/pnas.0609908104. PMID:
8
KinneyJBMuruganAGallanCGCoxEC. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci USA. (2010) 107:9158–63. doi: 10.1073/pnas.1004290107. PMID:
9
AtwalGSKinneyJB. Learning quantitative sequence–function relationships from massively parallel experiments. J Stat Phys. (2016) 162:1203–43. doi: 10.1007/s10955-015-1398-3. PMID:
10
KinneyJBMcCandlishDM. Massively parallel assays and quantitative sequence-function relationships. Annu Rev Genomics Hum Genet. (2019) 20:99–127. doi: 10.1146/annurev-genom-083118-014845. PMID:
11
ToniTJovanovicGHuvetMBuckMStumpfMPH. From qualitative data to quantitative models: analysis of the phage shock protein stress response in Escherichia coli. BMC Syst Biol. (2011) 5:69. doi: 10.1186/1752-0509-5-69. PMID:
12
PargettMUmulisDM. Quantitative model analysis with diverse biological data: applications in developmental pattern formation. Methods. (2013) 62:56–67. doi: 10.1016/j.ymeth.2013.03.024. PMID:
13
PargettMRundellAEBuzzardGTUmulisDM. Model-based analysis for qualitative data: an application in Drosophila germline stem cell regulation. PloS Comput Biol. (2014) 10:e1003498. doi: 10.1371/journal.pcbi.1003498. PMID:
14
SchmiesterLWeindlDHasenauerJ. Parameterization of mechanistic models from qualitative data using an efficient optimal scaling approach. J Math. Biol. (2020) 81:603–23. doi: 10.1007/s00285-020-01522-w. PMID:
15
SchmiesterLWeindlDHasenauerJ. Efficient gradient-based parameter estimation for dynamic models using qualitative data. Bioinformatics. (2021) 37:4493–500. doi: 10.1093/bioinformatics/btab512. PMID:
16
DorešićDGreinSHasenauerJ. Efficient parameter estimation for ODE models of cellular processes using semi-quantitative data. Bioinformatics. (2024) 40:i558–66:btae210. doi: 10.1093/bioinformatics/btae210
17
TareenAKooshkbaghiMPosfaiAIrelandWTMcCandlishDMKinneyJB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol. (2022) 23:98. doi: 10.1186/s13059-022-02661-7
18
OguzCLaomettachitTChenKCWatsonLTBaumannWTTysonJJ. Optimization and model reduction in the high dimensional parameter space of a budding yeast cell cycle model. BMC Syst Biol. (2013) 7:1–17. doi: 10.1186/1752-0509-7-53. PMID:
19
MitraEDDiasRPosnerRGHlavacekWS. Using both qualitative and quantitative data in parameter identification for systems biology models. Nat Commun. (2018) 9:3901. doi: 10.1038/s41467-018-06439-z. PMID:
20
MitraEDHlavacekWS. Bayesian inference using qualitative observations of underlying continuous variables. Bioinformatics. (2020) 36:3177–84. doi: 10.1093/bioinformatics/btaa084. PMID:
21
MitraEDSudermanRColvinJIonkovAHuASauroHMet al. PyBioNetFit and the Biological Property Specification Language. iScience. (2019) 19:1012–36. doi: 10.2139/ssrn.3382545. PMID:
22
SchälteYFröhlichFJostPJVanhoeferJPathiranaDStaporPet al. pyPESTO: A modular and scalable tool for parameter estimation for dynamic models. Bioinformatics. (2023) 39:btad711. doi: 10.1093/bioinformatics/btad711
23
KocieniewskiPLipniackiT. MEK1 and MEK2 differentially control the duration and amplitude of the ERK cascade response. Phys Biol. (2013) 10:35006. doi: 10.1088/1478-3975/10/3/035006. PMID:
24
CatalanottiFReyesGJesenbergerVGalabova-KovacsGde Matos SimoesRCarugoOet al. A Mek1–Mek2 heterodimer determines the strength and duration of the Erk signal. Nat Struct Mol Biol. (2009) 16:294–303. doi: 10.1038/nsmb.1564. PMID:
25
KamiokaYYasudaSFujitaYAokiKMatsudaM. Multiple decisive phosphorylation sites for the negative feedback regulation of SOS1 via ERK. J Biol Chem. (2010) 285:33540–8. doi: 10.1074/jbc.m110.135517. PMID:
26
KreutzCRaueAKaschekDTimmerJ. Profile likelihood in systems biology. FEBS J. (2013) 280:2564–71. doi: 10.1111/febs.12276. PMID:
27
AndrieuCThomsJ. A tutorial on adaptive MCMC. Stat Comp. (2008) 18:343–73. doi: 10.1007/s11222-008-9110-y. PMID:
28
NeumannJLinYTMallelaAMillerEFColvinJDupratATet al. Implementation of a practical Markov chain Monte Carlo sampling algorithm in PyBioNetFit. Bioinformatics. (2022) 38:1770–2. doi: 10.1093/bioinformatics/btac004. PMID:
29
FaederJRBlinovMLHlavacekWS. Rule-based modeling of biochemical systems with BioNetGen. Methods Mol Biol. (2009) 500:113–67. doi: 10.1007/978-1-59745-525-1_5. PMID:
30
KeatingSMWaltemathDKönigMZhangFDrägerAChaouiyaCet al. SBML Level 3: an extensible format for the exchange and reuse of biological models. Mol Syst Biol. (2020) 16:e9110. doi: 10.15252/msb.20199110. PMID:
31
HarrisLAHoggJSTapiaJJSekarJAPGuptaSKorsunskyIet al. BioNetGen 2.2: advances in rule-based modeling. Bioinformatics. (2016) 32:3366–8. doi: 10.1093/bioinformatics/btw469. PMID:
32
VehtariAGelmanASimpsonDCarpenterBBürknerPC. Rank-normalization, folding, and localization: an improved <math><mover accent=‘true’><mi>R</mi><mo>^</mo></mover></math> for assessing convergence of MCMC (with discussion). Bayesian. Anal. (2021) 16:667–718. doi: 10.1214/20-ba1221. PMID:

Summary

Keywords

bayesian inference, curve-fitting, Markov chain Monte Carlo (MCMC), maximum likelihood estimation (MLE), profile likelihood

Citation

Miller EF, Mallela A, Neumann J, Lin YT, Hlavacek WS and Posner RG (2026) Using PyBioNetFit to leverage qualitative and quantitative data in biological model parameterization and uncertainty quantification. Front. Immunol. 17:1663008. doi: 10.3389/fimmu.2026.1663008

Received

11 July 2025

Revised

06 April 2026

Accepted

06 April 2026

Published

29 April 2026

Volume

17 - 2026

Edited by

Padmini Rangamani, University of California, San Diego, United States

Reviewed by

Isha Monga, Weill Cornell medicine, United States

Marissa Renardy, GlaxoSmithKline (United States), United States

Małgorzata Kardyńska, Silesian University of Technology, Poland

Updates

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: William S. Hlavacek, hlavacek@lanl.gov; Richard G. Posner, richard.posner@nau.edu

†These authors have contributed equally to this work

‡ Present address: Abhishek Mallela, Dartmouth College, Hanover, NH, United States of America

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Systems Immunology

ORIGINAL RESEARCH article

Using PyBioNetFit to leverage qualitative and quantitative data in biological model parameterization and uncertainty quantification

Abstract

1 Introduction

2 Materials and methods