Characterization and Valuation of the Uncertainty of Calibrated Parameters in Microsimulation Decision Models

Alarid-Escudero, Fernando; Knudsen, Amy B.; Ozik, Jonathan; Collier, Nicholson; Kuntz, Karen M.

doi:10.3389/fphys.2022.780917

ORIGINAL RESEARCH article

Front. Physiol., 09 May 2022

Sec. Computational Physiology and Medicine

Volume 13 - 2022 | https://doi.org/10.3389/fphys.2022.780917

This article is part of the Research Topic: Integration of Machine Learning and Computer Simulation in Solving Complex Physiological and Medical Questions

Fernando Alarid-Escudero¹*

Amy B. Knudsen²

Jonathan Ozik³,⁴

Nicholson Collier³,⁴

Karen M. Kuntz⁵

¹Division of Public Administration, Center for Research and Teaching in Economics (CIDE), Aguascalientes, Mexico
²Institute for Technology Assessment, Massachusetts General Hospital, Boston, MA, United States
³Decision and Infrastructure Sciences Division, Argonne National Laboratory, Argonne, IL, United States
⁴Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, United States
⁵Division of Health Policy and Management, University of Minnesota School of Public Health, Minneapolis, MN, United States

Background: We evaluated the implications of different approaches to characterize the uncertainty of calibrated parameters of microsimulation decision models (DMs) and quantified the value of such uncertainty in decision making.

Methods: We calibrated a natural history model of colorectal cancer (CRC) to simulated epidemiological data with different degrees of uncertainty and obtained the joint posterior distribution of the parameters using a Bayesian approach. We conducted a probabilistic sensitivity analysis (PSA) on all the model parameters with different characterizations of the uncertainty of the calibrated parameters. We estimated the value of uncertainty of the various characterizations with a value of information analysis. We conducted all analyses using high-performance computing resources running the Extreme-scale Model Exploration with Swift (EMEWS) framework.

Results: The posterior distribution had a high correlation among some parameters. The parameters of the Weibull hazard function for the age of onset of adenomas had the strongest posterior correlation (−0.958). When comparing full posterior distributions and the maximum-a-posteriori estimate of the calibrated parameters, there was little difference in the spread of the distribution of the cost-effectiveness analysis (CEA) outcomes, with similar expected values of perfect information (EVPI) of $653 and $685, respectively, at a willingness-to-pay (WTP) threshold of $66,000 per quality-adjusted life year (QALY). Ignoring correlation in the calibrated parameters’ posterior distribution produced the broadest distribution of CEA outcomes and the highest EVPI of $809 at the same WTP threshold.

Conclusion: Different characterizations of the uncertainty of calibrated parameters affect the expected value of eliminating parametric uncertainty on the CEA. Ignoring inherent correlation among calibrated parameters on a PSA overestimates the value of uncertainty.

Background

Decision models (DMs) are commonly used in cost-effectiveness analysis where uncertainty in the parameters is inherent (Kuntz et al., 2017). The impact of parameter uncertainty can be assessed with a probabilistic sensitivity analysis (PSA) to characterize decision uncertainty (i.e., the probability of a strategy being cost-effective) (Briggs et al., 2012; Sculpher et al., 2017) and to quantify the value of potential future research by determining the potential consequences of a decision with value of information (VOI) analysis (Schlaifer, 1959; Raiffa and Schlaifer, 1961).

The parameters of DMs can be split into two categories, those obtained from the literature or estimated from available data (i.e., external parameters) and those that need to be estimated through calibration (i.e., calibrated parameters). External parameters are estimated either from individual-level or aggregated data that directly inform the parameters of interest. There are recommendations on the type of distributions that characterize their uncertainty based on the characteristics of the parameters or the statistical model used to estimate them (Briggs et al., 2012). For example, a probability could be modeled with a beta distribution and a relative risk with a lognormal distribution (Briggs et al., 2002). For calibrated parameters, no such data exist that can directly inform their uncertainty because a research study hasn’t been conducted or is unfeasible to conduct, or because the parameters reflect unobservable phenomena, as is often the case in natural history models of chronic diseases (Welton and Ades, 2005; Karnon et al., 2007; Rutter et al., 2009; Rutter et al., 2011) or in infectious disease dynamic models (Enns et al., 2017). The choice of distribution for these parameters is often less clear. One option is to define uniform distributions with wide bounds or generate informed distributions based on moments of the calibrated parameters, such as the mean and standard error. However, the impact of these approaches to characterize the uncertainty of calibrated parameters on decision uncertainty and the VOI on reducing that uncertainty has not been studied.

Model calibration is the process of estimating unobserved or unobservable parameters by matching model outputs to observed clinical or epidemiological data (known as calibration targets) (Kennedy and O’Hagan, 2001; Stout et al., 2009; Kuntz et al., 2017). While there are several approaches for searching the parameter space in the calibration process, most approaches are insufficient to characterize the uncertainty in the calibrated model parameters because they do not provide interval estimates. For example, direct-search optimization algorithms like Newton-Raphson, Nelder-Mead (Nelder and Mead, 1965), simulated annealing, or genetic algorithms (Kong et al., 2009) treat the calibration targets as if they were known with certainty, so they are primarily useful for identifying a single parameter set or a collection of parameter sets that yield a good fit to the targets (Kennedy and O’Hagan, 2001).

A sample of calibrated parameter sets that correctly characterizes the uncertainty of the calibration target data is obtained from their joint distribution, conditional on the calibration targets. To obtain the joint distribution, calibration could be specified as a statistical estimation problem under at least two different frameworks: maximum likelihood (ML) or Bayesian methods. ML can fail to produce interval estimates because the Hessian matrix cannot be estimated when the likelihood is intractable or computationally intensive to simulate, or when the calibration problem is non-identifiable (Gustafson, 2005; Alarid-Escudero et al., 2018); thus, we focus on Bayesian methods (Romanowicz et al., 1994; Kennedy and O’Hagan, 2001; Oakley and O’Hagan, 2004; Gustafson, 2005; Kaipio and Somersalo, 2005; Oden et al., 2010; Gustafson, 2015; Alarid-Escudero et al., 2018).

Despite their suitability to correctly characterize the uncertainty of calibrated model parameters, Bayesian methods are generally computationally expensive because they require evaluating the model thousands and sometimes millions of times. The computational burden of Bayesian methods does not seem to be an impediment when calibrating non-computationally intensive DMs (e.g., Markov cohort models, difference equations, relatively small systems of differential equations, etc.) (Whyte et al., 2011; Hawkins-Daarud et al., 2013; Jackson et al., 2016; Menzies et al., 2017). Still, they become more challenging to apply to DMs that could be computationally intensive to solve, such as models that simulate underlying stochastic processes (Iskandar, 2018) (e.g., microsimulation, discrete-event simulation, and agent-based models), limiting their use to only a few of such models (Rutter et al., 2009).

However, the increasing availability of high-performance computing (HPC) systems in academic, national laboratory, and commercial settings makes large-scale model calibration and exploration of microsimulation DMs accessible to a broader audience. HPC resources allow large numbers of DM runs to execute concurrently, so calibration algorithms that generate large batches of parameter sets simultaneously, such as the incremental mixture importance sampling (IMIS) algorithm described below, can be run efficiently. In many cases, particularly in the academic and national laboratory settings, computing allocations can be obtained through proposals with no cost to researchers (e.g., the Advanced Scientific Computing Research (ASCR) Leadership Computing Challenge (ALCC), https://science.osti.gov/ascr/Facilities/Accessing-ASCR-Facilities/ALCC). However, implementing dynamic calibration algorithms on HPC resources has generally proved difficult, requiring specialized knowledge across various disciplines. The Extreme-scale Model Exploration with Swift (EMEWS) framework was designed to make large-scale model calibration and exploration on HPC resources accessible to a broad community (Ozik et al., 2016a). EMEWS can run very large, highly concurrent ensembles of microsimulation DMs of varying types with a broad class of calibration algorithms, including those increasingly available to the community via Python and R libraries, using HPC workflows. EMEWS workflows provide interfaces for plugging in DMs (and any other simulation or black box model) and algorithms, through an inversion of control scheme (Ozik et al., 2018), to control the dynamic execution of those DMs for calibration and other heuristics for “model exploration” purposes. These interfaces help reduce the need for an in-depth understanding of how task coordination and inter-task dependencies are implemented for HPC resources. The general use of EMEWS can be seen on the EMEWS website (https://emews.github.io), which includes links to tutorials.

The purpose of our study is threefold. First, to use recently developed HPC capabilities to characterize the uncertainty of calibrated parameters of a microsimulation model of the natural history of colorectal cancer (CRC). Second, to explore the impact of different approaches to characterize the uncertainty of calibrated parameters on decision uncertainty, and third, to use VOI analysis to quantify the value of eliminating parameter uncertainty when assessing the cost-effectiveness of CRC screening.

Methods

We developed a microsimulation model of the natural history of CRC and calibrated it using a Bayesian approach. We then overlaid a simple CRC screening strategy onto the natural history model and conducted a cost-effectiveness analysis (CEA) of screening, including a PSA. Instead of using the posterior means to represent the best estimates of each calibrated parameter, we obtained the posterior distribution using a Bayesian approach that represents the joint uncertainty of all the calibrated parameters that can then be used in a PSA. We then evaluated the impact of different approaches to characterize the uncertainty of calibrated parameters on the joint distribution of incremental costs and incremental effects of the screening strategy compared with no screening through a PSA while also accounting for the uncertainty of the external parameters (e.g., test characteristics, costs, etc.). Finally, we quantified the amount of money that a decision maker should be willing to spend to eliminate all parameter uncertainty (i.e., the expected value of perfect information (EVPI)).

Microsimulation Model of the Natural History of CRC

We developed a state-transition microsimulation model of the natural history of CRC implemented in R (Krijkamp et al., 2018) based on a previously developed model (Alarid-Escudero et al., 2018). The progression between health states follows a continuous-time age-dependent Markov process. There are two age-dependent transition intensities (i.e., transition rates), $λ_{1} (a)$ and $μ (a)$ , that govern the age of onset of adenomas and non-cancer-specific mortality, respectively. Following Wu et al. (2006) we specify $λ_{1} (a)$ as a Weibull hazard with the following specification

$$λ_{1}(a) = l γ a^{γ - 1},$$

where $a$ is the age of the simulated individuals, and $l$ and $γ$ are the scale and shape parameters of the Weibull hazard function, respectively. The model simulates two adenoma categories: small (adenoma smaller than 1 cm in size) and large (adenoma larger than or equal to 1 cm in size). All adenomas start small and can transition to the large size category at a constant annual rate $λ_{2}$ . Large adenomas may become preclinical CRC at a constant annual rate $λ_{3}$ . Both small and large adenomas may progress to preclinical CRC, although most will not in a simulated individual’s lifetime. Early preclinical cancers (preclinical stages I and II) progress to late stages (preclinical stages III and IV) at a constant annual rate $λ_{4}$ and could become symptomatic at a constant annual rate $λ_{5}$ . Late preclinical cancer could become symptomatic at a constant annual rate $λ_{6}$ . After clinical detection, the model simulates the survival time to early and late CRC death using cancer-specific constant mortality rates, $λ_{7}$ and $λ_{8}$ , respectively. The model has nine health states: normal, small adenoma, large adenoma, early preclinical CRC, late preclinical CRC, early clinical CRC, late clinical CRC, CRC death, and death from other causes. The state-transition diagram of the continuous-time model is shown in Figure 1.
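The Weibull hazard above can be illustrated with a minimal Python sketch (the authors' model is implemented in R; this is not their code). It assumes the cumulative hazard $l a^{γ}$ implied by $λ_{1}(a)$, so that the probability of remaining adenoma-free up to age $a$ is $\exp(-l a^{γ})$, and it draws ages of onset by inverting that survival function. The function names and the values of `SCALE` and `SHAPE` are illustrative assumptions, not calibrated estimates.

```python
import math
import random


def weibull_hazard(age, scale, shape):
    """Hazard of adenoma onset at a given age: l * gamma * a^(gamma - 1)."""
    return scale * shape * age ** (shape - 1.0)


def weibull_survival(age, scale, shape):
    """Probability of remaining adenoma-free up to `age`: exp(-l * a^gamma)."""
    return math.exp(-scale * age ** shape)


def sample_onset_age(scale, shape, rng):
    """Draw an age of onset by inverting the survival function."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return (-math.log(u) / scale) ** (1.0 / shape)


# Illustrative (not calibrated) parameter values.
SCALE, SHAPE = 2.0e-6, 2.8

rng = random.Random(1)
ages = [sample_onset_age(SCALE, SHAPE, rng) for _ in range(20000)]

# Simulated share with onset by age 60 versus the closed-form probability.
share_by_60 = sum(a <= 60 for a in ages) / len(ages)
exact_by_60 = 1.0 - weibull_survival(60, SCALE, SHAPE)
```

With a shape parameter above 1, the hazard increases with age, which is the behavior the two Weibull parameters jointly control.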

FIGURE 1. State-transition diagram of the nine-state microsimulation model of the natural history of colorectal cancer. Individuals in all health states face an age-specific mortality of dying from other causes (state not shown) (Jalal et al., 2021).

The continuous-time age-dependent Markov process of this natural history model of CRC can be represented by an age-dependent $9 \times 9$ transition intensity matrix, $Q (a)$ . To translate $Q (a)$ to discrete-time, we compute the annual-cycle age-dependent transition probability matrix, $P (a, t)$ , using the Kolmogorov differential equations (Kolmogorov, 1963; Cox and Miller, 1965; Welton and Ades, 2005)

$$P(a, t) = \mathrm{Exp}(t\,Q(a)),$$

where $t = 1$ and $\mathrm{Exp}()$ is the matrix exponential. In discrete time, the natural history model of CRC allows individual transitions across multiple health states in a single year. Small and large adenomas may progress to preclinical or clinical CRC, and preclinical cancers may progress through early and late stages.
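To illustrate how a transition intensity matrix maps to annual transition probabilities, the following Python sketch computes the matrix exponential for a reduced three-state example (normal, small adenoma, large adenoma). The rates, the three-state structure, and the scaling-and-squaring routine are assumptions made for illustration; the model described here uses a $9 \times 9$ age-dependent matrix, and a production analysis would normally rely on a tested numerical library.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t=1.0, squarings=10, terms=12):
    """Approximate Exp(t * Q) by scaling-and-squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term now holds a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Illustrative annual rates at one fixed age (not calibrated values).
ONSET_RATE, GROWTH_RATE = 0.02, 0.05

# Rows: normal, small adenoma, large adenoma. Each row of Q sums to zero.
Q = [
    [-ONSET_RATE, ONSET_RATE, 0.0],
    [0.0, -GROWTH_RATE, GROWTH_RATE],
    [0.0, 0.0, 0.0],
]

P = mat_exp(Q, t=1.0)
prob_stay_normal = math.exp(-ONSET_RATE)  # closed form for P[0][0]
prob_normal_to_large = P[0][2]            # two transitions within one annual cycle
```

The entry `P[0][2]` is positive even though `Q[0][2]` is zero, which is the multi-state transition within a single year described above.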

We simulated a hypothetical cohort of 50-year-old women in the United States over a lifetime. The cohort starts the simulation with a prevalence of adenoma of $p_{adeno}$, from which a proportion, $p_{small}$, corresponds to small adenomas, and a prevalence of preclinical early and late CRC of 0.12% (Rutter et al., 2007) and 0.08% (Wu et al., 2006), respectively. The parameters $p_{adeno}$ and $p_{small}$ are calibrated parameters. The simulated cohort is at risk of all-cause mortality, $μ(a)$, from all health states, obtained from 2014 United States life tables (Arias et al., 2017).

Calibration Targets

We used the microsimulation model of the natural history of CRC to generate synthetic calibration targets by selecting a set of parameter values based on plausible estimates from the literature (Table 1) (Wu et al., 2006; Rutter et al., 2007). We simulated four different age-specific synthetic targets, including adenoma prevalence, the proportion of small adenomas, and CRC incidence for early and late stages, which resemble commonly used calibration targets for this type of model (Rutter et al., 2009; Whyte et al., 2011; Frazier et al., 2000; Kuntz et al., 2011). To simulate the calibration targets, we ran the microsimulation model 100 times to get a stable estimate of the standard errors (SEs) using the fixed values in Table 1. We then aggregated each outcome across all 100 model replications to compute their mean and SE. To account for different levels of uncertainty across targets given the amount of data to estimate their summary measures, we simulated various targets based on cohorts of different sizes (Rutter et al., 2009). Adenoma-related targets were based on a cohort of 500 individuals, and cancer incidence targets were based on 100,000 individuals.
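The way cohort size drives target uncertainty can be sketched in Python as follows. This is a simplified illustration, not the authors' procedure: `simulate_prevalence` is a hypothetical stand-in for one run of the microsimulation model, the true prevalence values are made up, and the SE of a target is assumed here to be the standard deviation of the outcome across the 100 replications.

```python
import random
import statistics


def simulate_prevalence(true_prev, cohort_size, rng):
    """Hypothetical stand-in for one model run: observed prevalence in a cohort."""
    cases = sum(rng.random() < true_prev for _ in range(cohort_size))
    return cases / cohort_size


def make_target(true_prev, cohort_size, n_reps=100, seed=0):
    """Mean and SE of an outcome across independent replications."""
    rng = random.Random(seed)
    runs = [simulate_prevalence(true_prev, cohort_size, rng) for _ in range(n_reps)]
    return statistics.mean(runs), statistics.stdev(runs)


# Illustrative true values; cohort sizes follow the text (500 and 100,000).
adenoma_mean, adenoma_se = make_target(0.25, cohort_size=500)
cancer_mean, cancer_se = make_target(0.001, cohort_size=100_000)
```

Under these assumptions the adenoma target carries a much larger SE than the cancer incidence target, so it constrains the calibrated parameters less tightly.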

TABLE 1. Description of parameters of the natural history model.

Calibration of the Microsimulation Model of the Natural History

To state the calibration of the microsimulation model as an estimation problem (Alarid-Escudero et al., 2018), we define $M$ as the microsimulation model of the natural history of CRC with 11 input parameters. Cancer-specific mortality rates from early and late stages of CRC could be obtained from cancer population registries (e.g., the Surveillance, Epidemiology and End Results (SEER) registry in the United States), so calibration of these rates was unnecessary. That is, $θ_{k} = [λ_{7}, λ_{8}]$ is a set of 2 parameters that are either known or could be obtained from external data (i.e., are external parameters). The model has a set of 9 parameters $θ_{u} = [p_{adeno}, p_{small}, l, γ, λ_{2}, λ_{3}, λ_{4}, λ_{5}, λ_{6}]$ that cannot be directly estimated from sample data and need to be calibrated. The full set of parameters of $M$ is $θ = [θ_{u}, θ_{k}]$.

To calibrate $M$ , we adopted a Bayesian approach that allowed us to obtain a joint posterior distribution that characterizes the uncertainty of both the calibration targets and previous knowledge of the parameters of interest in the form of prior distributions. Prior distributions can reflect experts’ opinions, or when little knowledge is available, these could be specified as uniform distributions. We constructed the likelihood function by assuming that each type of target $t$ , including adenoma prevalence, proportion of small adenomas, early clinical CRC incidence, and late clinical CRC incidence for each age group $a$ , $y_{t_{a}}$ , are normally distributed with mean $ϕ_{t_{a}}$ and standard deviation $σ_{t_{a}}$ (Alarid-Escudero et al., 2018). That is,

$$y_{t_{a}} \sim \mathrm{Normal}(ϕ_{t_{a}}, σ_{t_{a}}),$$

where $ϕ_{t_{a}} =E [M (θ)]$ is the expected value of the model-predicted output from parameter set $θ$ . We added the log-likelihoods across all targets to compute an aggregated likelihood measure. We defined prior distributions for all $θ_{u}$ based on previous knowledge or the nature of the parameters (Table 1). We defined beta distributions for the prevalence of adenomas and the proportion of small adenomas at age 50, bounded between 0 and 1. We assumed that the annual transition rates follow a log-normal distribution for their priors, defined over positive numbers. The ranges given in Table 1 are assumed to represent the 95% equal-tailed interval for the beta and log-normal distributions.
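A minimal Python sketch of the aggregated log-likelihood and of one way to turn a 95% equal-tailed interval into log-normal prior parameters is shown below. The target names and numeric values are invented for illustration, and `lognormal_from_interval` is an assumed helper: it matches the 2.5th and 97.5th percentiles on the log scale, which is one reading of the interval assumption stated above rather than the authors' documented procedure.

```python
import math


def normal_logpdf(y, mean, sd):
    """Log-density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def log_likelihood(targets, model_outputs):
    """Sum normal log-densities over all target types and age groups.

    targets: {name: [(observed_mean, observed_se), ...]}, one pair per age group
    model_outputs: {name: [predicted_value, ...]} in the same age-group order
    """
    total = 0.0
    for name, observed in targets.items():
        for (y, se), phi in zip(observed, model_outputs[name]):
            total += normal_logpdf(y, phi, se)
    return total


def lognormal_from_interval(lower, upper, z=1.959964):
    """Log-scale mean and sd so that (lower, upper) is the 95% equal-tailed interval."""
    mu = 0.5 * (math.log(lower) + math.log(upper))
    sigma = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return mu, sigma


# Made-up targets for two age groups per target type.
targets = {
    "adenoma_prevalence": [(0.27, 0.02), (0.35, 0.02)],
    "early_crc_incidence": [(0.0006, 0.0001), (0.0011, 0.0001)],
}
good_fit = {"adenoma_prevalence": [0.27, 0.35], "early_crc_incidence": [0.0006, 0.0011]}
poor_fit = {"adenoma_prevalence": [0.15, 0.20], "early_crc_incidence": [0.0010, 0.0020]}

ll_good = log_likelihood(targets, good_fit)
ll_poor = log_likelihood(targets, poor_fit)
mu, sigma = lognormal_from_interval(0.01, 0.10)
```

Because each target contributes a term scaled by its own SE, precisely measured targets penalize a misfit more heavily than noisy ones.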

To conduct the Bayesian calibration, we used the incremental mixture importance sampling (IMIS) algorithm (Steele et al., 2006; Raftery and Bao, 2009), which has been previously used to calibrate health policy models (Menzies et al., 2017; Ryckman et al., 2020). We ran the IMIS algorithm on the Midway2 cluster at the University of Chicago Research Computing Center (https://rcc.uchicago.edu/resources/high-performance-computing). Midway2 is a hybrid cluster, including both central processing unit (CPU) and graphics processing unit (GPU) resources. For this work, we used the CPU resources. Midway2 consists of 370 nodes of Intel E5-2680v4 processors, each with 28 cores and 64 GB of RAM. Using EMEWS, we developed a workflow that parallelized the likelihood evaluations over 1,008 processes using 36 compute nodes. In other words, we reduced the computation time by a factor of approximately 250 compared with running the analysis on a laptop with four processing cores.
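The approximate factor of 250 follows from the node and core counts given above, under the simplifying assumption (ours) of near-ideal parallel scaling with no scheduling or communication overhead:

$$36 \times 28 = 1{,}008 \text{ processes}, \qquad \frac{1{,}008}{4} = 252 \approx 250.$$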

Consistent with previous analyses, we deemed that convergence had occurred when the effective sample size (ESS) got as close as possible to the target of 5,000 (Rutter et al., 2019; DeYoreo et al., 2022). An advantage of IMIS over other Monte Carlo methods, such as Markov chain Monte Carlo, is that with IMIS we can parallelize the evaluation of the likelihood for different sampled parameter sets, making its implementation well suited to an HPC environment using EMEWS. IMIS requires defining and computing the likelihood, which we could do with our model. However, when computing the likelihood is intractable, modelers could use the incremental mixture approximate Bayesian computation (IMABC) algorithm (Rutter et al., 2019), which is an approximate Bayesian version of IMIS.
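The ESS used as a convergence measure can be sketched in Python with plain sampling-importance-resampling from the prior. This is deliberately simpler than IMIS, which iteratively adds mixture components around high-weight points; the one-parameter `toy_log_likelihood`, the uniform prior, and the sample sizes are assumptions for illustration only. The ESS is computed as $1/\sum_i w_i^2$ for weights normalized to sum to one.

```python
import math
import random


def effective_sample_size(log_weights):
    """ESS = 1 / sum(w_i^2) for weights normalized to sum to one."""
    max_lw = max(log_weights)
    w = [math.exp(lw - max_lw) for lw in log_weights]  # stabilized exponentiation
    total = sum(w)
    return 1.0 / sum((x / total) ** 2 for x in w)


def toy_log_likelihood(theta, y=0.3, se=0.05):
    """Hypothetical single-target normal log-likelihood (up to a constant)."""
    return -0.5 * ((y - theta) / se) ** 2


rng = random.Random(42)
prior_draws = [rng.uniform(0.0, 1.0) for _ in range(10000)]

# Each likelihood evaluation is independent of the others, which is what
# makes this step straightforward to distribute across many processes.
log_w = [toy_log_likelihood(theta) for theta in prior_draws]

# When sampling from the prior, the importance weight is proportional
# to the likelihood.
ess = effective_sample_size(log_w)

max_lw = max(log_w)
weights = [math.exp(lw - max_lw) for lw in log_w]
posterior = rng.choices(prior_draws, weights=weights, k=5000)
n_unique = len(set(posterior))
posterior_mean = sum(posterior) / len(posterior)
```

Resampling with replacement from weighted draws is why a posterior sample can contain fewer unique parameter sets than its nominal size.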

Propagation of Uncertainty

We sampled 5,000 parameter sets from the IMIS joint posterior distribution for the nine calibrated model parameters. To compare the outputs of the calibrated model against the calibration targets, we propagated the uncertainty of the calibrated parameters through the microsimulation model of the natural history of CRC. We simulated a cohort of 100,000 individuals (i.e., the largest cohort size used to generate the targets). We generated the model-predicted adenoma and cancer outcomes for each of the 5,000 calibrated parameter sets drawn from their joint posterior distribution. We computed the 95% posterior predictive interval (PI), defined as the range between the 2.5th and 97.5th percentiles of the model-predicted posterior outputs, to quantify the uncertainty of the model outputs.
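The percentile-based interval described above can be sketched in Python as follows. The linear-interpolation `percentile` helper and the synthetic `draws` are illustrative assumptions; other percentile conventions give slightly different bounds for small samples.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def predictive_interval(outputs_by_draw, level=95.0):
    """Posterior mean and equal-tailed interval for each model output.

    outputs_by_draw: one list of model outputs (e.g., by age group)
    per posterior parameter set.
    """
    tail = (100.0 - level) / 2.0
    summary = []
    for j in range(len(outputs_by_draw[0])):
        column = [draw[j] for draw in outputs_by_draw]
        mean = sum(column) / len(column)
        summary.append((mean, percentile(column, tail), percentile(column, 100.0 - tail)))
    return summary


# Synthetic stand-in for model outputs: 101 draws, two outputs each.
draws = [[i / 100.0, 2.0 * i / 100.0] for i in range(101)]
intervals = predictive_interval(draws)
```

Each tuple in `intervals` holds the posterior mean, the 2.5th percentile, and the 97.5th percentile for one output.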

Cost-Effectiveness Analysis of Screening for CRC

With the calibrated microsimulation model of the natural history of CRC, we assessed the cost-effectiveness of 10-yearly colonoscopy screening starting at age 50 years compared to no screening. For adenomas detected with colonoscopy, a polypectomy was performed during the procedure. Individuals diagnosed with a small or large adenoma underwent surveillance with colonoscopy every 5 or 3 years, respectively. We assumed screening or surveillance continued until 85 years of age. Individuals with a history of polyp diagnosis had higher recurrence rates after polypectomy, that is, a higher transition rate from normal to small adenoma (i.e., $λ_{1} (a)$ ). We assumed a hazard ratio of 2 for small adenomas and 3 for the large adenomas. The costs and utilities of CRC care varied by stage, and individuals without clinical CRC had a utility of 1. Table 2 shows the parameters used in the CEA with their corresponding distributions.

TABLE 2. Description of cost-effectiveness analysis parameters.

Uncertainty Quantification

We used four different approaches to quantify the uncertainty of the two types of parameters: calibrated parameters and external (i.e., CEA) parameters. The first approach considers uncertainty in both types of parameters, with the uncertainty of the calibrated parameters characterized by their joint posterior distribution obtained from the IMIS algorithm. The second approach only considers uncertainty in the external parameters while fixing the calibrated parameters at the maximum-a-posteriori (MAP) estimate, defined as the parameter set with the highest posterior density. The third approach considers uncertainty only in the calibrated parameters, characterized by their joint posterior distribution, with the external parameters fixed at their mean values. The fourth approach considers uncertainty in both types of parameters, but instead of using the IMIS posterior distribution of the calibrated parameters, we constructed distributions based solely on the IMIS posterior moments (i.e., means and standard deviations) and the type of calibrated parameter, ignoring correlations.
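Why the fourth approach can behave differently from the first is easiest to see in a toy case. The Python sketch below compares a pair of standard-normal parameters sampled with a correlation of −0.958 against the same marginals sampled independently, and pushes both through a hypothetical output function. The normal marginals, the additive `toy_output`, and the sample size are all assumptions; in a real model the direction and size of the effect depend on how the output responds to each parameter.

```python
import math
import random
import statistics


def sample_pair(rho, rng):
    """Draw two standard-normal values with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


def toy_output(x, y):
    """Hypothetical model output that depends on both parameters."""
    return x + y


rng = random.Random(7)
RHO = -0.958  # correlation used for the illustrative joint sample

joint = [toy_output(*sample_pair(RHO, rng)) for _ in range(20000)]
independent = [toy_output(*sample_pair(0.0, rng)) for _ in range(20000)]

# Theoretical values: sqrt(2 * (1 + RHO)) for the joint sample, sqrt(2) otherwise.
sd_joint = statistics.stdev(joint)
sd_independent = statistics.stdev(independent)
```

In this example the strongly negative correlation makes the two parameters offset each other, so dropping it inflates the spread of the output even though both marginal distributions are unchanged.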

We conducted a PSA to evaluate the impact of uncertainty in model parameters on the cost-effectiveness of 10-yearly colonoscopy screening vs. no screening for CRC. A separate PSA was performed for each of the four approaches to quantify the uncertainty of the two types of parameters. We used EMEWS to distribute the samples of each PSA across HPC resources.

Value of Information Analysis

We quantified the theoretical value of eliminating uncertainty in the external and calibrated model parameters using VOI analysis. VOI measures the losses (i.e., foregone benefits) from choosing a strategy given imperfect information (Raiffa and Schlaifer, 1961), providing the amount of resources a decision maker should be willing to spend to obtain information that would reduce the uncertainty. Specifically, we estimated the value of eliminating parametric uncertainty (i.e., the EVPI) in the cost-effectiveness of a 10-yearly colonoscopy screening strategy. This entailed computing the difference in expected net benefit between perfect information and current information (Oostenbrink et al., 2008). The EVPI was calculated across a wide range of willingness-to-pay (WTP) thresholds (Eckermann et al., 2010). We repeated this VOI analysis for the different approaches to characterize the uncertainty of the calibrated and external parameters.
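The EVPI calculation can be sketched in Python from PSA output, using the standard definition: the expected net benefit when the best strategy is chosen separately for each parameter draw, minus the expected net benefit of the single strategy that is best on average. The net monetary benefit formulation, the data layout, and the two-draw example values are assumptions for illustration, not results from this study.

```python
def net_benefit(qalys, cost, wtp):
    """Net monetary benefit of a strategy at a willingness-to-pay threshold."""
    return wtp * qalys - cost


def evpi(psa_samples, wtp):
    """Per-person EVPI from PSA output.

    psa_samples: list of draws; each draw is a list of (qalys, cost)
    tuples, one per strategy, in a fixed strategy order.
    """
    n_strategies = len(psa_samples[0])
    nb = [[net_benefit(q, c, wtp) for (q, c) in draw] for draw in psa_samples]
    expected_nb = [sum(row[d] for row in nb) / len(nb) for d in range(n_strategies)]
    value_current = max(expected_nb)                        # best strategy on average
    value_perfect = sum(max(row) for row in nb) / len(nb)   # best strategy per draw
    return value_perfect - value_current


# Two made-up draws; strategies are (no screening, screening).
samples = [
    [(10.00, 0.0), (10.02, 1000.0)],
    [(10.00, 0.0), (10.01, 1000.0)],
]
evpi_at_66k = evpi(samples, wtp=66000.0)
evpi_curve = [(wtp, evpi(samples, wtp)) for wtp in range(0, 200001, 10000)]
```

Evaluating `evpi` over a grid of thresholds, as in `evpi_curve`, gives the kind of EVPI-by-WTP curve reported in the results.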

Results

We sampled 5,000 parameter sets from the posterior distribution using IMIS, including 3,241 unique parameter sets with an effective sample size (ESS) of 2,098. With the sample from the posterior distribution, we estimated posterior means and standard deviations, MAP estimates, and 95% credible intervals (CrI) for all calibrated parameters (Table 3). The posterior means of the calibrated parameters were similar to the prior means (Table 3). The major contrast is that the posterior distributions were narrower than the priors, meaning that the calibration targets informed the calibrated parameters through Bayesian updating (Figure 2).

TABLE 3. Posterior means, standard deviations, maximum-a-posteriori (MAP) estimate and 95% credible interval (CrI) of calibrated parameters of the microsimulation model of the natural history of CRC.

FIGURE 2. Prior and posterior marginal distributions of calibrated parameters of the microsimulation model of the natural history of CRC.

The Bayesian calibration also induced correlation among the parameters, showing the dependency among some of them (Figure 3). Some pairs of parameters are highly correlated. The scale and shape parameters of the Weibull hazard function for the age of onset of adenomas, $l$ and $γ$, respectively, have the strongest correlation, at −0.958. The transition rates from early preclinical CRC to late preclinical and early clinical CRC have a correlation of 0.784. The prevalence of adenomas and the proportion of small adenomas at age 50, which inform the initial distribution of the cohort across the adenoma health states, also have a high correlation of 0.482. These high correlations result from the calibration of the microsimulation model of the natural history of CRC being non-identifiable when calibrating all 9 parameters to all the targets. In a previous study, we found that the 9 parameters of this model structure are non-identifiable via calibration because the relationship between the parameters is highly collinear when using the current four calibration targets (Alarid-Escudero et al., 2018).

FIGURE 3. Scatter plot of pairs of deep model parameters with correlation coefficient and posterior marginal distributions.

The calibrated model accurately predicted the calibration targets for both the means and the uncertainty intervals. Figure 4 shows the internal validation of the calibrated model by comparing calibration targets with their 95% confidence interval (CI) and the model-predicted posterior means together with their 95% posterior PI.

FIGURE 4. Comparison between posterior model-predicted outputs and calibration targets. Calibration targets with their 95% CI are shown in black. The shaded area shows the 95% posterior model-predictive interval of the outcomes, and the colored lines show the posterior model-predicted mean based on 5,000 simulations using samples from the posterior distribution. The upper panel refers to adenoma-related targets and the lower panel refers to CRC incidence targets by stage.

The joint distributions of the incremental quality-adjusted life years (QALYs) and incremental costs of the 10-yearly colonoscopy screening strategy vs. the no-screening strategy resulting from the PSA for the four uncertainty quantification approaches are shown in Figure 5. When accounting for the uncertainty in the external parameters, there is little difference in the spread of the CEA outcomes when considering the joint distribution of the calibrated parameters vs. using only the MAP estimates (approaches 1 and 2 on the top row of Figure 5, respectively). The joint distribution of the outcomes is slightly wider when considering uncertainty in all parameters compared to when fixing the calibrated parameters at their MAP estimate. The third approach reflects the impact of only varying the calibrated parameters on the joint distribution of incremental QALYs and incremental costs, which is much narrower than in approaches 1 and 2. The fourth approach, which characterizes uncertainty of the calibrated parameters using the method of moments without accounting for correlation, has the widest spread in the distribution of the outcomes.

FIGURE 5. Incremental costs and incremental QALYs of 10-yearly colonoscopy screening vs. no screening under different assumptions of characterization of the uncertainty of both calibrated and external parameters. The red star corresponds to the incremental costs and incremental QALYs evaluated at the maximum-a-posteriori estimate of the calibrated parameters and the mean values of the external parameters.

For the VOI analysis, we found value in eliminating uncertainty in the parameters of the CEA of the 10-yearly colonoscopy screening strategy, as shown by a positive EVPI (Figure 6). However, the value varies by uncertainty quantification approach and WTP threshold. The first and second approaches to uncertainty quantification had similar EVPI, reaching their maximum of $653 and $685, respectively, at a $66,000/QALY WTP threshold. For WTP thresholds greater than $66,000/QALY, the first approach had a higher EVPI than the second approach. When we consider only the uncertainty for the calibrated parameters (approach 3), the EVPI is the lowest across all WTP thresholds, with an EVPI of $0.1 at a WTP threshold of $66,000/QALY and a maximum of $212 at a WTP threshold of $71,000/QALY. The fourth approach reaches a maximum of $809 at a WTP threshold of $66,000/QALY and is the highest compared to the other approaches up to a WTP threshold of $81,000/QALY, at which the first approach has the highest EVPI.

FIGURE 6. Per-patient EVPI of 10-yearly colonoscopy screening vs. no screening under different approaches to characterize the uncertainty of both the calibrated and external parameters.

Discussion

In this study, we characterized the uncertainty of a realistic microsimulation model of the natural history of CRC by calibrating its parameters to different targets with varying degrees of uncertainty using a Bayesian approach in an HPC environment using EMEWS. We also quantified the value of the uncertainty of the calibrated parameters on the cost-effectiveness of a 10-yearly colonoscopy screening strategy with a VOI analysis. EMEWS has been previously used to calibrate other microsimulation DMs (Rutter et al., 2019; Rutter et al., 2019) but has not been previously used to conduct a PSA with the calibrated parameters and calculate the VOI. Although Bayesian calibration can be a computationally intensive task, we reduced the computation time by evaluating the likelihood of different parameter sets on multiple cores simultaneously in an HPC setup, which IMIS allows.

We found that different characterizations of the uncertainty of calibrated parameters affect the expected value of reducing uncertainty on the CEA. Ignoring the inherent correlation among calibrated parameters in a PSA overestimates the value of uncertainty. When the full posterior distribution of the calibrated parameters is not readily available, the MAP could be considered the best parameter set. In our example, not considering the uncertainty of calibrated parameters in the PSA did not seem to have a meaningful impact on the uncertainty of the CEA outcomes and the EVPI of the screening strategy. The uncertainty associated with the natural history was less valuable than the uncertainty of the external parameters. However, these results should be taken with caution because this analysis was conducted on a fictitious model with simulated calibration targets. Modelers should analyze the impact of a well-conducted characterization of the uncertainty of calibrated parameters on CEA outcomes and VOI measures on a case-by-case basis.
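A toy example can show why ignoring correlation may inflate outcome uncertainty. In the sketch below, two parameters have a negative posterior correlation, and the outcome depends on their sum. Sampling each parameter independently from its marginal keeps the marginals intact but widens the outcome distribution. The correlation value, the standard-normal marginals, and the outcome function are assumptions chosen for illustration; in a real model, the direction and size of the effect depend on the posterior and on how the parameters enter the model.

```python
import random
import statistics

rng = random.Random(7)
rho = -0.8  # assumed posterior correlation between two calibrated parameters

# Hypothetical joint posterior sample: two standard-normal parameters
# with correlation rho.
joint = []
for _ in range(10_000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    joint.append((z1, rho * z1 + (1 - rho ** 2) ** 0.5 * z2))

# "Independent marginals": permute one column, which keeps each marginal
# distribution intact but destroys the correlation.
second = [b for _, b in joint]
rng.shuffle(second)
independent = [(a, b) for (a, _), b in zip(joint, second)]

def outcome(a, b):
    """Toy model outcome that depends on both parameters."""
    return a + b

var_joint = statistics.pvariance([outcome(a, b) for a, b in joint])
var_indep = statistics.pvariance([outcome(a, b) for a, b in independent])
print(round(var_joint, 2), round(var_indep, 2))
```

Under these assumptions, the outcome variance is roughly 2 + 2(-0.8) = 0.4 with the joint sample and roughly 2 with the independent marginals, so a PSA built on the marginals alone would carry more decision uncertainty than the calibration supports.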

There are examples of calibrated parameters being included in a PSA, for instance, by taking a certain number of good-fitting parameter sets (Kim et al., 2007; Kim et al., 2009), by bootstrapping with equal probability good-fitting parameter sets obtained through directed search algorithms (e.g., Nelder-Mead) (Taylor et al., 2012), or by conducting a Bayesian calibration, which produces the joint posterior distribution of the calibrated parameters (Menzies et al., 2017). However, this is the first manuscript to conduct a PSA and VOI analysis using distributions of calibrated microsimulation DM parameters that accurately characterize their uncertainty.

Currently, Bayesian calibration of microsimulation DMs might not be feasible on regular desktops or laptops. To circumvent current computational limitations from using Bayesian methods in calibrating microsimulation models, surrogate models (often called metamodels or emulators) have been proposed (O’Hagan et al., 1999; O’Hagan, 2006; Oakley and Youngman, 2017). Surrogate models are statistical models like Gaussian processes (Sacks et al., 1989a; Sacks et al., 1989b; Oakley and O’Hagan, 2002) or neural networks (Hauser et al., 2012; Jalal et al., 2021) that aim to replace the relationship between inputs and outputs of the original microsimulation DM (Barton, 1992; Kleijnen, 2015) and that, once fitted, are computationally more efficient to run than the microsimulation DM. Constructing an emulator might not be a straightforward task because the microsimulation DM still needs to be evaluated at different parameter sets, which could also be computationally expensive. Furthermore, the statistical routines to build the emulator may not be readily available in the programming language in which the microsimulation DM is coded. These are situations where EMEWS can be used to construct metamodels efficiently; however, this is a topic for further research.
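To make the idea of an emulator concrete, the following minimal sketch fits the posterior mean of a zero-mean Gaussian process with a squared-exponential kernel to a handful of runs of a stand-in model and then predicts the output at a new input without running the model again. The stand-in model, the kernel length scale, the jitter term, and the design points are all assumptions for illustration; a real application would use a multidimensional input and an established statistical library.

```python
import math

def rbf(x, y, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((x - y) / length) ** 2)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def expensive_model(theta):
    """Stand-in for a slow microsimulation run (hypothetical)."""
    return math.sin(3.0 * theta) + theta

# Design points: the only places where the expensive model is evaluated.
design = [i / 9 for i in range(10)]
outputs = [expensive_model(t) for t in design]

# Fit: alpha = (K + jitter * I)^-1 y, with a small jitter for numerical stability.
jitter = 1e-8
K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(design)]
     for i, a in enumerate(design)]
alpha = solve(K, outputs)

def emulator(theta):
    """GP posterior mean at a new input; cheap to evaluate."""
    return sum(w * rbf(theta, t) for w, t in zip(alpha, design))

print(round(emulator(0.5), 3), round(expensive_model(0.5), 3))
```

The expensive step is generating `outputs`, which requires one model run per design point; these runs are independent and are the part that an HPC workflow could distribute.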

Researchers might actively avoid questions that would require HPC due to the perceived difficulties involved, or make do with less-than-ideal smaller-scale analyses (e.g., choosing the maximum likelihood estimate or a small set of parameters instead of the posterior distribution for uncertainty quantification), and the robustness of the conclusions can suffer as a result.

In this article, we showed that EMEWS could facilitate the use of HPC to implement computationally demanding Bayesian calibration routines to correctly characterize the uncertainty of the calibrated parameters of microsimulation DMs, to propagate this uncertainty through the CEA of screening strategies, and to quantify its value of information. This study’s methodology and results could guide similar VOI analyses of CEAs that use microsimulation DMs to determine where more research is needed and to guide research prioritization.

Data Availability Statement

The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

Author Contributions

FA-E contributed to the conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, and writing of the original draft and of reviewing and editing; AK contributed to the conceptualization, formal analysis, funding acquisition, investigation, methodology, resources, validation, and writing of the original draft and of reviewing and editing; JO contributed to the conceptualization, formal analysis, funding acquisition, investigation, methodology, resources, validation, and writing of the original draft and of reviewing and editing; NC contributed to the conceptualization, formal analysis, funding acquisition, investigation, methodology, resources, validation, and writing of the original draft and of reviewing and editing; KK contributed to the conceptualization, formal analysis, funding acquisition, investigation, methodology, resources, validation, and writing of the original draft and of reviewing and editing.

Funding

Financial support for this study was provided in part by a grant from the National Council of Science and Technology of Mexico (CONACYT) and a Doctoral Dissertation Fellowship from the Graduate School of the University of Minnesota as part of Dr. Alarid-Escudero’s doctoral program. All authors were supported by grants from the National Cancer Institute (U01-CA-199335 and U01-CA-253913) as part of the Cancer Intervention and Surveillance Modeling Network (CISNET). The work was supported in part by the U.S. Department of Energy, Office of Science, under contract (No. DE-AC02-06CH11357). The funding agencies had no role in the study’s design, interpretation of results, or writing of the manuscript. The content is solely the authors’ responsibility and does not necessarily represent the official views of the National Institutes of Health. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report. This research was completed with resources provided by the Research Computing Center at the University of Chicago (Midway2 cluster).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alarid-Escudero F., MacLehose R. F., Peralta Y., Kuntz K. M., Enns E. A. (2018). Nonidentifiability in Model Calibration and Implications for Medical Decision Making. Med. Decis. Mak. 38 (7), 810–821. doi:10.1177/0272989x18792283

Arias E., Heron M., Xu J. (2017). United States Life Tables. Natl. Vital Stat. Rep. 66 (4), 63.

Barton R. R. (1992). “Metamodels for Simulation Input-Output Relations,” in Winter Simulation Conference. Editors J. J. Swain, D. Goldsman, R. C. Crain, and J. R. Wilson, 289–299. doi:10.1145/167293.167352