Parameter Estimation of Platelets Deposition: Approximate Bayesian Computation With High Performance Computing

Cardio/cerebrovascular diseases (CVD) have become one of the major health issues in our societies. Recent studies show that existing clinical tests to detect CVD are ineffective, as they do not consider the different stages of platelet activation or the molecular dynamics involved in platelet interactions. Furthermore, they cannot account for inter-individual variability. A physical description of platelet deposition was introduced recently in Chopard et al. (2017), by integrating fundamental understanding of how platelets interact into a numerical model, parameterized by five parameters. These parameters specify the deposition process and are relevant for a biomedical understanding of the phenomena. Our main intuition is that these parameters are precisely the information needed for a pathological test identifying CVD, and that they capture the inter-individual variability. Following this intuition, here we devise a Bayesian inferential scheme for the estimation of these parameters, using experimental observations, at different time intervals, of the average size of the aggregation clusters, their number per mm², and the number of platelets, and of those activated, per µℓ still in suspension. As the likelihood function of the numerical model is intractable due to the complex stochastic nature of the model, we use a likelihood-free inference scheme, approximate Bayesian computation (ABC), to calibrate the parameters in a data-driven manner. As ABC requires the generation of many pseudo-datasets by expensive simulation runs, we use a high performance computing (HPC) framework for ABC to make the inference possible for this model. We consider a collective dataset of seven volunteers and use this inference scheme to obtain an approximate posterior distribution and the Bayes estimate of these five parameters.
The mean posterior prediction of the platelet deposition pattern matches the experimental dataset closely, with a tight posterior prediction error margin, justifying our main intuition and providing a methodology to infer these parameters given patient data. The present approach can be used to build a new generation of personalized platelet functionality tests for CVD detection, using numerical modeling of platelet deposition, Bayesian uncertainty quantification, and high performance computing.


Introduction
Blood platelets play a major role in the complex process of blood coagulation, involving adhesion, aggregation and spreading on the vascular wall to stop a hemorrhage while avoiding vessel occlusion. Platelets also play a key role in the occurrence of cardio/cerebrovascular accidents, which constitute a major health issue in our societies. In 2015, related diseases were the leading cause of mortality worldwide, causing 31% of deaths (Organization, 2015). Despite the use of antiplatelet drugs in primary and secondary prevention, a significant portion of treated patients develop an ischemic recurrence (±10%). This raises the question of a possible resistance to one or several antiplatelet agents. In most cases, a standard posology is prescribed to patients. This posology does not take into account the inter-individual variability linked to the absorption or the effectiveness of these molecules. In this context, the notion of responder and non-responder emerged in the literature. Moreover, studies indicate that the evaluation of the response to a treatment by the existing tests is test-dependent. In addition, the response to antithrombotic drugs is highly patient-dependent.
Nowadays, platelet function testing is performed either as an attempt to monitor the efficacy of anti-platelet drugs or to determine the cause of abnormal bleeding or pro-thrombotic status. The most common method consists of using an optical aggregometer that measures the transmittance of light passing through platelet-rich plasma (PRP) or whole blood, to evaluate how platelets tend to aggregate. Other aggregometers determine the amount of aggregated platelets by electric impedance or luminescence. In specific contexts, flow cytometry is also used to assess platelet reactivity (VASP test).
A review article has evaluated platelet function determination using 6 different techniques in patients undergoing coronary stent implantation (Breet et al., 2010). The correlation between the clinical biological measures and the occurrence of a cardiovascular event was null for 3 of the techniques and rather modest for the 3 others, indicating the evident need for a more efficient tool or method to monitor patient platelet functionality. In another review (Picker, 2011), the author insists on the fact that no current test allows the analysis of the different stages of platelet activation or the prediction of the in vivo behavior of those platelets. Very recently, another review confirmed the observations described above (Koltai et al., 2017). Although there is a lot of data on the molecules involved in platelet interactions (Maxwell et al., 2007), these studies indicate that there is a lack of knowledge on some fundamental mechanisms that should be revealed by new experiments.
Recently, by combining two techniques, digital holography microscopy (DHM) and mathematical modeling, we provided a physical description of the adhesion and aggregation of platelets in the Impact-R device (Chopard et al., 2015; Chopard et al., 2017).
Quantities such as adhesion and aggregation rates were determined for a set of healthy volunteers, by tuning the parameters of the mathematical model in order to reproduce the deposition patterns observed in the Impact-R.
We claim here that the values of these adhesion and aggregation rates are precisely the information needed for a new class of clinical tests, assessing various possible pathological situations and quantifying their severity. However, the determination of these adhesion and aggregation rates requires searching the parameter space of the mathematical model. This has to be repeated for each patient and thus requires a powerful numerical approach. Obviously, the clinical applicability of the proposed technique to provide such a new platelet function test remains to be explored. In the present paper, we show that our approach, combined with high performance computing (HPC) and modern numerical algorithms for parameter identification such as approximate Bayesian computation (ABC), brings the technical elements needed to build this new class of medical tests.
In Section 2 we introduce the necessary background knowledge about the platelet deposition model, whereas Section 3 recalls the concept of Bayesian inference and introduces the HPC framework for ABC used in this study. Then, in Section 4, we illustrate the results of the parameter determination for the platelet deposition model using the ABC methodology, collectively for 7 patients. We note that the same methodology can be used to determine the parameter values for each individual patient in a similar manner. Finally, in Section 5 we conclude the paper and discuss its impact from a biomedical perspective.

Background and Scientific Relevance
The Impact-R (Shenkman et al., 2008) is a well-known platelet function analyzer. It is a cylindrical device filled with whole blood from a donor. Its lower end is a fixed disk, serving as a deposition surface, on which platelets adhere and aggregate. The upper end of the Impact-R cylinder is a rotating cone, creating an adjustable shear rate in the blood. Due to this shear rate, platelets move towards the deposition surface, where they adhere or aggregate. Platelets aggregate next to already deposited platelets, or on top of them, thus forming clusters whose size increases with time. This deposition process has been successfully described with a mathematical model, as proposed in Chopard et al. (2015); Chopard et al. (2017). Such numerical models are used more and more often to understand different aspects of nature, ranging from sub-atomic particles to entire human societies or whole universes. Natural scientists, e.g., researchers in physics, chemistry and biology, traditionally hypothesize models underlying natural phenomena using their domain expertise.
In the present case, our model (coined model M in what follows) requires five parameters that specify the deposition process and are relevant for a bio-medical understanding of the phenomena. In short, the blood sample in the Impact-R device is supposed to contain an initial number N_platelet(0) of non-activated platelets per µℓ and a number N_act-platelet(0) of pre-activated platelets per µℓ. Initially, both types of platelets are supposed to be uniformly distributed within the blood. Due to the process known as shear-induced diffusion, platelets hit the deposition surface. Upon such an event, an activated platelet will adhere with a probability that depends on its adhesion rate, p_Ad, which we would like to determine. Platelets that have adhered to the surface are the seed of a cluster that can grow due to the aggregation of other platelets reaching the deposition surface. We denote by p_Ag the rate at which new platelets deposit next to an existing cluster. We also introduce p_T, the rate at which platelets deposit on top of an existing cluster. An important observation made in Chopard et al. (2015); Chopard et al. (2017) is that albumin, which is abundant in blood, competes with platelets for deposition. As a result, the number of aggregation clusters and their size tend to saturate as time goes on, even though there is still a large number of platelets in suspension in the blood. To describe this process in the model, we introduced two extra parameters: p_F, the deposition rate of albumin, and a_T, a factor that accounts for the decrease of platelet adhesion and aggregation on locations where albumin has already deposited. Although the proposed model uses additional parameters (see Chopard et al. (2017)), we assume here that they are known. For the purpose of the present study, the model M is parametrized in terms of the five quantities introduced above, namely the adhesion rate p_Ad, the aggregation rates p_Ag and p_T, the deposition rate of albumin p_F, and the attenuation factor a_T. Collectively, we define

    θ = (p_Ad, p_Ag, p_T, p_F, a_T).

If the initial values N_platelet(0) and N_act-platelet(0), as well as the concentration of albumin, are known from the experiment, we can forward simulate the deposition of platelets over time using model M for given values of these parameters θ = θ*:

    M(θ*) → (S_agg-clust(t), N_agg-clust(t), N_platelet(t), N_act-platelet(t)),

where S_agg-clust(t), N_agg-clust(t), N_platelet(t) and N_act-platelet(t) are, correspondingly, the average size of the aggregation clusters, their number per mm², and the number of non-activated and pre-activated platelets per µℓ still in suspension at time t. We refer the reader to Chopard et al. (2015); Chopard et al. (2017) for more explanation of the mathematical model.
The Impact-R experiments have been repeated with the whole blood obtained from seven donors. Observations are made at times 0 sec., 20 sec., 60 sec., 120 sec. and 300 sec. At these five time points, (S_agg-clust(t), N_agg-clust(t), N_platelet(t), N_act-platelet(t)) are measured.
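For concreteness, the observations at the five time points can be arranged as a small array, one row per time point and one column per measured quantity. The sketch below is only an illustration of this data layout; the numerical values are placeholders, not the experimental measurements:

```python
import numpy as np

# Rows = the five observation times; columns = the four measured quantities:
# S_agg_clust, N_agg_clust (per mm^2), N_platelet and N_act_platelet (per microliter).
times = np.array([0, 20, 60, 120, 300])  # seconds

# Placeholder numbers for illustration only -- NOT the experimental data.
x0 = np.array([
    [0.0,   0.0, 200_000.0, 5_000.0],
    [1.2,  50.0, 190_000.0, 4_500.0],
    [2.5, 120.0, 175_000.0, 3_800.0],
    [3.9, 160.0, 160_000.0, 3_200.0],
    [6.1, 180.0, 140_000.0, 2_500.0],
])

# Five time points, four quantities per time point.
assert x0.shape == (len(times), 4)
```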
Let us call the observed dataset collected through the experiment

    x0 = {(S_agg-clust(t), N_agg-clust(t), N_platelet(t), N_act-platelet(t)) : t = 0, 20, 60, 120, 300}.

By comparing the number and size of the deposition aggregates obtained from the in vitro experiments with the computational results obtained by forward simulation from the numerical model (see Fig. 1 for an illustration), we can manually calibrate the model parameters by a trial and error procedure. However, due to the complex nature of the model and dataset, this manual determination of the parameter values is subjective and time consuming. If the parameters of the model could instead be learned with a rigorous, automated, data-driven methodology, we could immensely improve the performance of these models and bring this experiment forward as a new clinical test for platelet function. To this aim, in the present project, we propose to use ABC for Bayesian inference of the parameters.
As a result of applying Bayesian inference in this context, not only can we automatically and efficiently estimate the model parameters, but we can also perform parameter uncertainty quantification in a statistically sound manner, and determine whether the provided solution is unique.

Bayesian Inference
We can quantify the uncertainty of the unknown parameter θ by a posterior distribution p(θ|x) given the observed dataset x = x0. The posterior distribution is obtained by Bayes' theorem as

    p(θ|x) = π(θ) p(x|θ) / m(x),

where π(θ), p(x|θ) and m(x) = ∫ π(θ) p(x|θ) dθ are, correspondingly, the prior distribution on the parameter θ, the likelihood function, and the marginal likelihood. The prior distribution π(θ) provides a way to leverage prior knowledge in the learning of the parameters, which is often available from medical knowledge regarding cardio-vascular diseases. If the likelihood function can be evaluated, at least up to a normalizing constant, then the posterior distribution can be approximated by drawing a sample of parameter values from it using (Markov chain) Monte Carlo sampling schemes (Robert and Casella, 2005). For the simulator-based model considered in Section 2, the likelihood function is difficult to compute, as it requires solving a very high dimensional integral. In Subsection 3.1, we describe ABC, which performs Bayesian inference for models whose likelihood function is not available in closed form or is not feasible to compute.

Approximate Bayesian computation
ABC allows us to draw samples from the posterior distribution of the parameters of a simulator-based model in the absence of a likelihood function, and hence to perform sound statistical inference (e.g., point estimation, hypothesis testing, model selection) in a data-driven manner. In the fundamental rejection ABC scheme, we simulate from the model M(θ) a synthetic dataset x_sim for a parameter value θ and measure the closeness between x_sim and x0 using a pre-defined discrepancy function d(x_sim, x0). Based on this discrepancy measure, ABC accepts the parameter value θ when d(x_sim, x0) is less than a pre-specified threshold value ε.
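The rejection scheme just described can be sketched in a few lines. The simulator, prior and discrepancy below are toy stand-ins (a Gaussian mean-estimation problem), not the platelet deposition model:

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_abc(simulate, prior_sample, discrepancy, x0, eps, n_samples):
    """Fundamental rejection-ABC loop: keep parameter draws whose simulated
    data fall within distance eps of the observed data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample()               # draw theta from the prior
        x_sim = simulate(theta)              # forward simulate a pseudo-dataset
        if discrepancy(x_sim, x0) < eps:     # accept if close enough to x0
            accepted.append(theta)
    return np.array(accepted)

# Toy illustration: infer the mean of a Gaussian with known variance.
x0 = rng.normal(2.0, 1.0, size=100)
simulate = lambda th: rng.normal(th, 1.0, size=100)
prior_sample = lambda: rng.uniform(-5.0, 5.0)
discrepancy = lambda a, b: abs(a.mean() - b.mean())

post = rejection_abc(simulate, prior_sample, discrepancy, x0, eps=0.2, n_samples=200)
```

The accepted values `post` are draws from an approximation of the posterior whose quality improves as eps shrinks, at the cost of a lower acceptance rate.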
Simulated Annealing ABC with HPC. The rejection ABC scheme described above is computationally inefficient. To explore the parameter space in an efficient manner, there exists a large group of ABC methods based on the sequential Monte Carlo (SMC) algorithm (Marin et al., 2012). As pointed out in (Dutta et al., 2017), these SMC-based ABC algorithms consist of four fundamental steps: 1. (Re-)sample a set of parameters θ, either from the prior distribution or from an already existing set of parameters; 2. For each of the parameters of the whole set or a subset, perturb it using the perturbation kernel, and accept the perturbed parameter based on a decision rule governed by a threshold, or repeat the whole second step; 3. For each parameter, calculate its weight; 4. Normalize the weights, calculate a covariance matrix, and adaptively re-compute the threshold for the decision rule.
These four steps are repeated until the weighted set of parameters, interpreted as the approximate posterior distribution, is 'sufficiently close' to the true posterior distribution. Steps (1) and (4) are usually quite fast compared to steps (2) and (3), which are the computationally expensive parts. Generally, for the decision rule in step (2), we simulate x_sim using the perturbed parameter and accept it if d(x_sim, x0) < ε, an adaptively chosen threshold. The generation of x_sim from the model, for a given parameter value, usually takes a huge amount of computational resources (e.g., 10 minutes for the platelet deposition model in this paper). However, we note that in both of these steps all tasks can be run independently from each other, since they do not require any communication. In our recent work (Dutta et al., 2017), based on this simple parallelizability of the ABC algorithms, we adapted these algorithms for HPC environments. Our implementation is available in the Python package ABCpy and shows linear scalability.
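The independence of the simulation tasks in steps (2) and (3) is what makes the mapping onto HPC straightforward. A minimal sketch of farming out forward simulations to a worker pool is given below; the simulator is a cheap stand-in, and Python threads are used here only as a stand-in for the MPI or Spark backends that ABCpy uses on clusters:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def expensive_forward_simulation(theta):
    """Stand-in for one costly run of the deposition model; each run is
    independent of the others, so the whole batch is embarrassingly parallel."""
    rng = np.random.default_rng(int(theta * 100))  # per-task generator
    return float(np.mean(rng.normal(theta, 1.0, size=1000)))

# One perturbed parameter per task, as in step (2) of the SMC loop.
thetas = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

# No communication is needed between tasks, so they can be dispatched
# to a pool of workers and collected when all have finished.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(expensive_forward_simulation, thetas))
```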
For an acceptance to occur in step (2), different parameters may take different amounts of time, making ABC algorithms with the decision rule described above imbalanced (Hadjidoukas et al., 2015; Dutta et al., 2017). For 'extremely' expensive simulator-based models like the present one, this imbalance can reduce the HPC performance drastically. To solve this, in this paper we use simulated annealing approximate Bayesian computation (SABC) (Albert et al., 2015), which relies on a probabilistic decision rule in step (2) and uses only one simulation for each perturbed parameter. This removes the imbalance reported for the other ABC algorithms. Additionally, SABC is known to converge faster to the posterior distribution with respect to the total number of simulated datasets needed. Using SABC within the HPC framework implemented in ABCpy (Dutta et al., 2017), we draw Z independent and identically distributed (i.i.d.) samples from the posterior distribution p(θ|x0), keeping all tuning parameters of SABC fixed at the default values suggested in the ABCpy package, except the number of steps and the acceptance rate cutoff, which were chosen as 100 and 0.0001, respectively. To perform SABC for the platelet deposition model, the summary statistics extracted from the dataset, the discrepancy measure between the summary statistics, the prior distribution of the parameters, and the perturbation kernel used to explore the parameter space are described next.
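A highly simplified, single-particle sketch of the SABC idea follows, assuming a Metropolis-like acceptance probability exp(−(d_new − d_old)/ε) with an annealed tolerance ε. The full algorithm of Albert et al. (2015) maintains a whole population of particles and adapts ε from the data; this is only illustrative, on a toy Gaussian problem:

```python
import numpy as np

rng = np.random.default_rng(2)

def sabc_step(theta, d_old, eps, simulate, discrepancy, x0, perturb):
    """One SABC-style update for a single particle: perturb, run exactly one
    simulation, and accept probabilistically -- every task costs exactly one
    simulation, which keeps the workload balanced across workers."""
    theta_new = perturb(theta)
    d_new = discrepancy(simulate(theta_new), x0)
    # Probabilistic decision rule with annealing temperature eps.
    if rng.random() < np.exp(-(d_new - d_old) / eps):
        return theta_new, d_new
    return theta, d_old

# Toy problem (Gaussian mean), not the platelet model.
x0 = rng.normal(1.0, 1.0, size=200)
simulate = lambda th: rng.normal(th, 1.0, size=200)
discrepancy = lambda a, b: abs(a.mean() - b.mean())
perturb = lambda th: th + rng.normal(0.0, 0.3)

theta, d = rng.uniform(-5.0, 5.0), np.inf  # first proposal is always accepted
for step in range(300):
    eps = max(0.97 ** step, 1e-3)          # annealed tolerance schedule
    theta, d = sabc_step(theta, d, eps, simulate, discrepancy, x0, perturb)
```

As ε decreases, moves that worsen the discrepancy become ever less likely to be accepted, so the particle drifts toward parameter values whose simulations resemble the observation.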
Summary statistics: The summary statistics are chosen to capture the mean values, the variances, and the intra- and inter-dependence of the different variables of the time-series over time.
Discrepancy measure: Assuming the above summary statistics contain the most essential information about the likelihood function of the simulator-based model, we compute the Bhattacharyya coefficient (Bhattacharyya, 1943) for each of the variables present in the time-series, using their means and variances, and Euclidean distances between the different inter- and intra-correlations computed over time. Finally, we take the mean of these discrepancies, so that in the final discrepancy measure the discrepancy between each of the summaries is equally weighted. Writing ρ(·) for the Bhattacharyya distance, the corresponding coefficient satisfies 0 ≤ exp(−ρ(·)) ≤ 1, and hence the value of the final discrepancy measure between two datasets x1 and x2 is always bounded in the closed interval [0, 1].
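Since the exact construction of the discrepancy is not reproduced here, the sketch below only illustrates the Bhattacharyya coefficient between two univariate Gaussians summarized by their means and variances. Because exp(−ρ) lies in (0, 1], the quantity 1 − exp(−ρ) gives a per-summary discrepancy bounded in [0, 1]:

```python
import numpy as np

def bhattacharyya_coefficient(mu1, var1, mu2, var2):
    """Bhattacharyya coefficient exp(-rho) between two univariate Gaussians
    N(mu1, var1) and N(mu2, var2); it lies in (0, 1] and equals 1 exactly
    when the two distributions coincide."""
    rho = (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
           + 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2))))
    return float(np.exp(-rho))

def summary_discrepancy(mu1, var1, mu2, var2):
    """A discrepancy in [0, 1]: zero for identical Gaussian summaries,
    approaching one as the distributions separate."""
    return 1.0 - bhattacharyya_coefficient(mu1, var1, mu2, var2)
```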
Perturbation kernel: To explore the parameter space of θ = (p_Ag, p_Ad, p_T, p_F, a_T) ∈ R⁵, we use a five-dimensional multivariate Gaussian distribution as the perturbation kernel.
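Such a kernel simply draws the perturbed parameter from a Gaussian centered at the current value. The parameter values and covariance below are invented for illustration; in SMC-type schemes the covariance is typically adapted from the current particle population:

```python
import numpy as np

rng = np.random.default_rng(4)

def perturb(theta, cov):
    """Five-dimensional Gaussian perturbation kernel centered at theta."""
    return rng.multivariate_normal(mean=theta, cov=cov)

# Hypothetical current parameter value (p_Ag, p_Ad, p_T, p_F, a_T)
# and kernel covariance -- placeholders, not the inferred values.
theta = np.array([0.8, 0.2, 0.05, 1.5, 0.9])
cov = np.diag([1e-2] * 5)
theta_new = perturb(theta, cov)
```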

Parameter estimation
Given the experimentally collected platelet deposition dataset x0, our main interest is to estimate a value for θ. In decision theory, the Bayes estimator minimizes the posterior expected loss E_p(θ|x0)[L(θ, θ̂)] for a chosen loss function L. If we have Z samples (θ_i), i = 1, ..., Z, which are independent and identically distributed following the posterior distribution p(θ|x0), our Bayes estimator can be approximated as

    θ̂ ≈ (1/Z) Σ_{i=1}^{Z} θ_i.

As we consider the quadratic loss function L(θ, θ̂) = (θ − θ̂)² in this paper, the Bayes estimator can be shown to be the posterior mean, θ̂ = E_p(θ|x0)[θ].
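Computing this estimator from posterior draws is a one-liner. The draws below are mocked from a Gaussian purely for illustration (in the actual workflow they come from SABC, and the location values here are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# Mock Z i.i.d. posterior draws (theta_i) for the five parameters;
# the loc values are placeholders, not the inferred parameter values.
Z = 5000
posterior_samples = rng.normal(loc=[0.8, 0.2, 0.05, 1.5, 0.9],
                               scale=0.05, size=(Z, 5))

# Under quadratic loss, the Bayes estimator is the posterior mean,
# approximated by the sample mean of the posterior draws.
bayes_estimate = posterior_samples.mean(axis=0)
```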

Inference on experimental dataset
The performance of the inference scheme described in Section 3 is illustrated here for a collective dataset created from the experimental study of platelet deposition for 7 blood donors. The collective dataset was created by a simple average of (S_agg-clust(t), N_agg-clust(t), N_platelet(t), N_act-platelet(t)) over the 7 donors at each time-point t. In Figure 2, we show the Bayes estimate (black, solid) and the marginal posterior distribution (black, dashed) of each of the five parameters, computed using 5000 samples drawn from the posterior distribution p(θ|x0) using SABC. For comparison, we also plot the manually estimated values of the parameters (gray, solid) from Chopard et al. (2017). We notice that the Bayes estimates are in close proximity to the manually estimated values of the parameters, and that the manually estimated values have significantly high posterior probability. This shows that, by means of ABC, we can obtain estimates and quantify the uncertainty of the parameters of the platelet deposition model that are as good as the manually estimated ones, if not better.
Additionally, to highlight the strength of having a posterior distribution of the parameters, we compute and show the posterior correlation matrix of the 5 parameters in Figure 3, highlighting a strong negative correlation between (p_F, a_T) and strong positive correlations between (p_F, p_Ag) and (p_F, p_T). A detailed investigation of this correlation structure would be needed to understand it better, but generally it may point towards a) the stochastic nature of the considered model for platelet deposition and b) the fact that the deposition process is an antagonistic or synergetic combination of the mechanisms proposed in the model. Note finally that, the posterior distribution being the joint probability distribution of the 5 parameters, we can also compute any higher-order moments, skewness, etc. of the parameters for a detailed statistical investigation of the natural phenomenon.

Conclusions
In this paper we have demonstrated that approximate Bayesian computation (ABC) can be used to automatically explore the parameter space of a numerical model simulating the deposition of platelets subject to a shear flow. We also notice good agreement between the manually tuned parameters and the Bayes estimates. The proposed approach can be applied patient by patient, in a systematic way, without the bias of a human operator. In addition, the approach is computationally fast enough to provide results in an acceptable time for contributing to a new medical diagnosis, by giving data that no other known method can provide. The clinical relevance of these data is still to be explored, and our next step will be to apply our approach at a personalized level, with a cohort of patients with known pathologies. The possibility of designing new platelet functionality tests as proposed here is the result of combining different techniques: advanced microscopic observation techniques, bottom-up numerical modeling and simulations, recent data-science developments, and high performance computing (HPC). Additionally, the ABC inference scheme provides us with a posterior distribution of the parameters given the observed dataset, which is much more informative about the underlying process. The posterior correlation structure shown in Fig. 3 may not have a direct biophysical interpretation, though it illustrates some sort of underlying and unexplored stochastic mechanism for further investigation. Finally, we note that, although the manual estimates achieve a very high posterior probability, they are different from the Bayes estimates learned using ABC. The departure reflects a different estimation of the quality of the match between experimental observations and simulation results. As ABC algorithms depend on the choice of summary statistics and discrepancy measures, the parameter uncertainty quantified by SABC in Section 4, and the Bayes estimates computed, depend on the assumptions regarding this choice made in Section 3.1. Fortunately, there are recent works on the automatic choice of summary statistics and discrepancy measures in the ABC setup (Dutta et al., 2016; Gutmann et al., 2017), and incorporating some of these approaches in our inference scheme is a promising direction for future research in this area.

Figure 1 :
Figure 1: The deposition surface of the Impact-R device after 300 seconds (left) and the corresponding results of the deposition in the mathematical model (right). Black dots represent the deposited platelets that are grouped in clusters.

Figure 3 :
Figure 3: Posterior correlation matrix of (p_Ad, p_Ag, p_T, p_F, a_T) computed from the 5000 i.i.d. samples drawn from the posterior distribution using SABC.
Figure 2:

Figure 2: Marginal posterior distribution of (p_Ad, p_Ag, p_T, p_F, a_T) for the collective dataset generated from the 7 patients. The smoothed marginal distribution is created by a Gaussian-kernel density estimator on 5000 i.i.d. samples drawn from the posterior distribution using SABC. The (gray, solid) line indicates the manually estimated values of the parameters in (Chopard et al., 2017).
Ritabrata Dutta and Antonietta Mira were supported by Swiss National Science Foundation Grant No. 105218 163196 (Statistical Inference on Large-Scale Mechanistic Network Models). We thank CADMOS for providing computing resources at the Swiss Super Computing Center and acknowledge partial funding from the European Union Horizon 2020 research and innovation programme for the CompBioMed project (http://www.compbiomed.eu/) under grant agreement 675451.