In-silico evaluation of adenoviral COVID-19 vaccination protocols: Assessment of immunological memory up to 6 months after the third dose

Background The immune response to adenoviral COVID-19 vaccines is affected by the interval between doses. The optimal interval is unknown. Aim We aim to explore in-silico the effect of the interval between vaccine administrations on immunogenicity and to analyze the contribution of pre-existing levels of antibodies, plasma cells, and memory B and T lymphocytes. Methods We used a stochastic agent-based immune simulation platform to simulate two-dose and three-dose vaccination protocols with an adenoviral vaccine. We identified the model’s parameters fitting anti-Spike antibody levels from individuals immunized with the COVID-19 vaccine AstraZeneca (ChAdOx1-S, Vaxzevria). We used several statistical methods, such as principal component analysis and binary classification, to analyze the correlation between pre-existing levels of antibodies, plasma cells, and memory B and T cells to the magnitude of the antibody response following a booster dose. Results and conclusions We find that the magnitude of the antibody response to a booster depends on the number of pre-existing memory B cells, which, in turn, is highly correlated to the number of T helper cells and plasma cells, and the antibody titers. Pre-existing memory T cytotoxic cells and antibodies directly influence antigen availability hence limiting the magnitude of the immune response. The optimal immunogenicity of the third dose is achieved over a large time window, spanning from 6 to 16 months after the second dose. Interestingly, after any vaccine dose, individuals can be classified into two groups, sustainers and decayers, that differ in the kinetics of decline of their antibody titers due to differences in long-lived plasma cells. This suggests that the decayers may benefit from a tailored boosting schedule with a shorter interval to avoid the temporary loss of serological immunity.

The stochastic execution of the algorithms coding for the dynamical rules results in a sequence of cause/effect events culminating in the production of effector immune cells and in the establishment of immunological memory. The starting point of this series of events is the injection of the adenovirus coding for the Spike protein of SARS-CoV-2 (see Figure S1). This may take place any time after the simulation starts. In the present study, the first immunization injection is performed at time zero. At that initial time, the system is "naïve" with respect to the injected antigen, meaning that there are neither specific T and B memory cells nor plasma cells and antibodies able to recognize the antigen peptides. Moreover, the system is designed to maintain the global population of cells in a quasi steady-state (homeostasis), if no stimulation takes place, whereas the system moves away from such dynamical (stochastic) equilibrium when it is perturbed by an antigenic challenge. Besides the parameters defining the characteristics of the adenovirus related to entry in the muscle cells and production of the spike protein of the Severe Acute Respiratory Syndrome Coronavirus 2 isolate Wuhan-Hu-1 (NCBI Reference Sequence: NC 045512), whose primary structure is reported in Figure S1 and further described in the following section "Computing peptides immunogenicity", the coded vaccine construct in this model is defined as a set of B-cell epitopes and T-cell epitopes consisting of amino acid sequences and defining its antigenicity.
If the vaccine elicits a strong immune response it depends on the injected dose and on the antigenicity of the B and T epitopes. These variables determine the level deployment of both cellular and humoral branches of the immune system, as shown in past simulation studies [17]. The model resorts to pre-computed ranked lists of T-cell epitopes calculated with the neural network NetMHCpan method [18,22,21]. This feature, which is described below, follows the choice of a definite HLA set (discussed below in the section "Selecting the HLA haplotype"). As the neutralizing antibody response to the Spike protein is mainly focused on the RBD, and one of the datasets used to tune the model contains anti-RBD antibody measures, we included in the simulations only B-cell epitopes from the RBD domain. B-cell epitopes were computed with BepiPred [15]. The computed epitopes largely overlap with RBD regions experimentally identified as targets of anti-Spike antibody responses [12]. Figure S1. Primary structure of the spike protein (NCBI Reference Sequence: YP 009724390.1) of the Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1 (NCBI Reference Sequence: NC 045512). The RBD is shown in yellow and the B-epitopes computed by BepiPred in red.

Estimated Parameters
The model has been extensively used in the past so that many parameters have already been fixed (either by manual curation with literature information or by numerical estimation in generic settings -i.e., not pathogen-specific simulations). For the present study we calibrated only the parameters in table S1.

Selecting the HLA haplotype
The computational model accounts for differences in the HLA haplotype when determining which peptides are presented by antigen-presenting cells. To this end, it takes in input a list of such peptides for each HLA molecule considered together with the propensity of each peptide to bind to it. This list is computed by using third-party immunoinformatics tools as described in the next section "Computing the peptide immunogenicity". The "HLA haplotype freq search" in the "Allele Frequency Net Database" [2] was used in order to select two HLA-A, two HLA-B and two DRB alleles which are most prevalent in the caucasian phenotype [4]. The result pointed to the following alleles: HLA-A*02:01, HLA-A*24:02, HLA-B*35:01, HLA-B*40:02, DRB1*07:01 and DRB1*15:01.

Computing peptides immunogenicity
The strain of SARS-CoV-2 used in this study corresponds to the reference sequence NCBI Reference Sequence: NC 045512.2. The primary structure of the spike protein is the only molecule used to derive B epitopes and class I and II peptides. The spike protein (NCBI Reference Sequence: YP 009724390.1) is reported in Figure S1, with the RBD region evidenced in yellow and the B-epitopes computed with BepiPred (using the parameter EpitopeThreshold equal to 0.5, corresponding to specificity of 0.57 and sensitivity 0.58 [10]) shown in bold red. To identify cytotoxic T-cell peptides (CTL peptides) and helper T-cell peptides (HTL peptides) of the spike protein, we have employed two immunoinformatics tools: -for the prediction of 9-mer long CTL peptides, the "ANN 4.0 prediction method" in the online tool MHC-I binding prediction of the IEDB Analysis Resource [20] was used to find peptides with affinity for the chosen set of HLA class I alleles (i.e., HLA-A*02:01, HLA-A*24:02, HLA-B*35:01 and HLA-B*40:02) [23,19,3]. The peptides were classified as strong, moderate and weak binders based on the peptide percentile rank and IC50 value. Peptides with IC50 values ¡50 nM were considered to have high affinity, ¡500 nM intermediate affinity and ¡5000 nM low affinity towards a particular HLA allele. Also, lower the percentile rank, greater is meant the affinity [23,19,3]; -for what concerns the HTL peptides, the NetMHCIIpan 3.2 server was used for the prediction of 9-mer long HTL peptides which had an affinity for the HLA class II alleles (i.e., DRB1*07:01 and DRB1*15:01) used in this study [14]. The predicted peptides were classified as strong, intermediate and non-binders based on the concept of percentile rank as given by NetMHCII with a threshold value set at 2, 10 and > 10% , respectively. In other words, peptides with percentile rank ≤ 2 were considered as strong binders whereas a percentile rank between 2 and 10% designate moderate binders; peptides with percentile score > 10 are considered to be non-binders [14].
Since the list of CTL and HTL peptides and the relative affinity score is the same as in [7] we send the interested reader to the supplementary material of that work for further details.

Modeling adenoviral capsid immunogenicity
The anti-adenoviral antibody response was not simulated at the epitope level, but, in a simplification of a largely unknown scenario, the vaccine-induced antibody response to the adenoviral capsid was assumed to be proportional to the vaccine-induced antibody response to the Spike. To this end, the computational model included the rule that adenoviral vaccine-induced antibodies can bind to the adenoviral vaccine. When vaccine-induced antibodies bind to the vaccine and form immune complexes, there is a reduced transduction of muscle cells by the vaccine, and less Spike antigen is produced. To evaluate the effect of anti-adenoviral antibodies, in-silico immunization experiments were performed enabling or disabling the rule that adenoviral vaccine-induced antibodies can bind to the adenoviral vaccine. We found that the effect of the interval between doses is analogous in both experiments, as in both cases higher antibody responses are obtained at longer inter-dose intervals, as shown in Figure S2. With this in mind, in the present study, we choose to enable the rule that the vaccine induces anti-adenoviral antibodies, as this was demonstrated in a clinical trial [5] and is the more realistic scenario.

Approximate Bayesian Computation to estimate parameters
To calibrate the simulator with data from [11], we used the statistical method called Approximate Bayesian Computation (ABC, [9] [25] [24]) to estimate the following parameters of the computational model: The ABC algorithm can then be formulated as follows: Step 1: execute N times the simulation by sampling with the Latin Hypercube the space of meaningful parameters θ = (θ 1 , θ 2 , θ 3 ); Step 2: for each i = 1, . . . , N and for each v ∈ {v 1A , v 1B , v 1C } compute J realizations of the stochastic model y j (t) = M j (θ i , x 0 , v, t) with j = 1, . . . , J; scale y j (t) to be in [0, 1]; Step 3: calculate d v = t (⟨y(t)⟩ − y t ) 2 1/2 where ⟨y(t)⟩ is the average value, over the J simulations, of the trajectory point y j (t) = M j (θ i , x 0 , v, t) hence d v is the residual sum of squares (RSS) computed summing all deviations from the available data points y t ; Step 4: calculate d 1A + d 1B + d 1C for each of the parameter considered θ and rank them from low to high, then pick the top 10% that minimise this sum. The distribution of these θs represents the posterior distribution of the parameters of interest.

Stepwise Regression
Stepwise regression is a statistical method to identify the most important variables among many of them when there is no theory that can help as a guide. The method is used to determine whether variables Ab(t1), Plb(t1), Th(t1), Tc(t1), and B(t1) can be used as explanatory variables for the increment of Ab induced by the second dose, that is ∆ Ab = Ab (t m ) − Ab (t 1 ).
There are three possible ways to implement it: the first consists of starting with a linear regression model containing only the constant term and, at each step, a regressor (or variable) is added to the model if it improves the fit evaluated using some criteria of fitness, such as the residual sum of squares; the second algorithm consists of stating with a linear model which includes all the variables and at each step a variable is removed if its loss does not have a significant impact on the model fit; the third algorithm is a combination of the previous two. We have resorted to the third option for our analysis because it consists in the most complete one.
More details on these model selection algorithms can be found in [13].

Analyzing data by Principal Component Regression (PCR)
For the data analysis we have employed the Principal Component Regression (PCR), which is a regression model based on Principal Component Analysis (PCA) useful when dealing with multivariate data that exhibit multicollinearity. In particular, it consists of a linear regression where the dependent variable is explained/described as a linear combination of principal components. Principal components are orthogonal vectors that span the vector space generated by the original variables. More formally, let X = (Ab(t 1 ), P lb(t 1 ), T h(t 1 ), T c(t 1 ), B(t 1 )) be our regressors, Y = Ab(t m ) be our dependent variable and W = AX be the principal components, namely the i-th column of matrix A contains the loadings of the i-th component. Then, the principal components regression provides the following model where A i refers to the i-th column of the matrix A (loadings of the i-th principal component) and α i refers to the i-th regression coefficient. More details on PCR can be found in [13]. Figure S3. As the interval between the doses becomes longer, the humoral response (i.e., Ab, Pbl, B) and the T helper response (Th) to the second dose improve. The box plots show the median, IQR, and range of Ab, Plb, Th, Tc and B in treatment groups 1A, 1B and 1C at time t f . Figure S4. The optimal immunogenicity of the third dose is achieved over a large time window, spanning from 6 to 16 months after the second dose. The box plots show the median, IQR, and range of ab, plb, th,tc and b in treatment groups 2A-I at time t m .