Artificial Intelligence Applied to in vitro Gene Expression Testing (IVIGET) to Predict Trivalent Inactivated Influenza Vaccine Immunogenicity in HIV Infected Children

The number of patients affected by chronic diseases with special vaccination needs is burgeoning. In this scenario, predictive markers of immunogenicity, as well as signatures of immune responses are typically missing even though it would especially improve the identification of personalized immunization practices in these populations. We aimed to develop a predictive score of immunogenicity to Influenza Trivalent Inactivated Vaccination (TIV) by applying deep machine learning algorithms using transcriptional data from sort-purified lymphocyte subsets after in vitro stimulation. Peripheral blood mononuclear cells (PBMCs) collected before TIV from 23 vertically HIV infected children under ART and virally controlled were stimulated in vitro with p09/H1N1 peptides (stim) or left unstimulated (med). A multiplexed-qPCR for 96 genes was made on fixed numbers of 3 B cell subsets, 3 T cell subsets and total PBMCs. The ability to respond to TIV was assessed through hemagglutination Inhibition Assay (HIV) and ELIspot and patients were classified as Responders (R) and Non Responders (NR). A predictive modeling framework was applied to the data set in order to define genes and conditions with the higher predicted probability able to inform the final score. Twelve NR and 11 R were analyzed for gene expression differences in all subsets and 3 conditions [med, stim or Δ (stim-med)]. Differentially expressed genes between R and NR were selected and tested with the Adaptive Boosting Model to build a prediction score. The score obtained from subsets revealed the best prediction score from 46 genes from 5 different subsets and conditions. Calculating a combined score based on these 5 categories, we achieved a model accuracy of 95.6% and only one misclassified patient. These data show how a predictive bioinformatic model applied to transcriptional analysis deriving from in-vitro stimulated lymphocytes subsets may predict poor or protective vaccination immune response in vulnerable populations, such as HIV-infected individuals. Future studies on larger cohorts are needed to validate such strategy in the context of vaccination trials.

The number of patients affected by chronic diseases with special vaccination needs is burgeoning. In this scenario, predictive markers of immunogenicity, as well as signatures of immune responses are typically missing even though it would especially improve the identification of personalized immunization practices in these populations. We aimed to develop a predictive score of immunogenicity to Influenza Trivalent Inactivated Vaccination (TIV) by applying deep machine learning algorithms using transcriptional data from sort-purified lymphocyte subsets after in vitro stimulation. Peripheral blood mononuclear cells (PBMCs) collected before TIV from 23 vertically HIV infected children under ART and virally controlled were stimulated in vitro with p09/H1N1 peptides (stim) or left unstimulated (med). A multiplexed-qPCR for 96 genes was made on fixed numbers of 3 B cell subsets, 3 T cell subsets and total PBMCs. The ability to respond to TIV was assessed through hemagglutination Inhibition Assay (HIV) and ELIspot and patients were classified as Responders (R) and Non Responders (NR). A predictive modeling framework was applied to the data set in order to define genes and conditions with the higher predicted probability able to inform the final score. Twelve NR and 11 R were analyzed for gene expression differences in all subsets and 3 conditions [med, stim or (stim-med)]. Differentially expressed genes between R and NR were selected and tested with the Adaptive Boosting Model to build a prediction score. The score obtained from subsets revealed the best prediction score from 46 genes from 5 different subsets and conditions. Calculating a combined score based on these 5 categories, we achieved a model accuracy of 95.6% and only one misclassified patient. These data show how a predictive bioinformatic model applied to transcriptional analysis deriving from

INTRODUCTION
The advent of vaccinations has reshaped the history of medicine and across the twenty-first century has led to a decrease in morbidity of previously fatal diseases (1,2). However, with steadily improving survival rates due to the availability of novel therapeutic tools, the vulnerable populations with special vaccination needs is burgeoning (3)(4)(5). Nowadays vaccine development programs mainly focus on otherwise healthy populations; as such, vaccine indications are based on data arising from healthy study participants. Accordingly, most vaccine indications in vulnerable groups (VPs), elderly, pregnant women and patients affected by chronic conditions (i.e., HIV infected patients), are derived from extrapolations, assumptions, or post-licensure studies (5). Thus, limited data are currently available to tailor vaccine interventions in these populations. Since the seasonal flu vaccine is well-established in routine use in HIV, it may represent the paradigm vaccine to illustrate many of the issues that affect most or all vaccines in VPs. Despite recommendations on seasonal influenza vaccination for targeted or at-risk groups (i.e., HIV, elderly, comorbidities etc.), such populations are at increased risk of acquiring vaccine-preventable infections and suffer higher infectious morbidity and mortality than healthy individuals (6,7). This represents a major health and economic burden to society, which will become increasingly difficult to manage given limited public resources (8). In parallel, many uncertainties remain about the optimal strategies for identifying susceptible individuals, and for offering them sustained protection through a personalized immunization schedule. Novel biomarkers of protective immune responses to vaccines are needed. Vaccinology, based on the immune response network theory (9), which utilizes immunogenetics, immunogenomics and systems biology approaches to understand the basis for inter-individual variations in vaccine induced immune responses can provide such biomarkers (10,11). In particular, vaccinomics utilize high-throughput, high-dimensional systems biology approaches, which aims to predict differences in protective or suboptimal immune responses to vaccines (12). In this regard, the basis of personalized and predictive vaccinology is the assessment of an individual's genetic background that may impact vaccine immunogenicity and efficacy. Thus, far this approach has been mostly conducted in healthy subjects leading to important findings (13). However, these data can only be partially translated in to specific populations. We recently described distinct transcriptional signatures of purified B and T cell subsets in vertically HIV infected children that was able to distinguish between patients able to respond to Trivalent Inactivated Vaccination (TIV) compared vs. non-responders (14,15). In addition, purified H1N1 specific B cells showed significant differences in P-TEN/PI3KC2B pathway between responders and non-responders (16,17).
Following the idea that a single vaccine cannot "fit all" (9), we here aimed at developing a predictive score of poor or protective vaccination immune response to seasonal flu vaccination through an artificial intelligence approach fed by data deriving from a novel in vitro gene expression testing approach (IVIGET) in HIV infected patients differentially responding to TIV. We here showed that a multiplexed gene expression analysis from sorted lymphocites subsets in different in vitro conditions was able to feed an artificial inteligence model able to select predictive features of influenza vaccination immunogenicity in a pediatric population with suboptimal immune response upon the influenza vaccination.

Study Subjects
Twenty-three subjects vertically infected with HIV-1 (abbreviated as HIV) and on suppressive anti-retroviral therapy (ART) were enrolled between September and November 2012 at Bambino Gesù Children's Hospital, Rome, Italy. Written informed consent was obtained from all subjects or parents/legal guardians upon enrolment and the study was approved by the Institutional review board of the Bambino Gesù Children's Hospital. PBMCs and plasma were isolated by density gradient isolation [46] collected pre (T0) and 21 days post vaccination (T1) and cryopreserved and processed for study at a later date. Serum samples were stored at −80 • C.

Hemagglutination Inhibition (HI) Assay
The antibody titers to the H1N1, H3N2 and B influenza strains in sera from HIV and HC were evaluated separately by HI assay (18). The virus strains used in the HI assay were A/California/7/2009 (H1N1) pdm09like strain, A/Victoria/361/2011 (H3N2)-like strain and B/Wisconsin/1/2010-like strain according to the 2012-2013 influenza vaccine formulation. The HI assay was performed as previously described (18). The HI antibody titers were expressed as the reciprocal of the highest serum dilution at which hemagglutination was prevented. (http://www.gmp-compliance. org/guidemgr/files/021496EN.PDF).

ELISPot
PBMCs collected at T0 and T1 from HIV and HC were thawed and policlonally activated in vitro in complete RPMI medium (Invitrogen) supplemented with 2.5 µg/mL CpG type B (Hycult biotech), 20 ng/mL IL-4 (Peprotech) and 20 ng/mL IL-21 (ProSpec). Cells were harvested after 5 days of culture at 37 • C. ELISpot 96-well filtration plates (Millipore) were coated with the addition of purified H1N1, H3N2, and B influenza inactivated virus particles and subsequently loaded with 2 × 10 5 cells/well. Membranes were punched out with an Eli.Punch device and developed spots were scanned with an Eli.Scan and counted with the ELISpot Analysis Software V5.1 (all from A.EL.VIS).

Determining Vaccine Response Status
T0 and T1 samples were employed to evaluate patient's ability to respond to the vaccination as previously described (17). Response to vaccinations was determined both by ELISPot for the 3 strains of Flu vaccines (H1N1, H3N2, B) and by Haemagglutination-Inhibition assay (HIA) detected at the time of immunization and 21 days after vaccination as previously described (14,15). In order to compare patients with differential ability to respond to the vaccination, patients were first selected according to their seroconversion to H1N1 21 days after the immunization resulting in 2 groups, Seroconverter (HIA fold increase ≥ = 4) and Non Seroconverter (HIA fold increase < 4). As additional criteria, patients were selected according to the ELISpot responses for H1N1 at 21 days after immunization as ELISpot negative (<80 H1N1 specific spots/106 PBMCs) and ELISpot positive (>80 H1N1 specific spots/106 PBMCs). According to these 2 criteria we could select among the HIV infected children 12 non responders (NR; HIA fold increase <4 AND H1N1 specific spots <80/106 PBMCs), 11 responders (R; HIA fold increase >= 4 AND H1N1 specific spots >80/106 PBMCs).
In vitro Stimulation, Cell Sorting, and RNA Extraction Five-Hundred live cells per B and T cell subset were sorted into tubes previously loaded with 9 µL of PCR buffer (see also Figure 1 for gating strategy). After sorting, cells were immediately centrifuged (3000RPM for 3 min) and kept on ice. Samples were subsequently transferred in PCR tubes and 18 PCR cycles were performed on a C1000 Thermal Cycler (Bio Rad) with the following scheme (50 • C for 20 ′ , 95 • C for 2 ′ , 95 • C 15 ′′ , 60 • C for 4 ′ . Last step repeated 18 times). Cells were finally kept at −20 until further analysis. PCR buffer premix for cell sorting contained the following: Cells Direct Reaction mix 5 µL, DEPC water 1,4, Superscript III + Taq 1 µl, 0.2x diluted assay (96 primer mix) 2.5 µL, Superasein 0,1 µL.

Multiplexed RT-PCR
Previously amplified samples were loaded on a Fluidigm 96.96 standard chip following manufacturer's instructions. Briefly, assay pre-mix was prepared 1:1 20X TaqMan Gene Expression Assay (Applied Biosystems) and Assay Loading Reagent (Fluidigm, Biomark R ). The sample pre-mix was prepared with TaqMan Universal PCR Master Mix (2X)(Applied Biosystems), 20XGE Sample Loading Reagent (Fluidigm), and cDNA. Full list of the two panels of gene probes (B subsets and T subsets is shown in Supplemental Table 1 and 2). 5 µl of Assay and Sample mix were loaded into the chip according to manufacturers instructions. Genes' selection has been made according to previous analysis on RNA Sequencing on HIV infected children from a different cohort (data not shown), the literature and online gene banks and biological queries.
Cycle threshold value (Ct) deriving from exported files was corrected according to number of cells sorted if lower than 500. Calculations were made following the expression 67, 5/500 = Y/X where X is the number of cells sorted and Y is the cells equivalent cDNA of cell sorted. The dilution factor (n) was calculated as n = 67, 5/Y, and base 2 log of n was subsequently subtracted to Ct value in order to get Corrected Cycle Threshold (c-Ct). Expression threshold (Et), which was used for the main analysis was finally obtained with 40-cCT. Once exported and corrected, data were analyzed through Fluidigm SingluaR (SingulaR analysis toolset 3.0) package, loaded on R (software R 3.0.2 GUI 1.62). As previously described (De Armas, 2017) gene expression differences between different groups within same subset and condition were used to identify Differentially Expressed Genes (DEGs). Alternatively, paired gene expression differences between stimulated (stim) and unstimulated samples (med) (stim-med) within the same subset were used to define Differentially Induced Genes (DIGs). All raw data on gene expression analysis used for the present project are available on the Gene Expression Omnibus: NCBI gene expression and hybridization array data repository (GEO) (GSE155730).

Predictive Modeling Framework
We propose a workflow (Figure 2) used for gene selection and model building that use the 96 genes with age and sex as covariates. This method was applied to each subset of B cells (AM, FIGURE 1 | Experimental design. The cartoon on the top panel depicts the experimental procedure. Briefly, total PBMCs are in 2 aliquotes, one stimulated and the latter unstimulated. After sorting lymphocite subsets, gene expression is analyzed by Fluidigm Biomark. Bottom panel describes the gating strategies and the lymphocites subsets selected for sorting and gene expression analysis. Mathematical analysis applied on the subsets in order to obtain differentially expressed genes (DEGs) and differentially induced genes (DIGs) are described.
DN, REM) and T cells (CD4, NT, PBMC, TFH), further divided into two conditions, namely stimulated (stim), unstimulated (med) and the derived data of the stim-med, for 23 patients. In addition, this workflow has also been applied to the entire dataset (B and T cells) to obtain a predicted probability score. Due to the small sample size and high dimensional data, the Wilcoxon Rank Sum Test was used to select genes whose expression levels are different between responders and non-responders. As compared to other feature selection methods, this test outperforms others in terms of accuracy and robustness (19) Using the two-sided test to evaluate whether these two subpopulations had different gene levels, p-values were derived to assess significance at α = 0.05. Genes with significantly different expression levels were used in the next step of the analysis framework. The feature selection process was applied to each dataset to select those genes that are predictive of response to vaccine. Applying multiple machine learning methods, each using a different approach, increased the confidence in selecting the best genes for the model.
The machine learning methods used were Elastic Net (glmnet function in R) (20), Support Vector Machines (svm.fs function in R) (21) and Random Forests (randomForest function in R) using 3-fold cross validation repeated 8 times (22). Variable importance was also calculated by Random Forests and used to define the gene importance ranking as previously described (22). If a gene or feature was selected at least 10 times in total throughout the process, then it was considered for further analysis in the prediction model.
After selecting a subset of features by using the Wilcoxon Rank Sum Test and machine learning, an Adaptive Boosting model using continuous predictors and generating predicted probabilities of response was used as the final predictive model. The Adaptive Boosting model (http://rob.schapire.net/papers/ explaining-adaboost.pdf) was used as it is less susceptible to overfitting and attempts to combine rules to create a more accurate prediction. This model was implemented in R using the caret (http://topepo.github.io/caret/index.html) package. The ADA Boost method uses a training set, a subset of the data that is set aside, and assigns a ±1 as classifier values. A classifier value indicates how important a feature is for the model. The classifiers are then weighted based on the training set, and the prediction is recalculated. Using the data, the program will assign weights to features beyond the ±1 classifier at every stage. To obtain the final results of the model, ADA Boost uses the sum of every weight and classifier combination to provide a probability of response. The result of the model is a predicted probability of response for each patient (Figures 3, 4). Considering both the Wilcoxon and FIGURE 2 | Analysis framework flowchart: pipeline workflow for predicting the vaccine response. Differentially induced genes between responders and non-responders were selected using a machine learning feature selection based on three different algorithms and the Wilcoxon test for each cell subset and condition. The list of selected genes was used by the Adaptive Boosting algorithm to build the predictive model and calculate the prediction score. The R package "enrichR" v2.1 was used to perform functional annotation and pathway enrichment analysis on the genes selected to build the five models with the best prediction precision.

Patients' Characteristics and H1N1 Response to Influenza Vaccination
To define the ability to elicit memory response upon H1N1 of the trivalent inactivated Influenza vaccination 2012/2013 (TIV) we investigated hemoagglutination inhibition assay in 65 HIV infected children under stable and highly active antiretroviral Frontiers in Immunology | www.frontiersin.org  Table 1. Study participants were classified as vaccine responders (R) and vaccine nonresponders (NR) according to the criteria established by Food and Drug Administration Guidance for Industry as previously described (14). R were characterized by HAI titer to H1N1 antigen (Ag) at T1 of >1:40 and > four-fold increase compared to baseline. In order to validate our criteria of selection, even considering the lower reliability of serological correlates in such patients, we applied an additional measure of vaccine responsiveness in our study population performing the B cell ELISpot response to H1N1 Ag (≥ or < 80 spots /106 PBMCs in responders (R) and non-responders (NR), respectively).

Features Selection and Identification of Predictive Score Through Artificial Intelligence
In order to predict the vaccination response based on the expression levels of 96 genes, we have implemented a bioinformatic pipeline that was tested on six sorted subsets of B and T cells, as reported in Figure 1 and PBMC in 23 patients. For each subset, the conditions of stimulation (stim), nonstimulation (med) and the difference between the two (stim-med) have been considered. For each of these subset/condition, the algorithm, reported in Figure 2, selects the Differentially Induced Genes (DIGs) and Differentially Expressed Genes (DEGs) whose expression levels can, better than others, discriminate responding individuals from non-responders upon TIV. A total of 179 genes/conditions were initially selected among the different subsets and conditions ( Table 2). As shown in Table 2, a specific number of genes were respectively, selected for the med (55 genes), stim (62 genes), and med-stim conditions (62 genes). Subsequently, these genes were then used to build statistical models. The ADA Boost models generated for each category returned a probability score that estimates the classification of each patient in responder (R) and non-responder (NR). Assuming a predicted probability >0.50 classified as a responder, the ADA Boost model was able to predict R and NR in specific subsets and conditions according to the previously selected genes. Indeed, the Resting Memory (REM) med-stim, Double Negative (DN) stim, TFH med-stim and PBMC med datasets showed the best results in terms of predicted probability as shown in Figure 3. No mispredictions were found in REM med-stim, DN stim and PBMC unstim, whereas only one misprediction was found according the ADA boost of TFH med-stim. In order to provide a comprehensive description of gene expression with higher accuracy in terms of prediction probability, all gene expression analyses, from multiple subsets and conditions were ranked. Table 3 summarizes the classification accuracy and a relative ranking for each category. Rankings were used to identify the cell subsets and conditions that yielded high prediction accuracy as well as a wider range of predicted probability values. Correct prediction ranged from 68% up to 100% when tested on the cohort. Following these criteria, five cell subsets/conditions providing the highest classification accuracy combined with the highest predicted probability were highlighted (Table 3).
Finally, the B and T cells subsets/conditions were used to calculate a combined score. The score, was then tested in our cohort of patients which were blindly predicted as responders and non-responders. In this case, as shown in Figure 4 and Table 4, only one patient out of the entire cohort was misclassified providing a prediction accuracy of 95.6%.
Due to the small sample size, in order to overcome the unfeasibility to perform a nested cross valitation, we confirmed the stability of the accuracy in features' selection of the top 5 B/T subsets/conditions resampling the dataset according to these 5 subset/condition. This re-analysis confirmed the stability in feature's selection of the initial model. Indeed, all subset models out of the 5 selected cell subsets/ condition feature displayed between 80 and 100% accuracy according to the bootstrap replications (Supplementary Table 3). The best performing subset by this metric was the B DN med_stim subset with a confidence interval ranging from 94 to 100%. All genes from the subset models had 50-90% selection rates in bootstrap replicates.

Functional Analysis of Genes Selected by Artificial Intelligence
In order to characterize the biological functions of the genes used to build the five models with the highest prediction accuracy, we performed a functional enrichment analysis on the five sets of genes shown in Table 5. According to gene set enrichment analysis, the REM med_stim selected genes associated with transcription pathways of chemokine expression and T celloriented proliferation (Figure 5). These results are in line with ontologies which were particularly enriched in the positive regulation of T helper I type immune responses (Figure 5). In the T cell counterpart, pTFH cells, several genes involved in cytokine-cytokine mediated signaling were enriched. Also JAK-STAT signaling and TLR oriented stimulation pathways were upregulated (Figure 5) in patients able to respond to TIV. It is important to mention that other genes, such as IL21 and TNSF13 (APRIL), previously reported to be crucial in the T-B cell interaction (23,24), resulted informative after in vitro stimulation to define responders. According to our previous analysis (15) these data may suggest that the functional expression of these genes after in vitro stimulation is able to predict the ability of these cells to activate a functional cascade which sustain an effective humoral response after vaccination.
Gene ontology analysis also revealed the expression of chemokine receptors with complementary activity between IgD- All subsets and conditions were ranked both for the accuracy of classification and for the expected probability range. According to both rankings, the categories with the best classification capacity, highlighted in red, were selected for the final score.

CD27-Double Negative (DN) B cells and pTFH. Importantly IL2
and IL2RA were both selected in the pTFH and DN, respectively. These data may suggest how activation of this pathway after in vitro stimulation represents a functional correlate of plasma cell lineage commitment after in vitro stimulation as previously reported in mice (25).

DISCUSSION
Definitive and predictive biomarkers of vaccination efficacy are still largely unknown and may provide crucial information in the design or improvement of existing vaccines. This gap further applies to specific groups of patients presenting with underlying immunological conditions which increase their risk of suboptimal responses to vaccinations (4,5). In the present study we developed a predictive score of immunogenicity to seasonal flu vaccination through an artificial intelligence approach fed by data deriving from a novel in vitro gene expression testing approach (IVIGET) performed prior to the immunization in a cohort of HIV infected patients. Systems biology has helped to develop specific predictive assays in the oncology field. Also, targeted molecular assays have played an increasingly important role in identifying prognostic outcomes or predicting response to chemotherapy, starting from tumor biopsies (26). Indeed, these assays, which are now routinely performed in local pathology labs to help guide treatment decisions in breast cancer (27) lung cancer (28), and colorectal cancer (29), have been tested and validated on tumor biopsies.
On the other hand, systems vaccinology has been often analyzed in total blood or cell suspensions (e.g., PMBCs) which present an high intrinsic variability due to transitory confounding effects (e.g., concomitant infections or vaccination, inflammation, systemic immune deficiency, etc.) which may represent important variables making the aim of systems biology even more challenging. In addition, specific changes in cell frequency due to underlying immune defects or to physiologic conditions (i.e., age, pregnancy) may importantly interfere with the analysis of functional correlates of vaccine efficacy (11,30).
Additional confounding effects are represented by interindividual differences such as gender, age, pre-existing immunity, microbiota or systemic conditions which may further affect data analysis and their interpretation (31)(32)(33).
Following this idea, over the last few years we have described trancriptional signatures of vaccine response from purified lymphocyte subsets or single Ag specific cells in VPs (14,16,17). Our data demonstrated how the analysis of purified cell subpopulations may provide additional information compared to total PBMCs, and how gene expression analysis after in vitro stimulation may provide distinct predictive correlates of Ab and cellular response upon TIV (16) in VPs. In the present study,   DUSP4, HAVCR2, IL2RA, PDL1, PPP3CA, SAMHD1,  SELPLG, STAT3, TLR9   TFH med_stim  ABCB1, DUSP4,FOXP3, ICOS, IFNG, IL2, IL21, LAG3,  MAPK3, PDCD1, PDL1, SOCS1, TNFSF13   PBMC med  ADAM17, BCL6, CAV1,CCR6, GATA3, IL6RA, IL6ST, PKC.A,  BST2, CD3D, CXCR4, ICOS,ID2, IFNG, IL21R,IRF4,MAF,  PTX3 DN med_stim CAMK4, MX1, SELPLG, SOCS1, TLR9 In red are shown the up-regulated genes for the med and stim categories (DN stim, PBMC med) and the genes with the highest (stim-med) value in the Responders for the medstim categories (REM med-stim, TFH med-stim, DN med-stim. In blue are shown the down-regulated genes for the med and stim categories and the genes with the lowest (stim-med) in the Responders for the med-stim categories (REM med-stim, TFH med-stim, DN med-stim. our analysis approach complement the evidence produced on single subsets applying state-of-the-art machine learning and methodology to the in vitro gene expression testing (IVIGET) which is focused on cell subsets directly involved in the immune responses upon Influenza vaccination. After multiple gene selection methods were applied for all subsets and conditions the score was interrogated at the time of vaccination on the ability to predict immune response to TIV in a previously investigated cohort of HIV infected patients (14). Albeit limited by the small sample size, which made the nested cross validation unfeasible, our analysis was able to perform a selection of genes and conditions able to predict vaccination response in specific B and T cell subsets. Conditions with higher prediction probability and correct classification were further ranked and selected to produce the final score which was blindly tested on the cohort. To further overcome the contamination of the test set, the 5 top ranked subsets after single-model re-analysis confirmed the stability of the accuracy and suggests how the model is able to build a predictive score for vaccination response by selecting important subset/condition to be validated in larger scale studies. Four out of five of the subset/condition selected for the final score included the stimulated condition and more precisely three out of the five refer to data derived from the difference in gene expression between the stimulated and unstimulated condition. Overall these results suggest that this in vitro stimulation approach in combination with others in vitro tests recently described (34) may provide important information in term of prediction of vaccine responsiveness and early pre-clinical selection of effective vaccine candidates for VPs. Our data may thus confirm that gene expression after a relatively short (16 h) in vitro stimulation may emulate early transcriptional changes that were analyzed in vivo both in mice (35) and in humans (36,37). Early transcriptional changes, derived from whole transcriptome sequencing from blood samples collected at day 0, 1, 3, and 7 after immunization were shown to be informative in predicting long-term humoral and cell mediated responses to Hepatitis B, Ebola (38) and yellow fever (36) vaccinations. Interestingly the majority of differentially expressed genes (DEGs) resulted from the analyis between day 1 or day 3 and day 0 suggesting that early signatures were able to orchestrate and correlate with long term memory responses. In line with this, our analysis revealed how the majority of selected features were among Differentially Induced Genes (DIGs) after a peptide mix stimulation of 16 h). Following these evidence we also recently reported how early signatures after in vitro stimulation in Ag specific B cells were able to define the B cell fate after re-encountering of the antigen (39). Overall these findings suggest that both the analysis of purified cells, directly involved in the immune response triggered by a peptide-specific stimulation may provide distinct signatures of immunogenicity that may be useful to implement vaccination predictive tools.
It is important to consider as a limit of the tools presented here, that effectiveness of the score may be specific to the seasonal influenza vaccination (e.g., 2009) and may not apply to other viral strains that make up the vaccine as it continuously changes over the years. It was noted by Nakaya et al. that transcriptional differences differed between the Live Attenuated Influenza Vaccine and the TIV with respect to both classes of genes and cell subsets orchestrating the early immune response (40). Indeed, in a targeted microarray confirmatory analysis on sorted subsets, B cells showed higher DEGs in TIV vaccinee compared to LAIV with a peculiar enrichment in Antibody secreting cells genes (39,40). Additional studies will be needed to cross validate the score in yearly vaccinated patients and in a vaccine-type/year specific manner. The overfitting caused by the model may represent another limit of the study (41). To reduce this potential problem, we adopted the Adaptive Boosting algorithm which is less susceptible to overfitting through implicit regularization and attempts to combine rules to create a more accurate prediction (http://rob.schapire.net/papers/explainingadaboost.pdf). Moreover, we have implemented a robust feature selection based on four different methods and we used a dataset balanced between responders and non-responders. To further reduce the risk of overfitting and to increase the accuracy of the models, it would be necessary to increase the sample size and possibly use two independent datasets for the testing and training phases (42).
As previously discussed, the score applies on a relatively limited and curated panel of genes that cannot provide a complete mechanistic insight on the biology orchestrating the immune response upon TIV. However, the genes selected by the score confirm the accumulating evidence on B and T lymphocytes functional data which have been produced in the last few years in patients differentially responding to TIV. Indeed, previous results in pTFH after in vitro stimulation highlight the importance of IL21, found upregulated in R (43), confirming previous report in children, adults and elderly able to respond to TIV (44). Also the IL2 pathway, in line with previous evidence (42,43), seems negatively correlated to the ability to respond to the vaccination when over expressed by pTFH. Overall, these data suggest how IL2 expression triggers a Th1 oriented immune response, rather than long term memory, which was confirmed in another study investigating the correlation between circulating TfH and immunogenicity upon Ebola virus vaccination (44). Our data further add information about the IL2 pathway in the B cell compartment as it was noted that IL2RA was included by the gene selection of DN after in vitro stimulation. The IL2 effect on human naïve B cells was recently investigated for the ability to induce plasmacell differentiation through ERK signaling after BACH2 silencing (25). For the first time, we showed that in TIV non responders, IL2RA receptor was upregulated in the so called "double negative" B-cell subset (expressing neither CD27 nor IgD), recently reported to be accumulated in aging populations (45). Overall, these data may suggest how the lack of downregulation in the B cell counterpart after IL2 production from the circulating TFH may interfere with an adequate memory response. Additional studies on this subset will be needed in order to define whether a manipulation or an adjuvanted vaccination specifically targeting the IL21/IL2 molecule production and receptors expression may increase vaccine immunogenicity.
Also Resting memory after in vitro stimulation were selected by the score as an informative subset of TIV response. BATF, a transcription factor which was recently showed to induce plasmacell differentiation of memory B cells after CD40L/CD40 signaling (46) was upregulated after H1N1 peptides in vitro stimulation of TIV responders. Also CD69 and CCR2 genes emerging from the score in the Resting memory subset confirm the importance of a T cell mediated response.
Although this analysis cannot provide a full mechanistic insight of the molecular mechanisms underlying the immune response since it is performed on a curated and limited panel of genes rather than the full transcriptome, it is promising in providing functional correlates to be used in a prediction score. Additional mechanistic analysis with deeper transcriptional analysis should confirm findings from these data.
In conclusion our analysis suggest that the in vitro stimulation and gene expression analysis on purified cell subsets that are involved in the immune responses upon vaccination, may represent valuable information to build a predictive score of immunogenicity. These analyses should be supported by future studies with larger sample size in order to validate this score in HIV infected children. These results may inform novel and more effective immunization strategies in HIV infected children and in other vulnerable population presenting with suboptimal immune responses.
Future studies, beyond the current approach, to evaluate protective immune responses remains an important goal to facilitate the interpretation of response to existing and emerging vaccines, particularly in VPs.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Materials, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Coordinamento Amminstrativo Studi Clinici e Comitato Etico Presidenza IRCCS Ospedale Pediatrico Bambino Gesù. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
NC, LD, PP, and SaP conceived the study and designed the experiments. NC and LD performed the experimental procedures. NC and GP drafted the first version of the article. CB, MF, and NC performed statistical analysis and bioinformatics. VS, EM and PZ provided samples and participated to the design of the study. Supervision and resources were provided by PR, PP, and SP. All authors participated in writing, review and editing of the article.

FUNDING
This work was made possible by support from a Miami CFAR pilot award to NC, from Children's Hospital Bambino Gesù, ricerca corrente 2020 to NC and ricerca corrente 2019 to PP. We also received support from grants AI108472 and AI127347 to SP and the Laboratory Sciences Core of the Miami CFAR (P30AI073961) from the National Institutes of Health (NIH), which was supported by the following NIH Co-Funding and Participating Institutes and Centers: NIAID, NCI, NICHD, NHLBI, NIDA, NIMH, NIA, NIDDK, NIGMS, FIC, and OAR.