Machine Learning-Based Single Cell and Integrative Analysis Reveals That Baseline mDC Predisposition Correlates With Hepatitis B Vaccine Antibody Response

Vaccination to prevent infectious disease is one of the most successful public health interventions ever developed. And yet, variability in individual vaccine effectiveness suggests that a better mechanistic understanding of vaccine-induced immune responses could improve vaccine design and efficacy. We have previously shown that protective antibody levels could be elicited in a subset of recipients with only a single dose of the hepatitis B virus (HBV) vaccine and that a wide range of antibody levels were elicited after three doses. The immune mechanisms responsible for this vaccine response variability is unclear. Using single cell RNA sequencing of sorted innate immune cell subsets, we identified two distinct myeloid dendritic cell subsets (NDRG1-expressing mDC2 and CDKN1C-expressing mDC4), the ratio of which at baseline (pre-vaccination) correlated with the immune response to a single dose of HBV vaccine. Our results suggest that the participants in our vaccine study were in one of two different dendritic cell dispositional states at baseline – an NDRG2-mDC2 state in which the vaccine elicited an antibody response after a single immunization or a CDKN1C-mDC4 state in which the vaccine required two or three doses for induction of antibody responses. To explore this correlation further, genes expressed in these mDC subsets were used for feature selection prior to the construction of predictive models using supervised canonical correlation machine learning. The resulting models showed an improved correlation with serum antibody titers in response to full vaccination. Taken together, these results suggest that the propensity of circulating dendritic cells toward either activation or suppression, their “dispositional endotype” at pre-vaccination baseline, could dictate response to vaccination.

Vaccination to prevent infectious disease is one of the most successful public health interventions ever developed. And yet, variability in individual vaccine effectiveness suggests that a better mechanistic understanding of vaccine-induced immune responses could improve vaccine design and efficacy. We have previously shown that protective antibody levels could be elicited in a subset of recipients with only a single dose of the hepatitis B virus (HBV) vaccine and that a wide range of antibody levels were elicited after three doses. The immune mechanisms responsible for this vaccine response variability is unclear. Using single cell RNA sequencing of sorted innate immune cell subsets, we identified two distinct myeloid dendritic cell subsets (NDRG1-expressing mDC2 and CDKN1C-expressing mDC4), the ratio of which at baseline (pre-vaccination) correlated with the immune response to a single dose of HBV vaccine. Our results suggest that the participants in our vaccine study were in one of two different dendritic cell dispositional states at baselinean NDRG2-mDC2 state in which the vaccine elicited an antibody response after a single immunization or a CDKN1C-mDC4 state in which the vaccine required two or three doses for induction of antibody responses. To explore this correlation further, genes expressed in these mDC subsets were used for feature selection prior to the construction of predictive models using supervised canonical correlation

INTRODUCTION
Vaccination as a general strategy to prevent infectious disease has been one of the most effective public health measures since its conceptualization and implementation by Edward Jenner in the 18 th century, and has resulted in the complete eradication of smallpox, the near elimination of polio, and a dramatic reduction in the incidences of measles, mumps and other common diseases. In contrast to these successes, several notable failures in the development of effective vaccines against other common infectious diseases, including AIDS, tuberculosis, and malaria suggest that the current empirical approach to vaccine design is not effective in eliciting protective immunity in many cases (1). This variability in vaccine effectiveness highlights the need to better understand the fundamental principles of human immune responses, the "rules of immunity", and how this understanding could be used to develop vaccination strategies that are consistently effective and result in durable immunity.
Recently, several groups have applied high-throughput multiomics assays to produce a comprehensive systems-level evaluation of vaccine responses, so-called "systems vaccinology" (reviewed in (2)(3)(4)(5). One of the key questions addressed is -can baseline (prevaccine) signatures of the immune system predict vaccine responses and differentiate between responders vs. nonresponders and, if so, what can these signatures tell us about the mechanisms for eliciting protective immunity (6). The general concept that specific baseline immune signatures can predict vaccine responses has been explored in large cohort studies in the context of hepatitis B virus (HBV), influenza and malaria vaccines (7)(8)(9)(10)(11)(12). However, the connection between these molecular signatures and the underlying immunological mechanisms remains tenuous. Further, the application of largescale multi-omics assessment of large vaccination cohorts is costprohibitive, raising the question of whether advanced computational and machine learning methods may allow for the discovery of predictive mechanistic signatures in studies with smaller sample sizes (13).
The hepatitis B virus (HBV) vaccine is an ideal platform to explore these questions. First, serum anti-HBV antibody levels, which can be easily measured in participant samples, are a wellestablished correlate of protection (14). Second, the response to the HBV vaccine is highly variable, providing a broad range of responses, which is useful for identifying correlates and predictors (15). Third, around 10% of subjects respond with protective antibody titers following a single dose (16). We recently applied a series of validated multi-omics assays to measure the full range of cellular and molecular components of the immune system, including immune cell composition, DNA methylation, gene expression, protein abundance, and fecal 16S microbiome, to provide an exhaustive picture of the immune response to the HBV vaccine (17). Multi-omics integrative analysis on these data sets identified a number of candidate baseline predictors of vaccine response using serum antibody titers to the HBV surface antigen following three vaccine doses as the quantitative endpoint in a relatively small cohort of 15 vaccine recipients (13). While these candidate predictive signatures could be identified using this systemslevel approach in a relatively small cohort, a unifying mechanistic driver did not emerge. Furthermore, multi-omic analysis of whole blood failed to reveal features predictive of the variable protective responses following only a single vaccination dose.
In this report, we sought to determine if a more granular approach, namely single cell RNA sequencing in the context of an integrated multi-omic analysis, could identify the relevant cellular phenotypes and functions associated with vaccine responses. Using machine learning to identify the most discriminative gene expression features for dimensionality reduction to optimize performance of canonical correlation analysis (CCA), a truly integrative machine learning approach emerged that helps to overcome small sample sizes through a hypothesis-generationand-hypothesis-testing-in-orthologous-datasets workflow.

METHODS
Descriptions of all methods not detailed below have been published recently in (13,17).

Cohort and Sampling Description
A prospective, observational study (ClinicalTrials.gov: NCT03083158) of immune responses to the HBV vaccine (ENGERIX ® -B) was undertaken, with recruitment occurring at the Vaccine Evaluation Center (VEC), British Columbia Children's Hospital Research Institute in Vancouver, Canada. Briefly, a total of 15 eligible individuals aged 44 -73 were enrolled in the study. One ml (20 µg) of ENGERIX ® -B vaccine was administered via intramuscular deltoid injection at three different times during the study (Day 0, Day 28 and Day180). HBV titres were measured at screening, Day 28 after the initial vaccine dose (immediately prior to second dose), Day 180 (immediately prior to the third dose), and Day 208 (one month after the final dose). For the purposes of this study, participants were categorized as "dose 1 responders" if their anti-HBV serum antibody titer was >10 mIU/ml at Day 28 after a single dose of HBV vaccine, (a response considered as protective), "dose 1 marginal responder" if they had detectable anti-HBV serum antibody titer above baseline (3.1 mIU/ml) but <10 mIU/ml at Day 28, or "dose 1 non-responders" if they had no detectable anti-HBV serum antibodies at Day 28. Note that due to funding constraints, only a subset of participants and samples were used for some of the mechanistic assays.
Various omics studies were performed as described (13,17). Briefly, peripheral whole blood cells were profiled by flow cytometry, genome-wide DNA methylation (Illumina Infinium MethylationEPIC BeadChip), transcript abundance (bulk RNA-Seq), and proteome-wide protein abundance (mass spectrometry) at various time points. Additionally, the bacterial composition (microbiome) of the gut was assessed by 16S rRNA microbiome profiling pre-(Day -14 and Day 0) and post-vaccination (Day 14). Procedures for the collection and processing of PBMC samples for single cell RNA sequencing are described below.
Processing of the frozen 96-well plates containing single cell lysates was performed as previously described (19) with modifications to accommodate an Agilent BioCel automated liquid handling platform (20). Briefly, single cell lysates contained in the 96-well sorted plates were processed in batches of eight plates, with each plate containing wells reserved for 10 pg Universal Human RNA (Clontech Cat#636538) as a positive control, an ERCC-only control, and water as a negative control. Smart-seq2 cDNA synthesis, reverse transcription, and PCR were carried out in a reduced volume (12.5 µL) and with ERCC internal controls spiked-in at a reduced concentration (55 million-fold dilution of the ERCC stock in the first strand cDNA synthesis step). Amplified cDNAs from the eight 96-well plates were consolidated to two 384-well plates and purified with Ampure magnetic particles. A 10-fold diluted portion of each cDNA was assessed for expression of the human beta-actin (ACTB) housekeeping gene by qPCR for quality control of the amplified cDNAs. A total of 14,592 sample wells were processed through cDNA synthesis and ACTB qPCR on the automated platform.
A cycle threshold (Ct) of ≤35 for ACTB amplification was used as a cutoff for the selection of 3,072 cDNAs (768 per cell type) for library preparation and sequencing. A Star liquid handling platform (Hamilton) was used to consolidate cDNAs selected for Illumina Nextera XT library preps (Illumina cat# FC-131-1096) into 384well plates. An automated 1/8th Nextera XT reaction was carried out on 125 pg of the selected cDNAs for the Tn5 tagmentation step, with limited 15 cycle PCR followed by AmPure XP (Beckman Coulter Cat# A63881) bead purification. Nextera XT PCR was carried out with a combination of 384 barcode pairs using Nextera XT Index Kit V2 barcode sets A and D (Illumina cat# FC-131-2001 and -2004). Concentrations of the purified Nextera XT reactions were normalized to 1 ng/µL and combined into a 2ng pool of 384 dual-barcoded samples. RNA-seq was carried out with a total of eight 384 barcoded pools loaded across 16 lanes of an Illumina HiSeq 2500 according to manufacturer's specifications for a total of 3,072 samples sequenced, including controls. A HiSeq SBS V4 250 cycle kit (Illumina cat# FC-401-4003) and a Paired End V4 Cluster Kit (Illumina cat# PE-401-4001) was used for an estimated 2 million reads per sample.

Sequence Data Processing
Single cell RNA-seq data was processed according to published methods (19,21). Briefly, raw fastq sequencing files were demultiplexing using Illumina barcodes. Sequencing primers and low-quality bases were removed using the Trimmomatic software package (22). Trimmed reads were then aligned using HISAT (23) in two steps: first to a reference of ERCC sequences, and then to GRCh38 (Ensembl). StringTie (23) was used to assemble the resulting alignments into transcript structures using GENCODE v25 annotation (Ensembl 87; 10-2016) and gene expression values (TPM) estimated. HTSeq-count (24) was used to generate raw gene alignment counts.
Quality control analysis was performed using sequencing and laboratory metrics, including average Phred score, percent duplicate reads, and transcript isoform count, to classify cell samples as pass or fail using a Random Forest quality control classification model previously described (21). Expression values for the top 2500 genes ranked based on variance from cell samples that passed quality control classification were fed into Scanpy (25) for principal component analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP)based non-linear dimensionality reduction and visualization (26,27). Unsupervised clustering was performed for the entire dataset, while additional supervised clustering guided by flow cytometry marker panels was performed to investigate within cell type variation. Lastly, cell type marker determination was performed using the Louvain unsupervised clustering results and the NS-Forest algorithm (28,29). The end result of this computational pipeline produced a set of unbiased cell type clusters, a gene expression matrix with the expression levels of genes in individual single cells grouped into cell type clusters, and a set of sensitive and specific marker genes for each cell type cluster (Supplementary Table 2) for use in downstream quantitative PCR assays and semantic representations (30).

qRT-PCR
Aliquots (2 µL) of the Smart-seq2 cDNAs from single sorted myeloid dendritic cells (mDCs) were diluted 10-fold in low TE (10mM Tris, 0.1mM EDTA) and 2.5 µL of the diluted cDNAs were subjected to 10 µL Taqman ™ qPCR assays for the human beta actin (ACTB) housekeeping gene (ThermoFisher Hs01060665_g1 FAM-MGB) using 5 µL of a 2X PerfeCTa qPCR SuperMix ROX (Quantabio cat# 95050-500) as an initial screen for endogenous gene expression. Thermocycling conditions were completed on a Quantstudio 6 qPCR instrument (Applied Biosystems) using the following thermocycling profile: initial 95°C activation for 2 minutes followed by 45 cycles of 95°C for 10 seconds and 60°C for 30 seconds. Positive reactions -cycle threshold (Ct) of less than 35 for ACTB amplification -were identified and their corresponding cDNAs screened using two additional marker genes selected from the NS-Forest analysis, CDKN1C (ThermoFisher Hs00175938_m1 FAM-MGB) and NDRG2 (ThermoFisher Hs01045114_g1 FAM-MGB), using the same thermocyling conditions as for ACTB.

In Vitro Whole Blood Stimulation
Pre-vaccination (baseline) blood samples were stimulated in vitro (Milieu Interieur) with LPS (Invivogen -tlrl-3pelps), poly I:C (Invivogen -vac-pic), or SEB (Kindly given by Bernard Nocht Institute) with appropriate negative controls and incubated in TruCulture tubes within 15 minutes of blood collection, inserted into a dry block incubator, and maintained at 37°C (± 1°C) for 22 hours as described (13,17). Cell fractions were collected and lysed in Trizol for RNA extraction. cDNA was prepared using the SmartSeq 2 protocol as described above. Quantitative PCR (qPCR) was performed in triplicates for each sample targeting CDKN1CC and NDRG2 using ACTB as a housekeeping gene. The data were analyzed using the standard delta-delta Ct method (2 -ddCt ) in order to generate fold difference in gene expression values.

mDC Functional Assessment
To assess the ability of mDCs to induce T cell activation, 50 ml of whole blood was drawn from healthy adult donors and PBMC isolated as previously described (17). Cells were stained using cocktails of surface marker antibodies (FCER1A, CD11c, CD1C, CD14, CD3, CD123, CD16A and HLADR) to specifically sort mDC2, mDC4 and T cells (Supplementary Figures 3, 4 and Supplementary Table 3). Cell sorting was performed using the BD Aria (II 85µm nozzle) in cold chambers. Sorted cells were spun down at 600g for 10 min, resuspended in 1ml PBS (Gibco # 14190), counted, and seeded in 96 well plates pre-filled with either LPS (100 ng/ml, Invivogen tlrl-3pelps) or polyI:C (20µg/ml, Invivogen Tlrtpact) or medium as a negative control. The autologous T cells were labelled with Cell Trace ™ Oregon Green diluted in PBS according to the manufacturer's instructions (Invitrogen # C34555), and then rested in AIM V medium (Gibco Cat# 12055-091) with 2% heat inactivated human AB serum (e.g., Gemini BioProducts) at 37°C for 24 hr. The next day, DC cultures were briefly treated with 20 mM of EDTA to detach adherent cells, all cells harvested, washed three times in complete medium to remove the TLR ligands, and counted. Dendritic cells (mDC2 or mDC4) were then mixed with labelled T cells at a ratio of 1:5 in 125µl complete AIM V medium and incubated at 37°C for 5 days. As a positive control, Oregon green labelled T cells were stimulated with 1µg/ml of anti CD3/28 antibody (Invitrogen #16-0037/#16-0288). On day 5, the cells were detached using EDTA, washed and stained with a cocktail of antibodies (Supplementary Table 4) to assess the proliferation of specific T cells using BD LSRII flow cytometer. All flow cytometry data were analyzed using Flowjo version 10 (Flowjo, Ashland, OR).
Diablo Supervised sGCCA DIABLO, part of the mixOmics framework, is a data-driven, hypothesis-free multi-omics integration approach that has been successfully applied, by us and others, to derive novel, robust biomarkers, and increase our understanding of the molecular regulatory mechanisms that underlie health and disease (31)(32)(33)(34). DIABLO extends sparse Generalized Canonical Correlation Analysis (sGCCA) into a supervised multi-omics data integration framework (35,36). DIABLO performs multivariate dimensionality reduction and selects correlated variables from several datasets by maximizing the covariance between linear combinations of variables (latent component), across both multiomics datasets (blocks) and an outcome (response) variable, in this case anti-HBV serum antibody titers. The data are then projected into a smaller dimensional subspace spanned by the latent components for classification. Here we used DIABLO to identify correlates of vaccine response (anti-HBV IgG level measured at the final follow-up visit, Day 208), from the multi-omics profiles in baseline pre-vaccination samples in an integrative fashion.

Variable Antibody Response to HBV Vaccination
A detailed description of the study design and sample collection strategy has been reported previously (17). Briefly, fifteen participants were given the standard three-dose HBV vaccination regimen and blood samples were collected before vaccination (Visit 3) and 28 days after each of the three vaccine doses (Visits 8, 10, and 12). As observed in previous studies (13), a subset of three participants showed measurable Ab titers following a single HBV vaccine dose, with two of the three participants achieving protective Ab levels of >10 mIU/ml ( Figure 1). Fourteen of the fifteen participants achieved protective antibody titers after the second and third doses, with a >2 log range in antibody titers after the third dose.

mDC Subsets With Distinct Gene Expression
Single cell RNA sequencing (scRNAseq) of innate immune cell subsets sorted from whole blood was used to define their transcriptional phenotypes with relationship to HBV vaccination responses. In order to ensure the capture of any distinct phenotype that might correlate with vaccine response, blood samples were collected on Day 0 pre-vaccination and Day 1, 3, 7, and 14 postvaccination, from a subset of participants, including the two dose 1 responders (GR01, GR04), the one dose 1 marginal responder (GR15), and three dose 1 non-responders (GR13, GR17, GR19) according to the HBV-specific antibody titers measured at 28 days after the first vaccine dose. Single monocytes (MON), myeloid dendritic cells (mDC), plasmacytoid dendritic cells (pDC), and natural killer cells (NK) were sorted into microtiter plate wells and single cell cDNAs showing positive ACTB expression by qPCR analyzed by scRNAseq and UMAP embedding of the gene expression data. Each of the four major innate immune cell subsets were well segregated in the UMAP plot (Figure 2A). In addition, lower abundance outlier clusters we also detected for the mDC and pDC sorted populations, indicating some level of subtype heterogeneity. Unsupervised clustering produced seven distinct transcriptome clusters ( Figure 2B), including the lower abundance mDC and pDC outliers. No obvious cluster-specific enrichment of cells from individual participants, processing batch, age group or sample collection date was observed (Figures 2C-F).
Each unsupervised cell cluster showed distinct differential gene expression patterns identified using both logistic regression (Figures 3A-C) and NS-Forest-based marker gene selection ( Figures 3D, E and Supplementary Table 2). The main mDC subset (Louvain cluster #2) appeared to exclusively express the p57 kip2 cyclin-dependent kinase inhibitor gene CDKN1C and expressed relatively high levels of LINC01272 in comparison with other innate cell subsets (Figures 3A-E). These cells also expressed high levels of the Fc gamma receptor gene FCGR3A (Figures 3A-C). In contrast, the outlier mDC cluster (Louvain cluster #4) exclusively expressed the n-myc regulated gene NDRG2 and expressed relatively high levels of the Fc epsilon receptor gene FCER1A, the MHC class II gene HLA-DQA1 ( Figures 3A-E), and other MHC class II genes (not shown). The high-level expression of MHC class II genes suggests that the outlier mDC subset is activated, whereas the expression of the p57 kip2 CDK inhibitor suggests that the main mDC subset is resting. Expression of FCGR3A in the main mDC cluster and FCER1A in the outlier FIGURE 1 | Serum antibody response to HBV vaccination -Serum antibody titers were measured in samples collected from fifteen study participants (GR01 -GR19) before (Visit 3) and after (Visit 8, 10, and 12) vaccination. Vaccine doses were administered on Visit 3, Visit 8 and Visit 10 (after blood collection) for a total of three doses of HBV vaccine in all fifteen participants. Limit of detection of 3.1 mUI/ml and correlate of protection of 10 mUI/ml are indicated. These data were also previously used to create Figure 1C in (13). mDC cluster suggests that these two subsets correspond to the DC4 and DC2 dendritic cell types defined previously (37) and will be referred to as CDKN1C-expressing mDC4 and NDRG2expressing mDC2. In order to determine if the NDRG2-expressing mDC2 and CDKN1C-expressing mDC4 phenotypes had been observed in previous studies, the NDRG2 and CDKN1 marker genes were used to search for expression modules in the MSigDB database. Four MSigDB modules included both of these marker genes: which were all derived from dendritic cells stimulated with TLR3 agonists.

Relative Abundance of mDC Subsets Correlate With Vaccine Response
Exclusive expression of NDRG2 in mDC2 and CDKN1C in mDC4 suggested that these two markers could be used to distinguish these mDC subsets. Indeed, qPCR amplification showed mutually exclusive expression of these two genes in sorted mDCs ( Figure 4A). Thus, qPCR for NDRG2 and CDKN1C was used to identify and quantify these two mDC subsets in ten of the participants across the entire time course of the study (Supplementary Tables 5 and 6). The relative proportions of NDRG2-expressing mDC2/CDKN1Cexpressing mDC4 were found to be dynamic and vary between individuals ( Figure 4B). Interestingly, a relatively high ratio of NDRG2-expressing mDC2/CDKN1C-expressing mDC4 was found in the two dose 1 responders (GR01 and GR04) at baseline Day 0 (Supplementary Table 6). Indeed, the average NDRG2-expressing mDC2/CDKN1C-expressing mDC4 ratio in dose 1 responders on Day 1 was 3.13 and in non-responders was 0.46. Interestingly, while the ratio of NDRG2-expressing mDC2/ CDKN1C-expressing mDC4 dropped dramatically following vaccination of the two dose 1 responders, the ratio was relatively static or increased in non-responders ( Figure 4B and Supplementary Table 6). While these findings in-and-of themselves are not adequately powered to draw definitive conclusions regarding these correlations due to the small sample size of this pilot, they did produce a hypothesis regarding the dispositional state of dendritic cells that could be explored in orthologous data.
Given that NDRG2-expressing mDC2 and CDKN1Cexpressing mDC4 marker genes in the MSigDB database had  been identified as malleable to polyI:C stimulation, we further explored how TLR3 activation might lead to differential responses in these DC subsets, we examined the changes in expression of CDKN1C and NDRG2 in whole blood collected from participants prior to HBV vaccination and stimulated with a TLR3 agonist. Interestingly, the two participants that showed the highest Ab responses to the first vaccine dose -GR01 and GR04showed preferential up-regulation of NDRG2 in their blood cells, whereas cells from non-responding participants showed preferential up-regulation of CDKN1C suggesting they were predisposed to a more resting/inhibitory mDC4-like phenotype (Supplementary Figure 5).

mDC Subsets Differ in Their Functional Predisposition
To explore if there are any functional differences between these mDC subsets, NDRG2-expressing mDC2 and CDKN1Cexpressing mDC4 subsets were sorted and incubated with sorted T cells from the same (autologous) donor, with or without pre-stimulation with TLR3 (pIC) or TLR4 (LPS) agonists prior to co-culture to assess the impact of TLR prestimulation on the ability to induce T cell proliferation as a proxy read out for immune activation as described (37). When unstimulated CDKN1C-expressing mDC4 cells were incubated with unstimulated autologous T cells, proliferation of 13.0% of CD4 and 15.8% of CD8 was induced ( Figures 5A, B). However, if CDKN1C-expressing mDC4 cells were first stimulated with pI:C or LPS, this baseline T cell proliferation was inhibited 3.9-fold (mean n = 4) for both CD4 and CD8 T cells (paired t-test p = 0.030 and 0.036 respectively). In contrast, NDRG2expressing mDC2 cells did not induce T cell proliferation with or without LPS stimulation ( Figure 5C). These results suggest CDKN1C-expressing mDC4 cells exhibit tonic T cell activating ability and that this activating ability is suppressed following TLR3 or TLR4 stimulation, whereas NDRG2-expressing mDC2 do not activate T cell proliferation either with or without TLR stimulation.

Machine Learning Model Using mDC TLR3 Feature Selection Improves Serum Antibody Response Predictions
In our previous study, we found that multi-omics data could be used to produce predictive models of antibody titers based on the supervised sparse generalized canonical correlation analysis implemented in the Diablo algorithm (13), even given the relatively large feature space provided by the transcriptomic and CpG methylation data ( Figure 6A). Based on the scRNAseq and functional studies described above, we hypothesized that if the relative abundance of the NDRG2expressing mDC2 and CDKN1C-expressing mDC4 subsets were indeed mechanistically linked to vaccine responses, selecting features specific to these cell subsets might produce improved correlation with serum antibody levels.
Using just the bulk gene expression and DNA methylation data from baseline pre-vaccination samples associated with the MSigDB marker genes derived from dendritic cells stimulated with TLR3 agonists to build the Diablo predictive model for all fifteen participants, a significant improvement in model performance was obtained ( Figure 6B). For example, the Spearman's rank correlation of the gene expression model improved from 0.62 to 0.87 and the median error improved from 15.66% to 7.23%. Similar improvements were observed with the model produced using the selected CpG DNA methylation features. And while cross-validation did not show a significant improvement, likely due to the small sample size, we went from~50,000 transcripts and~800,000 CpG sites to only a few hundred transcripts and a few thousand CpG sites as a result of this feature selection step demonstrating a substantial enrichment of informative features. Single myeloid dendritic cells were sorted from blood collected prior to HBV vaccination (D0) and 1 day (D1), 3 days (D3) and 7 days (D7) post vaccination. Following cDNA preparation, the expression of NDRG2 (mDC2 expressing gene) and CDKN1C1 (mDC4 expressing genes) mRNAs were quantified by qPCR. The graph shows the change in the relative proportion of NDRG2-expressing mDC2s/CDKN1C-expressing mDC4s at each time point compared to D0 per study participants. Solid lines show the HBV dose 1 responders (with anti-HBV titres higher than 10 mIU/ml at Day 28 after first dose and titers indicated next to the lines). Dotted lines show the HBV dose 1 non-responders (less than 10 mIU/ml HBV titers) after the first dose of vaccine. Each line shows values of individual participants; the Y axis values were log transformed. Raw data is provided in Supplementary  Tables 5 and 6. These results indicate that feature selection based on TLR3induced dendritic cell genes produced machine learning models that provided better correlation with serum antibody responses, suggesting that the relative contributions of the NDRG2expressing mDC2 and CDKN1C-expressing mDC4 subsets influences this vaccine response.

DISCUSSION
In a relatively small cohort of HBV vaccine recipients, we identified two distinct mDC subsets using single cell transcriptomics analysis, the ratio of which at baseline (i.e., before vaccination) correlated with vaccine response to a single dose of the HBV vaccine. These two mDC subsets were distinguishable by the differential expression of a number of genes that allowed for their putative matching to dendritic cell subsets identified previously (37), designated here as NDRG2expressing mDC2 and CDKN1C-expressing mDC4.
Three pieces of evidence suggest that these dendritic cell subsets contribute to the immune state at baseline prior to vaccination that influences vaccine responses. First, the two individuals who generated serum antibody responses to a single dose of HBV vaccine had a high ratio of NDRG2-expressing mDC2/CDKN1C-expressing mDC4 in their peripheral blood prior to vaccination. Second, whole blood from these responding individuals showed preferential upregulation of NDRG2 when stimulated with a TLR3 agonist. Third, machine learning developed to predict serum antibody titers after the third vaccine dose in all fifteen participants and built using baseline pre-vaccination sample data and genes differentially expressed in dendritic cells stimulated with TLR3 agonists outperformed models built without this feature selection step. Thus, the dispositional state of these dendritic cell subsets at baseline appears to provide improved correlation with HBV vaccine responses.
The mechanism of how NDRG2-expressing mDC2 and CDKN1C-expressing mDC4 impact the vaccine response to the HBV vaccine is not yet clear. But, interestingly, ex vivo CDKN1C-expressing mDC4 cells were able to induce autologous CD4 and CD8 T cell proliferation yet mDC2 did not; and TLR3 or TLR4 stimulation of the mDC4 subset inhibited this T cell stimulation. Such a high state of functional plasticity in the mDC4 subset, and the longitudinal variation over time in the ratio of NDRG2-expressing mDC2 and CDKN1C-expressing mDC4 in whole peripheral blood may at least partially explain the variability in immune responses to the HBV vaccine.
Several other groups have recently explored the identification of baseline predictors of vaccine responses using multi-omics assays with results that differ from those reported here. Tsang CpG sites were assigned to dendritic cell TLR3-response genes as described previously (38,39). Note that although the microbiome and wbc-lipidomics data are identical in the two sets, the models they produce (features retained and their coefficients) in the generalized canonical correlation framework are different due to different correlation characteristics with the different dnameth and mrna models. In both cases, the number of components and number of features/component were selected to maximize model performance.
et al. used multi-omics assays to explore responses to seasonal influenza vaccine in healthy adults (8). While neither Day 0 gene expression nor pathway analysis alone was predictive of vaccine responses in their study, twelve cell populations assessed by flow cytometry, including memory, naive, and transitional B cells, CD4 effector memory T cells, IFNa+ myeloid dendritic cells (mDC), and several activated T cell populations correlated with mean fold change in antibody titers. However, the responses in these healthy influenza vaccine cohorts likely represent a recall response to antigenically-similar prior exposure as opposed to the primary HBV response in our study. Fourati et al. constructed a naive Bayes classifier based on the top 15 differentially-expressed genes between PBMCs from responders and poor-responders to the HBV vaccines, which included B cell markers (e.g., CD20, IGHG1), downstream targets of B-cell receptor signaling (e.g., BANK1) and molecules known to have functional interactions with IgG (e.g., C1, FCGR3B), with a predictive accuracy of~63% (9). However, the use of bulk transcriptomic analysis of PBMCs may have obscured the contribution of the minor dendritic cell components evaluated in our scRNAseq assays, emphasizing the value of using scRNAseq to assess the contribution of rare cell subsets. Bartholomeus et al. found that the GRN and IFITM1 genes were significantly downregulated in responders while upregulated in non-responders by whole blood transcriptomics analysis, and absolute granulocytes numbers were significantly higher in non-responders at Day 0 prior to vaccination with ENGERIX ® -B (11). However, the role of the dendritic cell component was not evaluated. Our results suggest that the participants in our vaccine study existed in one of two different dendritic cell dispositional states at baselinean NDRG2-mDC2 state in which the vaccine elicited an early antibody response or a CDKN1C-mDC4 state in which the vaccine response may have been actively suppressed. While the possibility that challenge with a foreign antigen (e.g., a vaccine) would be immunosuppressive seems counterintuitive, a healthy immune system has to strike a delicate balance between responsiveness and non-responsiveness under many circumstances. Furthermore, ample data now supports the conclusion that for some vaccines, including ENGERIX ® -B, reducing general immune activation and inflammation may in fact increase the antigen-specific response to the vaccine (9,40).
From a biochemical perspective, foreign antigens are not that different from self-antigens. In order to avoid autoimmunity, the immune system must carefully assess whether an antigen is truly foreign or not. Evidence suggests that the setting in which the naïve immune system experiences antigen may play an important role and is only activated when some type of tissue injury signal accompanies antigen exposure, the so-called "danger hypothesis" (41). In the absence of this danger signal, activation of adaptive immune cells with Signal 1 from antigen without Signal 2 from antigen presenting cell help would be tolerated and not result in activation. And even when the immune system responds to a truly foreign antigen, an overexuberant response (e.g., the cytokine storm) can do more harm than good. Thus, in addition to mechanisms designed to activate an immune response, the immune system has also developed mechanisms for dampening the response, with the regulatory T cell being a classic example. Perhaps the CDKN1C-mDC4 cell is an example of a suppressive type of regulatory dendritic cell. The extent to which these CDKN1C-mDC4s may be similar to the myeloid-derived suppressor cells (MDSCs) observed in some pathological conditions, such as inflammation, chronic infection or cancer (42)(43)(44)(45) remains to be determined.
So, what are the implications of this hypothesis for improving vaccination outcomes? We found that the NDRG2-mDC2/ CDKN1C-mDC4 ratio differed between individuals as well as within the same individual over time. This suggests that perhaps the dendritic cell dispositional state could be modulated to establish an activatable predisposition prior to vaccination. Since adjuvant effects appear to function through the innate immune system, perhaps prior exposure to adjuvant before antigen could establish the appropriate activatable predisposition. While there is some evidence that preconditioning injection sites with TLR agonists can enhance dendritic cell migration (46) and protection against pathogen infection (47) in animal models, to our knowledge this has never been formally assessed in humans (6). This possibility could lead to a more precision-medicine approach for vaccines by determining at baseline who will respond well or not to specific vaccines or who might need just a single dose (48). This approach could readily follow the model of point-of-care testing currently used in infectious disease settings [e.g (49)]. In fact, the field of vaccinology already does assign different vaccines based on individual characteristics, e.g., different flu vaccines are given to different people based on age.
Given the relatively small cohort size of this pilot study, it was not possible to draw definite conclusions about the ability to predict dose 1 vaccine responses. There are many ways to follow up on the findings described here. One approach would be a direct expansion of the study. First, we would need to recruit a larger cohort of patients (100-200) and follow the visitation strategy and HBV serum Ab testing outlined in Ben-Othman (17) and Shannon (13). Pre-vaccination, we would obtain both HPV serum antibody levels and FACS sorted mDC populations for all patients. Next these mDC populations could be profiled by qPCR using probes to the marker genes discovered during this investigation: NDRG2 and CDKN1C. Lastly, response to vaccination would be measured by the antibody titers as performed in this study. This experimental design would now be possible from a cost perspective as profiling many cells from many patients by qPCR is relatively cheap, while using single cell RNAseq on this many patients would be cost prohibitive.
Finally, we described a novel machine learning approach to multi-omics data integration. The results reported by Shannon et al. (13) suggested that because different sources of background noise and technical confounders would contribute to the results from different omics assays, focusing on the consensus information related to outcome using the canonical correlation analysis approach to multi-omics data integration could reduce overfitting and result in more robust and generalizable models (32,34), even in studies where p>>n. However, the number of parameter features available in these systems biology studies makes it impossible to complete an exhaustive search of all available parameter combinations and makes the L1 penalized or LASSO regression implemented in Diablo ineffective at mitigating the effects of noisy features (50). Using single cell RNA sequencing in this study, we were able to identify relatively rare dendritic cell populations whose abundance and activation disposition appear to correlate with vaccine responses. By using this finding to guide feature selection for those genes expressed in these dendritic cell subsets or those CpG sites that are involved in establishing cell type identity, the correlations of the vaccine response predictive models were dramatically improved, demonstrating the value of directed feature selection prior to machine learning model production to further circumvent the p>>n problem.
In conclusion, the machine learning approaches for informative feature selection based on NS-Forest and multi-omics data integration based on supervised canonical correlation analysis not only produced better correlation with vaccine response but also revealed the possible cellular mechanisms responsible. These results suggest that vaccine recipients exist as different dispositional endotypes that dictate their response to vaccination. With a hypothetical mechanism in hand, developing strategies to adjust these dispositional endotypes in preference of dendritic cell-mediated activation rather than suppression could lead to the development of more effective precision vaccination strategies to achieve protective immunity from single vaccine doses, which are of critical importance in resource-limited settings.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by University of British Columbia Children's & Women's Research Ethics Board (Ref: H17-00175). The patients/participants provided their written informed consent to participate in this study.