Single cell transcriptomics reveals cell type specific features of developmentally regulated responses to lipopolysaccharide between birth and 5 years

Background Human perinatal life is characterized by a period of extraordinary change during which newborns encounter abundant environmental stimuli and exposure to potential pathogens. To meet such challenges, the neonatal immune system is equipped with unique functional characteristics that adapt to changing conditions as development progresses across the early years of life, but the molecular characteristics of such adaptations remain poorly understood. The application of single cell genomics to birth cohorts provides an opportunity to investigate changes in gene expression programs elicited downstream of innate immune activation across early life at unprecedented resolution. Methods In this study, we performed single cell RNA-sequencing of mononuclear cells collected from matched birth cord blood and 5-year peripheral blood samples following stimulation (18hrs) with two well-characterized innate stimuli; lipopolysaccharide (LPS) and Polyinosinic:polycytidylic acid (Poly(I:C)). Results We found that the transcriptional response to LPS was constrained at birth and predominantly partitioned into classical proinflammatory gene upregulation primarily by monocytes and Interferon (IFN)-signaling gene upregulation by lymphocytes. Moreover, these responses featured substantial cell-to-cell communication which appeared markedly strengthened between birth and 5 years. In contrast, stimulation with Poly(I:C) induced a robust IFN-signalling response across all cell types identified at birth and 5 years. Analysis of gene regulatory networks revealed IRF1 and STAT1 were key drivers of the LPS-induced IFN-signaling response in lymphocytes with a potential developmental role for IRF7 regulation. Conclusion Additionally, we observed distinct activation trajectory endpoints for monocytes derived from LPS-treated cord and 5-year blood, which was not apparent among Poly(I:C)-induced monocytes. Taken together, our findings provide new insight into the gene regulatory landscape of immune cell function between birth and 5 years and point to regulatory mechanisms relevant to future investigation of infection susceptibility in early life.


Figure S1
. Cell counts by Donor/group and addi onal UMAP plots overlayed with selected sample characteris cs.A) Heatmap showing cell counts for each of the Azimuth annotated cell types stra fied by the donor the cells were generated from.B) Heatmap showing cell counts for each of the Azimuth annotated cell types stra fied by the donor, sample collec on age, and s muli received.C) UMAP plots (same in Figure 1B) stra fied by sample collec on me point (i), s muli/age group (ii), Biological donor (iii), and cell cycle phase (iv).D) Integrated UMAPs overlayed with selected marker genes.

Figure S2.
Repeat UMAP with varying graph construc on parameters.UMAP construc on and plo ng was repeated with different values for the number of nearest neighbors (n.neighbors) and minimum distance (min.dist).Plots with increasing minimum distance are shown from le to right and increasing number of nearest neighbors from top to bo om.The values for minimum distance are 0.1, 0.2, 0.3, 0.4, and 0.5, and the values for number of nearest neighbors are 5, 10, 15, 20, 30, 40, and 50.Each plot is overlayered with the Azimuth reference-based annota on for each cell, with a common legend at the bo om.
Figure S3.A) Volcano plots of selected comparisons from the differen al expression analysis.The plots are arranged to show the results for S muli compared to uns mulated controls for the LPS-s mulated CBMC (first column), LPS-s mulated 5-year PBMC (second column), Poly(I:C)-s mulated CBMC (third column), and Poly(I:C)-s mulated 5-year PBMC (fourth column).Each row corresponds to a different cell type (naïve B cells, naïve CD4 + T cells, naïve CD8 + T cells CD14 + Monocytes, CD16 + Monocytes, and NK cells).Each point represents a gene, and the x-axis shows the average Log2-fold change, and the y-axis shows the -Log10 Bonferronicorrected p value for the corresponding comparison.Genes which recorded a corrected p value < 0.01 and an absolute average log2-FC > 0.25 were considered significantly dysregulated and are shown as red (upregulated) and blue (downregulated) points.The total number of upand down-regulated genes are shown in the inset of each plot, with selected dysregulated genes annotated.B) Volcano plots differen ally expressed genes for CBMC HSPCs s mulated with LPS (i) and Poly(I:C) (ii) compared to controls.Plot parameters are the same as above.C) Upset plots showing the overlap of significantly upregulated (i) and downregulated (ii) genes between CD14 + Monocytes and naïve CD4 + T cells from CBMC and 5yr PBMC samples s mulated with LPS and Poly(I:C).In each plot, the horizontal bar plot shows the total number of up/downregulated genes for that comparison and the ver cal bar plot displays the number of intersect genes for each comparison denoted by the combina on matrix.Intersects between CBMC and 5yr PBMC results for the same cell type/s muli are indicated by a black asterix.

Figure S4
. Results from the sub-analysis of CD14 + monocytes which were randomly downsampled to 1,000 cells per donor.A) Breakdown of the 1,000 cells per donor into sample/s muli groups.B) Volcano plots of differen ally expressed genes compared to uns mulated control for CD14+ monocytes from LPS-treated CBMC (i) and 5yr PBMC (ii) samples, and Poly(I:C)-treated CBMC (iii) and 5yr PBMC (iv) samples.The x-axis shows the average log2 fold change and the y-axis shows the -Log10 Bonferroni-corrected p value.The dashed grey line indicates a Bonferroni-corrected p value of 0.01.Points colored red and blue represent gene with are considered significantly upregulated and downregulated, respec vely.C) Tabulated percentage overlap of the top dysregulated genes (ordered by adjusted-p value) between differen al expression analysis results from all CD14+ monocytes and the randomly selected CD14+ monocyte subset.The columns represent the number of top gene selected and the row show the comparison.As an example, for the comparison of LPS versus control in 5-year PBMC samples (third row), 89.7% of the top 1000 DEGs were common to both the analysis of all monocytes and the analysis of the randomly selected CD14 + monocyte subset.D) Upset plots showing the overlap of the top 100 significantly upregulated genes for CBMC-and 5yr PBMC-derived CD14 + monocytes s mulated with LPS and Poly(I:C).The horizontal bar plot shows the total number of upregulated genes considered (here, 100 top genes) for each comparison.The 100 DEGs iden fied from the analyses of the randomly selected CD14 + monocyte subset are denoted as '1000 monocytes per donor'.The ver cal bar plot displays the number of intersect genes for each comparison indicated in the combina on matrix.The black bar below the combina on matrix indicates prominent intersect sets which involve matching comparisons between the 'all' and '1,000 per donor' CD14 + monocyte analyses.

Figure S5
. Pathways analysis of upregulated genes iden fied by differen al expression analysis for selected cell types.A) Horizontal bar plots of the top 25 significantly enriched pathways from upregulated genes following LPS s mula on of CBMC samples compared to matched control.Results are shown for naïve B cells (i), naïve CD8 + T cells (ii), and NK cells (iii).B) Same as (A) for 5-year PBMC samples treated with LPS compared to match control.Results are shown for naïve B cells (i), naïve CD8 + T cells (ii), and NK cells (iii).C) Horizontal bar plots of the top 25 significantly enriched pathways from upregulated genes following Poly(I:C) s mula on of CBMC samples compared to matched control.Results are shown for naïve CD4 + T cells (i), CD14 + Monocytes (ii), naïve B cells (iii), naïve CD8 + T cells (iv), and NK cells (v).D) Same as (C) for 5-year PBMC sample treated with Poly(I:C) compared to match control.E) Horizontal bar plots of the top 25 significantly enriched pathways from upregulated genes following LPS (i) and Poly(I:C) (ii) s mula on of CBMC samples compared to matched control.Foe reach plot, the x-axis shows the -Log10 adjusted-P value associated with pathways enrichment, the dashed red line indicates an adjusted-p value of 0.001.Results are ordered from top by decreasing adjusted-p value for significantly enriched pathways iden fied from the Reactome, KEGG, and Gene Ontology (GO) databases.BP, Biological Process; MF, Molecular Func on; CC, Cellular Compartment.See supplementary data for complete lists of significantly enriched pathways from dysregulated genes for all cell types analysed.

Figure S6
. Pseudo me trajectory mapping of cellular ac va on trajectories.A-C) UMAP plots represen ng cellular ac va on trajectories for LPS-s mulated CD14 + monocytes (A), Poly(I:C)s mulated CD14 + monocytes (B), and LPS-s mulated CD4 + naïve T cells (C).The first plot (i) in each panel is stra fied by sample group and the second plot (ii) depicts the pseudo me.The branching black line on each plot represents the ac va on trajectory fi ed to the data (monocle3 (33)).D-F) Plots from Monocle3 pseudo me analysis of addi onal cell types for samples s mulated with LPS and uns mulated controls.Plots are shown for CD8 + naïve T cells (D), naïve B cells (E), and NK cells (F), with the first plot (i) in each panel is stra fied by sample group and the second plot (ii) depicts the pseudo me.The branching black line on each plot represents the ac va on trajectory.F) UMAP plots from Monocle3 analysis of Poly(I:C)s mulated CD4 + naïve T cells (i), CD8 + naïve T cells (ii), naïve B cells (iii), and NK cells (iv).
Figure S7.Gene Regulatory Network characteris cs for selected LPS-induced cell types.Plots show the correla on between the metrics for transcrip on factors and target genes for samples collected from cord blood (CBMC) (x-axis) and 5yr blood (PBMC) (y-axis).Plot are included for naïve B cells (A), naïve CD4 + T cells (B), naïve CD8 + T cells (C), NK cells (D), and CD14 + monocytes (E), and show the betweenness centrality (le ), degree centrality (middle), and eigenvector centrality (right).In each case, a higher value indicates a greater importance within the network and a devia on off the diagonal indicates the metric has a higher value in corresponding sample.Plots show the correla on between the metrics for transcrip on factors and target genes for samples collected from cord blood (CBMC) (x-axis) and 5yr blood (PBMC) (y-axis).Plots are included for naïve B cells (A), naïve CD4 + T cells (B), naïve CD8 + T cells (C), NK cells (D), and CD14 + monocytes (E), and show the betweenness centrality (le ), degree centrality (middle), and eigenvector centrality (right).Plot and metric characteris cs are the same as above (Figure S6).

Study subjects
The study was designed to assess matched birth (CBMC) and 5 years (PBMC) blood samples following LPS and Poly(I:C) treatment, along with matched untreated controls, from two donors (one male, one female).The samples were curated from the Childhood Asthma Study (CAS) cohort, a prospec ve birth cohort for high risk of asthma development (1)(2)(3)(4)(5).Cord blood samples were collected from healthy, full-term, singleton births.Matched 5-year samples were collected from the same donor by home visit close to their 5th birthday.This study has ethics approval by The University of Western Australia (reference RA/4/1/7560), and fully informed parental consent was obtained for sample collected from each subject.
In  (6).Each cryopreserved vial was processed on a different day, so that matched s muli/control samples were processed together.Following culture, replicate culture plate wells (4 per sample/condi on) were gently resuspended and transferred to a sterile 1.5ml LoBind tube (Eppendorf).Samples were pelleted (centrifuga on at 500 x g for 7 minutes at 4oC) and re-suspended in PBS + 0.04% BSA (UltraPure; Thermo Fisher Scien fic) (4oC) to a target concentra on of 2,000 cells/μl.Postculture viability is recorded in Table S1.Samples were immediately transferred on ice to Genomics WA (Perth, Western Australia) for library prepara on and sequencing.

Library prepara on and sequencing
Single cells were processed on Chromium using the Chromium Next GEM Single Cell 3' Kit v3.1 (4 reac ons, PN-1000269, 10X Genomics) on Chip G (PN-1000127, 10x Genomics) according to the manufacturer's protocol with targeted recovery of 5,000 cells per channel.Libraries were sequenced on the NovaSeq 6000 pla orm in a single batch.

Alignment and ini al quality control
Raw fastq.qzfiles were processed with the CellRanger Toolkit (Version 6.1.1,10x Genomics) with the Human GRCh38 genome assembly (refdata-gex-GRCh38-2020-A) was used as the reference genome.The CellRanger count pipeline was run with default parameters.
CellRanger web_summary outputs were assessed, and no alerts (warnings or errors) were recorded for any sample.Selected CellRanger outputs are recorded in Table S1; briefly, this project generated (on average) 5,527.17cells per sample with 63,098.75mean reads per cell and an average of 1980.17genes detected per cell, as es mated by CellRanger.The raw feature matrix, and corresponding barcodes and features, were used for downstream QC and analysis.
Sample pre-processing and quality control Count matrices and corresponding barcode and feature files were imported into the R sta s cal environment (version 3.6.2),and all subsequent QC/analyses were conducted in R, unless otherwise stated.In general, each sample was run through a QC pipeline which combines func ons from several R packages, of which Version 3.2.0 of Seurat (7) was used most extensively for QC, detailed below.The barcodeRanks func on from the DropU ls package to compute barcode rank sta s cs and visualize the knee and inflec on points, and the emptyDrops func on (DropU ls) was used to iden fy and retain droplets which puta vely contain cells.Droplets were considered cell-containing if their associated barcodes significantly deviate from the ambient profile from 10,000 Monte Carlo itera ons (False Discovery Rate < 0.01).
For an ini al filter, genes were retained if they were expressed in at least 1% in cells (addi onal gene filters were implemented for individual methods, below).Next, the quality to individual cells were evaluated and removed based on several metric.Mitochondrial genes were iden fied by gene symbols with a prefix of "MT-" and ribosomal genes were iden fied with a "RPS" or "RPL" prefix.Quality control metrics were calculated for each cell with the addPerCellQC func on from the scu le package.Cells were excluded if they had less than 500 unique gene detected and/or less that 1000 total transcripts.Cells were also excluded if they had a mitochondrial gene content greater than three median absolute devia ons (MADs) above the median.This adap ve threshold was chosen to capture the mitochondrial content distribu on of each sample independently, rather than applying a constant threshold (e.g., 5-10%) across all samples.Addi onally, a threshold of 50% ribosomal content was used to exclude cells with a majority of their gene expression accounted for by ribosomal genes (which excluded few cells).Next, The standard Seurat pipeline was applied to explore and asses each sample.Gene expression was log normalized with the NormalizeData func on (scale.factor= 10000), and the top 2,000 most variable features were iden fied with the FindVariableFeatures func on (selec on.method= "vst").The data was scaled and centered with the ScaleData func on, and used for an ini al principal component dimensionality reduc on was applied with the RunPCA func on (npcs = 50) using the top 2,000 variable genes.The VizDimLoadings and DimPlot func ons were used to assess the loading contribu on of genes and plot the first two PCs, respec vely.Addi onally, the DimHeatmap func on (cells = 500, balanced = TRUE) was used to visualize the first 9 PCs.To assess whether the each sample displayed discrete clusters from their gene expression profiles, the FindNeighbors (reduc on = "pca", dims = 1:15, k.param = 20, n.trees = 50, annoy.metric= "euclidean") and FindClusters (resolu on = 0.2 n.start = 10, n.iter = 10) func ons were used to find the 20 nearest neighbors and apply a nearest neighbor modularity op miza on clustering algorithm to iden fy dis nct clusters, respec vely.Next, the Uniform Manifold Approxima on and Projec on (UMAP) approach was applied to reduce the muldimensional data (previously calculated PCs) to 2-Dimensional coordinates for the purpose of visualiza on using the RunUMAP func on (reduc on = "pca", umap.method= "uwot", n.neighbors = 30, n.components = 2, metric = "cosine", learning.rate= 1, min.dist= 0.3, spread = 1, local.connecvity = 1, repulsion.strength= 1, nega ve.sample.rate= 5, dens.lambda= 2, dens.frac= 0.3, dens.var.shi= 0.1).The resul ng UMAP representa on of the data were plo ed with the DimPlot func on with individual cell colored according to which cluster they were assigned.Addi onally, the DoHeatmap func on was used to plot the 10 genes per cluster.From the above cluster analysis, each sample exhibited discrete clustering as expected from gene expression profiles of mix cell popula ons.The CellCycleScoring func on was used to es mate the cell cycle phase of each cell for each sample using the reference cell cycle marker genes loaded with the Seurat package.Doublets were detected and removed with DoubletFinder(8) package.First, the paramSweep_v3 (PCs = 1:15, sct = FALSE) and summarizeSweep (GT = FALSE) func ons to perform an ar ficial doublets (pN) and neighborhood sizes (pK) parameter sweep, summarized the outputs, and compute the bimodality coefficient for each pK values.The op mum pK values was assigned to the value with the highest corresponding bimodality coefficient.The doubletFinder_v3 func on (PCs = 1:15, pN = 0.25, pK = op mum_pK_value, reuse.pANN= FALSE, sct = FALSE) was run with an assump on of a 5% doublet rate from all cells, and cells assigned as a doublet were excluded.
To assess whether expected cell types were present in each of the CBMC/PBMC samples, gene expression of selected marker genes (CD3E, CD4, CD8A, NKG7, GNLY, CD14, LYZ, CD19, CD79A, and CD34) was plo ed with the DimPlot func on.Addi onally, unbiased reference-based annota on of each cell was applied with singleR (9).The high-level annota on were used from both the Human Primary Cell Atlas and Blueprint Encode references provided.From this assessment, each sample displayed dis nct clusters which could be a ributed to major cell types expected in the circula ng blood (CD4 + T cells, CD8 + T cells, B cells, NK cells and Monocytes).All sample were assesses as good quality and the pre-processing and quality control metrics are recorded in Table S1.Raw counts data (post CellRanger) are available via the Gene Expression Omnibus (GSE232186).

Integra on, Annota and Dimensionality reduc on
The integra on method from the open-source R toolkit Seurat (7) was used to integrate individual samples following quality control.Pre-processed samples were converted to a list object and the raw counts from each sample was log normalised with the NormalizeData func on (scale.factor= 10000, margin = 1).The top 2,000 variable features were re-calculated with the FindVariableFeatures (selec on.method= "vst") and the ScaleData func on was used to scale and center the log normalised data.The first 50 principal components were calculated for each samples with the RunPCA func on using the top 2,000 variable features.The FindIntegra onAnchors func on (anchor.features= 2000, scale = TRUE, normaliza on.method= "LogNormalize", reduc on = "rpca", l2.norm = TRUE, dims = 1:30, = 5, k.filter = 200, k.score = 30, max.features = 200, nn.method = "annoy", n.trees = 50, eps = 0) was used to iden fy 2,000 'anchors'.Anchors are cell-cell pairs with a correspondent similarity between two data sets that are expected to exhibit a common set of molecular features (i.e., cells with near iden cal gene expression profiles in different data sets).The process of iden fying anchors is described in depth in the original publica on(7); Briefly, data set dimensionality is reduced with diagonalized Canonical Correla on Analysis and L2 normaliza on (scaling so that the sum of the squares sums to 1) is applied to the output vectors.Anchors are then iden fied as mutual nearest-neighbors in the reduced dimensional representa on.These anchors were used to integrate the data with the IntegrateData func on (n ormaliza on.method= "LogNormalize", dims = 1:30, k.weight = 100, weight.reducon = NULL, sd.weight = 1, eps = 0).For this integra on, anchors are scored between reference and query data sets, and the difference in expression profiles of the anchor cells are used to apply a weighted average correc on transforma on (7).Individual cells were annotated with Azimuth(10), a web-based applica on which provides reference-based mapping to unbiasedly annotate scRNA-Seq profiles.The annota on employed vai Azimuth is detailed in the original publica on(10) (in par cular, see 'Reference-based Integra on for query datasets' in the Methods sec on).Briefly, the counts data (query data set) was transformed to the same low-dimensional space (supervised PCA) as a reference data set, and the cell type labels from the reference data set are transferred to corresponding cells of the query data set, along with a predic on/confidence score.For this study, the human PBMC reference data set was used as the reference; a CITE-seq dataset with gene expression for hundreds of thousands of cells alongside a large panel of an bodies which was used to accurately phenotype cell types present in the PBMC from cell surface markers.The level 2 cell type resolu on was used, and cells were excluded if they had a predic on/confidence score less than 0.5.Following integra on and annota on, the pheatmap func on was used to create a heatmap to display the number of cells detected for each cell types, by various stra fica ons (e.g., Age/s muli, Donor, etc).Uniform Manifold Approxima on and Projec on (UMAP) was employed for dimensionality reduc on.First, the ScaleData func on was used to scale and center the integrated data set, and the first 30 principal components were calculated with the RunPCA func on ().Although non-linear dimensionality reduc on methods such as UMAP are widely used to visualize rela onships between cells within high dimensional data in 2-Dimensional space (e.g., cell type clusters in scRNA-Seq data sets), these methods o en do so at the cost of preserva on of the local and/or global structure of the data (11).We wanted to tested whether the plots generated from our UMAP analysis were robust to different values of two of the most influen al parameters for UMAP embedding; the number of nearest neighbors (n.neighbors) and the minimum distance (min.dist) between cells.To do this, the RunUMAP func on was used to generate 2-dimensional representa ons of the data with combina ons of nearest neighbour values of 5, 10, 15, 20, 30, 40, 50 and minimum distance values of 0.1, 0.2, 0.3, 0.4, and 0.5, and other parameter set as follows; map.method = "uwot", n.components = 2, metric = "cosine", n.epochs = NULL, learning.rate= 1, spread = 1, set.op.mix.rao = 1, vity = 1, repulsion.strength= 1, nega ve.sample.rate= 5.The nearest neighbour and minimum distance values were chosen as they span the suggested range for these parameters.The DimPlot func on was used to plot the first two UMAP components of each UMAP construc on and the plot with nearest neighbour value of 20 and a minimum distance values of 0.3 was selected as a representa ve for visualisa on purposes to display the integrated cell type clustering overlayered with Azimuth-defined annota ons, as well as relevant experimental group variables and marker gene expression intensi es.

Differen al gene expression and Pathways analysis
To iden fy differen ally expressed genes (DEGs) between LPS/Poly(I:C) and corresponding uns mulated control samples for each cell type, we employed Model-based Analysis of Singlecell Transcriptomics (MAST) (12) via the FindMarkers func on from Seurat.MAST employs a two-part (hurdle) generalized linear model tailored to address the bimodal, zero-inflated gene expression distribu ons encountered in scRNA-seq data.The cellular detec on rate (CDR, the frac on of genes expressed in each cell), mitochondrial gene propor on, and cell cycle phase were included in as covariates for MAST analysis.As each donor represents a different biological sex (male, female), this variable was also included as a latent variable.This approach was selected to accommodate our small sample size (two biological donors), although we acknowledge that incorpora ng individual par cipant varia on as a latent variable is a subop approach compared to other methods suited to larger sample sizes (viz.pseudobulk and mixed models with a random effect for individual) (13,14), and this is a limita on of our study.However, our analysis iden fied many co-regulated genes which would be expected to be dysregulated following treatment with LPS and Poly(I:C) (corroborated by pathways enrichment analysis and independently iden fied with GRN analysis) and the primary genes of interest recorded extremely small Bonferroni-corrected p values (commonly < 10x10 -20 ).For these reasons, we believe the loss in precision of the MAST + latent variable method had limited impact on the findings in our study.Genes were considered differen ally expressed if they recorded a Bonferroni-corrected p value < 0.01 and an average Log2 fold-change in expression of > 0.25 (upregulated) or < -0.25 (downregulated).Aligned volcano plots displaying DEGs iden fied from mul ple cell type simultaneously were plo ed in base R and DEG count heatmaps were plo ed with the pheatmap func on in R. The overlap of differen ally expressed genes were plo ed with the upset func ons from the UpSetR package.Func onal biological pathways associated with significant DEGs between cell type/s muli groups were iden fied with the enrichment analysis tool enrichR (15)(16)(17) in R. EnrichR calculates enrichment of an input gene set within the annotate gene sets of func onal pathways (curated from >200 databases) with Fisher exact test (and varia ons thereof (15)), and provides many addi onal analysis and plo ng tools.Significant DEG sets (up-and downregulated genes separately) were used to query biologically-relevant annotated gene sets for significant enrichment from the Reactome(18), KEGG (19), and Gene Ontology(20) databases.These databases were selected as they are well established and offer a wide range of biological func ons, including those related to the innate immune response.Significantly enriched pathways from all databases were ordered by decreasing -Log10 p values and plo ed as a bar plot with base R.

Pseudo me trajectory inference
We applied Monocle3(21) to infer s muli-related ac va on trajectories from transi onal cellular states present in the data.For each analysis, only the raw counts from cells relevant for that comparison (e.g., CBMC/5yr PBMC untreated and LPS-treated monocytes) were included.The count data was pre-processed with the preprocess_cds func on (method = "PCA", num_dim = 50, norm_method = "log") and the cells were aligned with the align_cds func on (preprocess_method = "PCA", alignment_k = 20) with the donor variable used as the alignment group and the cellular detec on rate and mitochondrial content included as terms in the model formula.Monocle3 require the mul dimensional gene expression data to be projected to a lower dimensional representa on to learn and fit puta ve trajectories of an underlying biological process.For this purpose, UMAP dimensionality reduc ons were generated with the reduce_dimension func on with 10 nearest neighbours and a minimum distance of 0.1, and cells were clustered with a k value (nearest neighbours) of 10 and a resolu on value of 0.001 for all comparisons, with the excep on of cluster resolu on values and for LPS-and Poly(I:C)-induced naïve CD4+ T cells comparisons, respec vely, to accommodate larger cell numbers.Other parameters for this func on were max_components = 2, reduc on_method = "UMAP", umap.metric= "cosine", umap.fast_sgd= FALSE, umap.nn_method= "annoy".Unsupervised clustering was performed with the cluster_cells func on (reduc on_method = "UMAP", k = 10, cluster_method = "leiden", num_iter = 2, par on_qval = 0.01, weight = FALSE, resolu on = 0.001), which uses the Leiden (or Louvain, if selected) community detec on method.A principle graph was fi ed to data with the learn_graph func on (use_par on = TRUE, close_loop = FALSE) and cells were assigned a pseudo me value based on their projec on on the principal graph with the order_cells func on (reduc on_method = "UMAP", root_pr_nodes = NULL, root_cells = NULL).Regions enriched with uns mulated controls were selected as pseudo me start points so that trajectories extended into s muli-ac vated regions.The reduced dimension representa on displaying experimental groups and the assigned pseudo me values were plo ed with the plot_cells func on.
Gene Regulatory Network (GRN) analysis and in silico perturba ons We employed CellOracle (22) to build GRNs in order to iden fy the key molecular drivers (transcrip on factors (TF)) and their corresponding target genes for selected cell type/s muli groupings.For this analysis, SCANPY(23) (version 1.9.3) was used for pre-processing and force directed graph construc on (Par on-based graph abstrac on (PAGA) (24)) and CellOracle 0.12.0) was run with Python (3.10.6) on Ubuntu 22.04.1 via Windows Subsystem for Linux 2 kernel.Genes with at least 1 count were retained and then genes were filters to iden fy the top 3000 most variable for each comparison.The data was normalized (scanpy.pp.normalize_per_cell), log transformed (scanpy.pp.log1p) and scaled (scanpy.pp.scale) with default parameters.Addi onally, the donor variable was adjusted for in the data with scanpy.pp.regress_out.Separate analyses were run from raw counts for each cell type/s muli comparison and group specific GRNs (e.g., Cord_Control, Cord_LPS, 5yr_Control, and 5yr_LPS) were constructed from the Human promoter base GRN provided.CellOracle was run with standard parameters and the monocle3-defined pseudo me values for each cell were included for analysis.From the output of CellOracle, we plo ed Venn diagrams (ggvenn R package) to show the overlap of significantly connected (p value < 0.01) target genes (TG) of IRF1, IRF7, STAT1, and STAT2 for each comparison, and displayed TF-TG wiring diagrams of the top 100 TG connec ons with the igraph R package.CellOracle constructed GRNs were then used to perform in silico transcrip on factor perturba ons of IRF1, IRF7, and STAT1 to simulate the changes in cellular states a er nullifying the regulatory

Figure S8 .
Figure S8.Gene Regulatory Network characteris cs for selected Poly(I:C)-induced cell types.Plots show the correla on between the metrics for transcrip on factors and target genes for samples collected from cord blood (CBMC) (x-axis) and 5yr blood (PBMC) (y-axis).Plots are included for naïve B cells (A), naïve CD4 + T cells (B), naïve CD8 + T cells (C), NK cells (D), and CD14 + monocytes (E), and show the betweenness centrality (le ), degree centrality (middle), and eigenvector centrality (right).Plot and metric characteris cs are the same as above (FigureS6).

Figure S9 .
Figure S9.In silico perturba on to nullify transcrip on factor ac vity (CellOracle).A) Forcedirected graph of naïve CD4 + T cells stra fied by s muli/age group.This plot recapitulates the characteris cs observed from the UMAP (Figure3B(i)).B) i) Monocle-defined pseudo me showing differen a on into CBMC-related and 5yr PBMC-related ac va on states and ii) differen a on/ac va on vectors projected onto the force-directed graph (PAGA).C) Knockout (KO) simula on vector field with perturba on scores for in silico KO of IRF1 (le ), STAT1 (center), and IRF7 (right).KO of STAT1 and IRF1 results in remarkably similar effect on cellular iden ty.Colors correspond to the perturba on score and denote whether the KO would puta vely block (red) or promote (green) ac va on on that region of the vector field.Arrows indicate the direc onal change in ac va on following simulated KO and are calculated as the inner product of the ac va on trajectory (pseudo me) and the simulated perturba on score.The size of the arrow represents the magnitude of the inner product.