Polygenic risk associated with Alzheimer’s disease and other traits influences genes involved in T cell signaling and activation

Introduction: T cells, known for their ability to respond to an enormous variety of pathogens and other insults, are increasingly recognized as important mediators of pathology in neurodegeneration and other diseases. T cell gene expression phenotypes can be regulated by disease-associated genetic variants. Many complex diseases are better represented by polygenic risk than by individual variants.
Methods: We first compute a polygenic risk score (PRS) for Alzheimer’s disease (AD) using genomic sequencing data from a cohort of AD patients and age-matched controls, and validate the AD PRS against clinical metrics in our cohort. We then calculate the PRS for several autoimmune disease, neurological disorder, and immune function traits, and correlate these PRSs with T cell gene expression data from our cohort. We compare PRS-associated genes across traits and four T cell subtypes.
Results: Several genes and biological pathways associated with the PRS for these traits relate to key T cell functions. The PRS-associated gene signature generally correlates positively for traits within a particular category (autoimmune disease, neurological disease, immune function), with the exception of stroke. The trait-associated gene expression signature for autoimmune disease traits was polarized towards CD4+ T cell subtypes.
Discussion: Our findings show that polygenic risk for complex disease and immune function traits can have varying effects on T cell gene expression trends. Several PRS-associated genes are potential candidates for therapeutic modulation in T cells, and could be tested in in vitro applications using cells from patients bearing high or low polygenic risk for AD or other conditions.


Introduction
Alzheimer's disease (AD) is a chronic neurodegenerative condition that afflicts millions of Americans. While AD pathology has long been known to include aberrant aggregation of amyloid beta peptide and tau protein, it is also linked to inflammation and other immune processes. T cells comprise part of the adaptive immune response to pathogens and other biological insults. Recently, T cells in AD patients have shown a high degree of clonal expansion, a less diverse T cell receptor repertoire, increased infiltration into the cerebrospinal fluid (CSF) and brain parenchyma, and upregulation of genes involved in cytotoxicity, inflammation, immunosenescence, and response to certain chemokines (1-3). Researchers have previously detailed how T cell gene expression changes are correlated with individual genetic variants, some of which are associated with AD and other diseases (4-11). Several of these studies also profile T cells at various stages of differentiation or activation using flow-assisted cell sorting or single-cell RNA-sequencing, and show that many genotype-dependent gene expression changes are specific to particular T cell subtypes or activation states. However, aggregating the effects of many genetic variants may better capture genotype-phenotype correlation in complex disease, especially if these variants have been previously linked to disease risk. Thus, we now focus on correlating polygenic risk scores (PRSs) for AD and other conditions with T cell gene expression data, building on research using PRSs to better understand AD and related phenotypes.
While novel approaches to PRS studies are accelerating, few studies have correlated PRSs with gene expression at this stage (12-14), and no studies, to our knowledge, have correlated a PRS for any disease with T cell gene expression. Because our patient cohort consists of AD patients and age-matched healthy controls from the Religious Orders Study and Memory and Aging Project (15), we first calculate a PRS for AD, validating it against diagnostic data and neuropathological measurements. We then correlate gene expression data from four T cell subtypes with the PRS for AD, and with PRSs for 13 other immune cell, autoimmune disease, and neurological disease traits. We differentiated these T cell subtypes between CD4+ vs. CD8+ and naïve vs. memory populations, to better capture the cell type-specific or state-specific effects of genetic variants on gene expression that have been shown in other studies (6, 8-11).
We aim to understand the differential gene expression in T cells at different polygenic risk levels for AD and other disorders. We hypothesize that disease-relevant genes and pathways will be differentially expressed with respect to polygenic risk for disease. We expect that our dataset, involving four T cell subtypes, will highlight differences in PRS-associated genes across T cell subtypes and disease traits. Our findings highlight biological pathways and other mechanisms of polygenic risk for disease, which can aid in hypothesis generation for future targeted studies of T cell behavior in AD and other conditions. They also provide interesting comparisons to previous genotype-phenotype correlation studies using T cell RNA-sequencing data (4-11).
Materials and methods

Study participants
Study participants come from the Religious Orders Study (ROS) and Memory and Aging Project (MAP), described in detail elsewhere (15). Briefly, ROS enrolls Catholic priests, nuns, and brothers, without known dementia, aged 53 or older from more than 40 groups in 15 states across the USA. MAP enrolls men and women without known dementia aged 55 or older from northeastern Illinois. Peripheral blood mononuclear cells (PBMC) from 96 ROSMAP participants were used in this study. 48 participants were clinically and/or pathologically diagnosed with AD, while 48 participants without dementia served as controls. Brain tissue from each participant was analyzed post-mortem to detect pathological signs of neuritic plaques and neurofibrillary tangles.

Sample preparation, RNA-sequencing, and genotyping
PBMCs were isolated by Ficoll gradient centrifugation, then sorted by high-speed flow cytometry into the following T cell subtypes: CD4+CD45RO-, CD4+CD45RO+, CD8+CD45RO-, and CD8+CD45RO+. Because CD4+ T cell subtypes such as Th1, Th2, Th17, and regulatory T cells are best distinguished by intracellular markers that cannot be detected prior to fixation and permeabilization, and to ensure sufficient post-sorting cell counts for RNA extraction, we chose to forgo isolation of CD4+ T cell subtypes beyond the presence or absence of CD45RO. Total RNA was extracted using buffer TCL (Qiagen), then RNA-seq libraries were prepared according to the Single Cell RNA Barcoding and Sequencing method originally developed for single-cell RNA-seq (16), adapted for extracted total RNA. RNA libraries were collected on a single 384-well plate and sequenced on the Illumina HiSeq using the High-throughput 3' Digital Gene Expression (DGE) library (16). Genes with a maximum count value of at least three and non-zero values in over twenty percent of samples were included in differential expression analysis. Expression values were normalized to counts per million (CPM). The edgeR package in R (17) was used to conduct differential expression analysis, and voom transformation (18) was applied to gene expression data. DNA for genotyping was extracted from whole blood or frozen post-mortem brain tissue and genotyped using the Affymetrix GeneChip 6.0 platform. Quality control of genotyping data was done with PLINK (19) (http://pngu.mgh.harvard.edu/~purcell/plink/), and imputation was done with MACH software (version 1.0.16a).

Polygenic risk score calculation
Summary statistics files from genome-wide association studies were used as the base data. Duplicate SNPs were removed from base data files, as were ambiguous SNPs for which the effect allele and other allele were complementary nucleotides (C with G or A with T), using PLINK (19). For summary statistics whose coordinates were given on genome build 38, LiftOver (http://genome.ucsc.edu) was used to convert these coordinates to genome build 37, to match the target data. For traits whose summary statistics lacked odds ratio values, these were calculated from the beta values using the exp() function in R. For traits with missing beta values, beta values were estimated using the sample size, Z-score, and allele frequencies. The estimated beta values were then converted to odds ratio values using exp().
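The base-data cleaning steps can be sketched as follows. This is a minimal illustration in Python, not the PLINK and R commands used in the study. The function names are hypothetical, and the formula for estimating beta from a Z-score, sample size, and allele frequency is one commonly used approximation, assumed here because the text does not name the estimator.

```python
import math

# Complementary base pairs; a SNP whose two alleles are complements
# (A/T or C/G) cannot be assigned to a strand unambiguously.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(effect_allele, other_allele):
    """Return True for strand-ambiguous SNPs (A with T, or C with G)."""
    return COMPLEMENT.get(effect_allele.upper()) == other_allele.upper()


def beta_from_z(z, n, freq):
    """Estimate beta from Z-score, sample size, and effect-allele frequency.

    Assumed approximation (not stated in the text):
    beta = z / sqrt(2 * f * (1 - f) * (n + z^2)).
    """
    return z / math.sqrt(2.0 * freq * (1.0 - freq) * (n + z * z))


def odds_ratio(beta):
    """Convert a log-odds effect size to an odds ratio, as with exp() in R."""
    return math.exp(beta)
```

Under these assumptions, a SNP with alleles A and G is kept, one with alleles C and G is dropped, and a beta of 0 maps to an odds ratio of 1.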
Genomic data from ROSMAP participants were used as the target data. These data were stored as three separate batches (ROSMAP_n1686, ROSMAP_n381, and ROSMAP_BU) due to separate genotyping batches, and initially processed individually by cohort. Quality control of target data was done with R and PLINK (19). First, SNPs were filtered to exclude those with a minor allele frequency (MAF) less than 0.01, those with a Hardy-Weinberg equilibrium test p-value under 1 × 10⁻⁶, and those missing in at least 1% of participants. Individuals missing over 1% of SNPs in their genotyping data were also excluded at this stage. We then pruned highly correlated SNPs using a window size of 200 variants and a step size of 50 variants, filtering out any SNPs with an LD r² value above 0.25. Participants with heterozygosity F coefficients greater than three standard deviations from the mean, with differences between reported sex and sex chromosomes, or with a first or second degree relative in the sample, were excluded.
After these quality control steps on individual batches, we used PLINK to merge ROSMAP_n1686, ROSMAP_n381, and 15 participants from ROSMAP_BU. We limited the inclusion of participants from ROSMAP_BU to participants with T cell gene expression data whose AD PRS was not an outlier in the overall distribution. This was because of the high number of unique SNPs in the ROSMAP_BU cohort relative to ROSMAP_n1686 and ROSMAP_n381, which we suspect is due to low-quality imputation of rare variants. In the merged dataset, we excluded SNPs with MAF < 0.01 and SNPs missing in at least 5% of participants. The merged dataset was then used as target data in further PRS calculation.
We used PRSice-2 (20) for PRS calculation. PRSice-2 uses the standard clumping and thresholding (C+T) approach: it first clumps input variants using a 250 kb window, a p-value threshold of 1, and an r² threshold of 0.1. The user also inputs one or more p-value thresholds, such that the software will only include SNPs with a p-value below the threshold in PRS calculation. The software also automatically performs strand-flipping for SNPs whose alleles mismatch between base and target data. The software then sums the effects of individual SNPs, weighted by odds ratio, to calculate the PRS. We used age and sex as covariates. For the AD PRS, clinical AD diagnosis was used as the input phenotype, and SNPs within 1 Mb of the APOE locus were excluded as input in PRS calculation. Including genotyping batch as an additional covariate did not change the calculated PRSs. We input a range of p-value thresholds for SNP inclusion from 5 × 10⁻⁸ to 1, yielding a set of PRS scores as output for each individual.
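The per-batch thresholds can be written as simple predicates. This is a minimal sketch with hypothetical function and argument names; the study applied these filters with PLINK and R, and LD pruning, sex checks, and relatedness checks are not reproduced here.

```python
import statistics


def passes_snp_qc(maf, hwe_p, missing_rate,
                  min_maf=0.01, min_hwe_p=1e-6, max_missing=0.01):
    """Keep a SNP only if it clears the three per-batch thresholds.

    Excluded: MAF below 0.01, HWE p-value under 1e-6, or missing in at
    least 1% of participants.
    """
    return (maf >= min_maf
            and hwe_p >= min_hwe_p
            and missing_rate < max_missing)


def heterozygosity_outliers(f_by_sample, n_sd=3.0):
    """Return sample IDs whose F coefficient lies more than n_sd SDs from the mean."""
    values = list(f_by_sample.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; the choice of estimator is an assumption
    return {sample for sample, f in f_by_sample.items()
            if abs(f - mean) > n_sd * sd}
```

For example, `passes_snp_qc(0.05, 0.5, 0.0)` keeps the SNP, while a missing rate of exactly 0.01 removes it.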
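The thresholding half of the C+T approach reduces to a weighted sum over SNPs that pass the p-value cutoff. The sketch below assumes that clumping has already been done and that weights are log odds ratios. It is an illustration only and does not reproduce PRSice-2's scoring options, strand-flipping, or covariate handling. The SNP identifiers in the usage note are placeholders.

```python
import math


def polygenic_score(dosages, snps, p_threshold):
    """Sum effect-allele dosage times log(OR) over SNPs passing the threshold.

    dosages: dict mapping SNP id to effect-allele count (0, 1, or 2)
             for one individual.
    snps:    iterable of (snp_id, odds_ratio, p_value) from clumped base data.
    A threshold of 1 therefore includes every clumped SNP.
    """
    score = 0.0
    for snp_id, odds_ratio, p_value in snps:
        if p_value <= p_threshold and snp_id in dosages:
            score += dosages[snp_id] * math.log(odds_ratio)
    return score


def scores_across_thresholds(dosages, snps, thresholds):
    """Return one score per p-value threshold, mirroring a threshold sweep."""
    snps = list(snps)
    return {t: polygenic_score(dosages, snps, t) for t in thresholds}
```

Calling `scores_across_thresholds(dosages, snps, [5e-8, 0.05, 1.0])` with placeholder SNPs yields one score per threshold for that individual, which is the shape of output described above.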

Validation of AD PRS against clinical and pathological data
The pROC package (21) was used to calculate receiver operating characteristic (ROC) curves for the AD PRS against clinical AD diagnosis and against pathological AD diagnosis, at each SNP p-value threshold. We also used this package to calculate the predictive value of the AD PRS from the area under the ROC curves, to determine which p-value threshold yielded the highest predictive value. We compared the distribution of AD PRS scores at this p-value threshold between AD and non-AD participants using Student's t-test. We also ran comparisons of the AD PRS at this p-value threshold against Braak score using nonparametric one-way ANOVA, and against quantitative pathological measurements (amyloid plaque burden, tau tangle burden, and global pathology measurement) using linear regression with age, sex, clinical AD status, pathological AD status, and the first ten principal components from genotyping data as covariates.
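The area under the ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control, which gives a compact way to illustrate the threshold selection. The sketch below uses that rank-based definition in plain Python and is not the pROC implementation; function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(case score > control score), counting ties as one half.

    labels: 1 for AD cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def best_threshold(prs_by_threshold, labels):
    """Return the p-value threshold whose PRS gives the highest AUC."""
    return max(prs_by_threshold,
               key=lambda t: roc_auc(prs_by_threshold[t], labels))
```

Here `prs_by_threshold` maps each p-value threshold to the list of PRS values for the same ordered set of participants.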

Detection and pathway analysis of PRS-associated genes
Genotyping data and T cell RNA-sequencing data were available for 78 participants, with one participant excluded whose AD PRS was an outlier. For each trait and each T cell subtype, we ran linear regression of gene expression counts against PRS scores to detect PRS-associated genes using the lm() function in R. For covariates, we used age, sex, AD diagnosis (clinical and pathological), and the first 10 principal components from genotyping, with no interactions between covariates. Genes expressed in fewer than 20% of participants, or with a maximum expression count under 3, were excluded from PRS association analyses. Because non-AD traits did not have relevant phenotypic data for PRS validation, we used the same p-value threshold (p = 1) as the optimum PRS for all traits. For pathway analysis of PRS-associated genes, we used Gene Set Enrichment Analysis (GSEA) (22) separately for each T cell subtype and each trait. GSEA was run using the t-value from the PRS-gene association as the ranking metric, using the default value of 1,000 permutations and restricting the gene sets to those with 15-200 genes, using the GO Biological Processes 2022 database (23). We then detected biological pathways which were significantly over-represented or under-represented at a significance level of 0.05, after correcting for false discovery rate. Figures were generated using the ComplexHeatmap (24), cowplot, and ggplot2 (25) packages in R.
Genes associated with the PRS for AD were compared to a published single-cell RNA-sequencing dataset of peripheral blood from AD patients and healthy controls (26). We downloaded this dataset from the Gene Expression Omnibus (accession number GSE181279) and processed the data using the Seurat package in R (27). We first conducted a standard quality control and data normalization and transformation workflow for each individual in the dataset, then integrated the Seurat objects using the top 5000 highly variable genes as integration features. In the integrated object, we removed cells with over 15% mitochondrial genes, cells containing under 200 or over 3000 genes, and cells containing over 10,000 unique molecular identifiers, leaving 36,209 cells. We then reran quality control, normalization, transformation, and dimensionality reduction with principal component analysis, and clustered the cells with a resolution of 0.8. We identified clusters with high expression of CD3E and re-clustered them as T cells, rerunning the same quality control steps as in the integrated object, leaving 26,515 T cells. We then examined expression of canonical T cell subtype markers in each cluster to define clusters of CD4+ naïve, CD4+ memory, CD8+ naïve, and CD8+ memory T cells, corresponding to the four T cell subtypes in our PRS dataset. We used the FindMarkers function to detect differentially expressed genes between the AD subjects and healthy controls for the four T cell subtypes.
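The per-gene association test can be illustrated with ordinary least squares. The sketch below is a plain-Python stand-in for a call such as lm(expression ~ PRS + covariates) in R, returning the t-value of the PRS coefficient that would then serve as the GSEA ranking metric. All function names are hypothetical, and the expression filter follows the thresholds stated above.

```python
import math


def passes_expression_filter(counts, min_fraction=0.2, min_max_count=3):
    """Keep genes expressed in >= 20% of participants with a max count >= 3."""
    expressed = sum(1 for c in counts if c > 0) / len(counts)
    return expressed >= min_fraction and max(counts) >= min_max_count


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def prs_t_value(expression, prs, covariates):
    """t statistic of the PRS term in expression ~ intercept + PRS + covariates.

    covariates: one list per participant (e.g. age, sex, diagnosis, PCs).
    """
    n = len(expression)
    rows = [[1.0, prs[i]] + list(covariates[i]) for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, expression)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [y - sum(bj * xj for bj, xj in zip(beta, r))
             for r, y in zip(rows, expression)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    # Var(beta_PRS) = sigma^2 * [(X'X)^-1] at the PRS diagonal entry.
    unit = [0.0] * k
    unit[1] = 1.0
    inv_col = solve(xtx, unit)
    return beta[1] / math.sqrt(sigma2 * inv_col[1])
```

A positive t-value indicates that expression rises with the PRS, and a negative value indicates an inverse relationship.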
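The cell-level quality control thresholds amount to a single predicate per cell. The sketch below is a hypothetical Python restatement of those cutoffs, not the Seurat subsetting call itself, and the argument names are illustrative.

```python
def keep_cell(n_genes, n_umis, pct_mito):
    """Apply the stated cutoffs to one cell.

    Removed: over 15% mitochondrial genes, under 200 or over 3000 detected
    genes, or over 10,000 unique molecular identifiers.
    """
    return (200 <= n_genes <= 3000
            and n_umis <= 10_000
            and pct_mito <= 15.0)


def filter_cells(cells):
    """Keep (n_genes, n_umis, pct_mito) tuples that pass every cutoff."""
    return [c for c in cells if keep_cell(*c)]
```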

Results

Calculation and validation of a polygenic risk score for AD
We first used PRSice-2 (20) to calculate a genome-wide polygenic risk score for AD, for 2051 individuals in ROSMAP. We used the summary statistics from the Kunkle et al., 2019 (28) genome-wide association study (GWAS) for SNP effect sizes, excluding SNPs in the APOE locus (see Methods). PRSs were calculated using SNPs associated with AD at p-value thresholds ranging from 5 × 10⁻⁸ to 1 (all SNPs), and PRS distributions at each threshold were standardized to have a mean of 0 and standard deviation of 1. For individuals with clinical or pathological AD diagnostic data, we computed the predictive value of the AD PRS for each p-value threshold (see Supplementary Figure 1). Using SNPs at a p-value threshold of 0.75 or 1 resulted in the highest predictive value of the PRS. We also found that the AD PRS correlated significantly with other measures of AD neuropathology, including Braak staging of tau pathology, amyloid burden, tau tangle burden, and global pathology.
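Standardizing a PRS distribution to mean 0 and standard deviation 1 is a z-score transformation. The sketch below uses the sample standard deviation, which is an assumption since the text does not say which estimator was used.

```python
import statistics


def standardize(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # assumed: sample SD (n - 1 denominator)
    return [(s - mean) / sd for s in scores]
```

Applied separately at each p-value threshold, this puts the PRS distributions on a common scale.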

Correlation of gene expression with PRSs for AD and other traits
Prior to correlating PRS with gene expression, we hypothesized that individuals in our cohort could possess genetic risk variants for other neurodegenerative disorders as well. Because other neurological disorders, such as epilepsy and stroke, involve acute damage to the blood-brain barrier with potential infiltration of peripheral immune cells into the CNS (29,30), polygenic risk for these conditions could be linked to T cell-mediated neuroinflammation. We also hypothesized that T cell gene expression trends could be affected by polygenic risk for traits related to immune function or autoimmune disease, even without clinical presentation of these conditions in our cohort. Thus, we computed PRSs for common traits or diseases in these categories, including lymphocyte counts (31), white blood cell counts (31), C-reactive protein levels (31), ulcerative colitis (32), Crohn's disease (32), multiple sclerosis (33), rheumatoid arthritis (34), systemic lupus erythematosus (35), type 1 diabetes (36), Parkinson's disease (37), amyotrophic lateral sclerosis (38), epilepsy (39), and stroke (40). The GWAS studies used as base data in PRS calculation did not distinguish between phenotypic subtypes in disease cases, with the exception of rheumatoid arthritis, which additionally analyzed variants in seropositive patients after the case-control comparison.
78 participants with PRSs calculated from genotyping array data had T cell RNA-sequencing data from blood samples. One of these participants was excluded because their AD PRS was an outlier in the PRS distribution at p = 1, leaving 77 participants for correlation between PRSs and gene expression. Demographics for these participants, including AD diagnostic information, are given in Table 1. Each participant had bulk RNA-sequencing data from the following T cell subtypes, sorted using flow cytometry: CD4+CD45RO-, CD4+CD45RO+, CD8+CD45RO-, and CD8+CD45RO+. The CD45RO marker was used to separate putatively naïve from memory T cell populations to distinguish PRS-associated genes by T cell differentiation state, although a subset of memory T cells are now known to lose CD45RO expression and re-express CD45RA (41). In these participants, we computed the PRSs using all independent SNPs in the genome (p-value threshold of 1), since we did not have phenotypic data for PRS optimization, and the optimal PRS for AD was derived from the higher p-value thresholds. We then correlated the PRSs with T cell gene expression (see Methods for quality control measures used at this stage). 46 of our sequencing participants also have brain RNA-sequencing data from a recent publication (13), although we focus only on T cell gene expression here. A listing of all PRS-associated genes by trait and T cell subtype is found in Supplementary Table 1.
Genes with a nominally significant relationship with the PRS are shown on the heatmap in Figure 1. Overall, 5961 genes (of 6139 genes passing minimum expression thresholds) were associated with the PRS for at least one trait in at least one cell type. We compared genes associated with the PRS for AD to a published single-cell RNA-sequencing dataset of peripheral blood mononuclear cells in AD patients and healthy controls (26). After identifying cell clusters from the single-cell data analogous to the four T cell subtypes in our dataset (see Supplementary Figure 2A), we found genes in these clusters with differential expression in AD patients versus controls. Of the genes found in both datasets that were significantly associated with the AD PRS in our cohort, 4-33% were upregulated with high AD PRS in our dataset and with AD status in the single-cell dataset, depending on the T cell subtype. These genes include MAPK1 in CD8+CD45RO+, which represses interferon signaling and is a key mediator of signaling pathways that promote cell proliferation and differentiation (42), C1QBP in CD4+CD45RO+, which promotes T cell survival and proliferation (43), and several genes involved in downstream signaling of the T cell receptor, such as DBNL in CD4+CD45RO-, NFATC2 in CD4+CD45RO+, and ITSN2 in CD8+CD45RO+ (44-46; see Supplementary Figures 2B-F). For a full listing of genes differentially expressed in AD patients for each of the four T cell subtypes in the comparison dataset, see Supplementary Table 2.
About 50% of PRS-associated genes were significantly associated with the PRS in four or more trait/cell type combinations (see Supplementary Figure 3 for breakdown of trait overlap across and within cell types). Interestingly, several autoimmune disease traits feature almost all significant PRS-associated genes in an inverse relationship with the PRS (as seen by cells colored blue), regardless of the cell type, suggesting a pattern of downregulated genes in individuals with high polygenic risk for these conditions. 364 genes were nominally associated with the PRS for over ten traits across T cell subtypes. Among them were genes that play a role in T cell memory and activation, T cell receptor signaling, and cytokine response. These include TRAT1, SLA, IL10RA, MAF, and CXCR4. CXCR4 has potential disease relevance as a chemokine receptor that could mediate infiltration of T cells into the central nervous system (CNS) (47,48). Interestingly, SOD1, a gene with risk variants for ALS (49), was associated with the PRS for several non-ALS traits in our cohort.
We further compared PRS-associated genes across traits by calculating the Pearson's correlation coefficient of the effect size of PRS association for any two traits within each T cell subtype. As expected, genes associated with the PRSs for lymphocyte and white blood cell counts were highly correlated in all four T cell subtypes (see Figure 2, quantification in Supplementary Table 3). Within a particular trait category (immune function, autoimmune diseases, and neurological disorders), significant correlations between trait-associated gene expression patterns were generally positive. Trends in correlation coefficients also remained the same across T cell subtypes, with few exceptions. However, the gene expression signature associated with the PRS for stroke was negatively correlated with other neurological conditions such as Parkinson's disease, amyotrophic lateral sclerosis, and epilepsy.
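The trait-by-trait comparison can be sketched as a Pearson correlation of per-gene effect sizes, with a Bonferroni correction over all trait pairs. With 14 traits (AD plus the 13 others) there are 91 pairs, matching the n = 91 given in the Figure 2 legend. The base significance level of 0.05 is an assumption, and the names below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of effect sizes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


N_TRAITS = 14                              # AD plus 13 other traits
N_PAIRS = math.comb(N_TRAITS, 2)           # number of trait pairs tested
BONFERRONI_ALPHA = 0.05 / N_PAIRS          # assumed base alpha of 0.05
```

Each pairwise test would then be called significant only if its p-value falls below `BONFERRONI_ALPHA`.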

Pathway analysis of PRS-associated genes
To interrogate the functional connections of PRS-associated genes, we used Gene Set Enrichment Analysis (GSEA) (22) to detect biological pathways in the GO Biological Processes database (23) over-represented or under-represented by genes from our dataset. Input genes for GSEA were ranked by the t value for association with the PRS. Pathways with a false discovery rate q-value under 0.05 are shown in the dot plots in Figure 3, organized by cell type and colored by trait (significant GSEA pathways can also be viewed in Supplementary Table 4). Interestingly, some of these pathways relate to functions without a strongly established biological connection to disease pathology. For example, the "presynaptic endocytosis" and "synaptic vesicle recycling" pathways, which we would expect to be potentially dysregulated in some neurological disorders, are significant among genes positively associated with the PRS for systemic lupus erythematosus.
Closer interrogation of several pathways revealed genes involved in T cell and other immune cell functions. For "substrate adhesion-dependent cell spreading", associated with the PRS for AD in CD4+CD45RO+, genes included C1QBP and ITGA4. The "hematopoietic stem cell differentiation" pathway, associated with the PRS for ALS in CD4+CD45RO- T cells, includes proinflammatory interleukin genes IL1A, IL1B, and IL6, and the CXCR4 chemokine receptor that may spur T cell migration into CNS tissue, especially during neurodegenerative disease (3). SELL, a gene that allows naïve T cells to exit the bloodstream into peripheral lymph nodes (50), is part of the "response to interferon alpha" pathway associated with the PRS for C-reactive protein levels in CD4+CD45RO- (presumably naïve) T cells in our data. These findings better elucidate T cell functional changes for disorders whose relation to T cell biology is less understood, such as AD or epilepsy. Pathway analysis also sheds light on potential T cell mechanisms for traits whose connection to adaptive immunity is well known, such as lymphocyte counts or C-reactive protein levels. Finally, these data imply that the extent and nature of T cell activity in disease pathology may depend in part on individual polygenic risk for these conditions.

Figure 1. Heatmap of genes nominally associated with the PRS for one or more traits. Annotation rows above the heatmap show cell type, trait type, and trait, matching the color legends at the right. Each row of the heatmap is a gene, and each cell is colored by the strength and direction of the association with the PRS (shown by the color legend at bottom right), or gray if the association is not significant.

Figure 2. Heatmap comparison of PRS-associated genes across traits. The heatmap summarizes results of correlation tests between PRS-associated genes for any two traits (boxes above the yellow diagonal) or the numbers of genes significantly associated with the PRS (p < 0.05) shared between any two traits (boxes below the yellow diagonal). Traits are listed above and to the left of the heatmap, colored according to trait categories as in Figure 1. Each box in the heatmap reflects four values, for CD4+CD45RO- (top left), CD4+CD45RO+ (top right), CD8+CD45RO- (bottom left) and CD8+CD45RO+ (bottom right). Boxes for Pearson's correlation tests are darker red for r values approaching 1, darker blue for r values approaching -1, and white if not significant after Bonferroni multiple testing correction with n = 91. Boxes comparing overlap of significant PRS-associated genes are darker green for higher numbers of shared genes between two traits. For quantification, see Supplementary Table 3.

Figure 3. GSEA of PRS-associated genes by T cell subtype. Dot plots show pathways significantly over- or under-represented after multiple testing correction for (A) CD4+CD45RO-, (B) CD4+CD45RO+, and (C) CD8+ subtypes. Trait is shown by color (see legend at right), q-value after multiple testing correction is shown by position on the x-axis, and shape denotes whether the t-value sign for the pathway is negative (circle, meaning the pathway is under-represented) or positive (triangle, meaning the pathway is over-represented). Statistics of significant GSEA pathways are found in Supplementary Table 4.

Comparing abundance of PRS-associated genes between CD4+ and CD8+ T cell subtypes
Some conditions feature pathological mechanisms that are unique to a particular T cell subtype or favored by one T cell subtype over another. Many autoimmune disease traits, for example, are driven more by CD4+ T cell-mediated pathology than CD8+ (51), while recent single-cell sequencing research in neurodegenerative disease patients has highlighted mechanisms of inflammation and cytotoxicity in CD8+ T cell clusters (2, 3, 52). We sought to determine whether similar patterns existed in our genotype-phenotype correlation studies, based on the abundance of PRS-associated genes in CD4+ and CD8+ T cell subtypes by trait.
Figure 4 shows that autoimmune disease traits (ulcerative colitis through type 1 diabetes) have most PRS-associated genes in CD4+ T cell subtypes, consistent with previous observations of CD4+ T cell involvement in these conditions (51). However, when only considering genes inversely related to the PRS (denoted by a negative t-value and shown in the rightmost column), several traits have the majority of these genes in CD8+ T cell subtypes. For individuals with high polygenic risk for autoimmune disease, this suggests a shift towards a downregulated gene expression pattern in CD8+ T cells, even as CD4+ T cells ramp up expression of many genes. Lymphocyte and white blood cell count traits have a higher percentage of all PRS-associated genes in CD8+ T cell subtypes. For several traits, the relative percentage of genes in CD4+ versus CD8+ T cell subtypes differs widely by the t-value sign (for quantification, see Supplementary Table 5).
We looked further into PRS-associated genes on the extreme ends of the t value distribution for traits where gene enrichment in CD4+ vs. CD8+ T cells differed by t value sign. Genes in CD4+ T cells with a negative t value for association with the PRS included HAVCR1 in Crohn's disease. CD4+ genes inversely associated with the PRS for Parkinson's disease included DOCK8 and CD59. Among CD8+ genes that scaled proportionally with the PRS for Parkinson's disease, top genes included GZMM and IL2RB. Genes related to the PRS for C-reactive protein levels include IL16, IL10RA, CASP1, and PPM1A. PPM1A, while upregulated in CD4+ T cells of individuals with high PRS for C-reactive protein levels, is downregulated in CD8+ T cells for the same individuals in our data.
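The relative abundance of PRS-associated genes in CD4+ versus CD8+ subtypes can be expressed as a simple fraction, optionally restricted by the sign of the t-value. The sketch below is one plausible way to compute it. How genes were pooled across the naïve and memory subtypes is not stated, so the pooling and the function name are assumptions.

```python
def cd4_share(cd4_t_values, cd8_t_values, sign=0):
    """Fraction of PRS-associated genes that fall in CD4+ subtypes.

    Inputs are t-values of significant genes pooled across the CD4+ or CD8+
    subtypes (assumed pooling). sign=0 counts all genes, sign=1 only
    positive t-values, and sign=-1 only negative t-values.
    """
    def count(t_values):
        if sign > 0:
            return sum(1 for t in t_values if t > 0)
        if sign < 0:
            return sum(1 for t in t_values if t < 0)
        return len(t_values)

    n_cd4, n_cd8 = count(cd4_t_values), count(cd8_t_values)
    if n_cd4 + n_cd8 == 0:
        raise ValueError("no genes with the requested sign")
    return n_cd4 / (n_cd4 + n_cd8)
```

A value above 0.5 would correspond to a red cell in a CD4+ versus CD8+ heatmap of this kind, and a value below 0.5 to a blue cell.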

Discussion
Our experiments show that T cell gene expression phenotypes can change with respect to polygenic risk for disease. We calculated a PRS for AD in our cohort and estimated its predictive value for disease, then calculated PRSs for 13 other diseases and traits. Correlating PRSs with gene expression, and determining PRS-associated genes with high overlap across traits and T cell subtypes, detected genes involved in key aspects of T cell signaling and activation. We explored how PRS-associated T cell transcriptomic signatures compared between traits, and biological pathways and processes represented by PRS-associated genes. Finally, we found that several traits displayed differential polarization of their PRS-associated genes towards CD4+ or CD8+ T cell subtypes, even when considering the direction of association between the PRS and gene expression. While several groups have previously detailed changes in T cell gene expression related to individual genetic variants, our analyses here show that many disease-associated variants can have an aggregate effect on T cell transcriptomic phenotypes as well.
The predictive value of the AD PRS in our cohort for disease status provided helpful confirmatory results in light of other PRS studies. Just as we detected associations between the AD PRS and measures of AD pathology, polygenic risk for AD has been shown elsewhere to correlate with neuropathological phenotypes such as higher amyloid burden as measured with positron emission tomography (PET) (53,54), volume loss in brain regions such as the hippocampus and entorhinal cortex as seen on MRI (55,56), and levels of phosphorylated tau or amyloid beta in plasma or cerebrospinal fluid (57-59). Less understood is the way immune behavior is shaped by disease-associated genetic variants in aggregate. While our dataset derives from a primarily AD participant cohort, T cell gene expression changes were observed in association with the PRS for thirteen other traits. Just as other studies have shown PRS-associated changes in cognitive performance at ages far below the typical onset of AD (53,56), our gene expression findings suggest that T cell transcriptomic identity could be altered by a genetic landscape predisposing to conditions like lupus or epilepsy, even without clinical signs.

Figure 4. Comparing enrichment of PRS-associated genes between CD4+ and CD8+ T cell subtypes. Heatmap showing the relative abundance of all PRS-associated genes (left column), genes with a positive t-value (middle column), or genes with a negative t-value (right column) in CD4+ or CD8+ T cell subtypes. Cells are colored red when most genes are in CD4+ T cell subtypes, and blue when most genes are in CD8+. Quantities associated with this heatmap are available in Supplementary Table 5.
Correlating transcriptomic phenotypes across traits in our dataset generally showed that traits within a particular category, such as autoimmune disease, had positively correlated gene expression patterns.Some exceptions exist, as for stroke, whose PRS-associated gene expression profile correlated poorly with other neurological conditions.Dissecting the relationship between PRSs across traits, or PRS-associated phenotypes, could be informative in other cohorts.We know that the overall genetic architecture of several autoimmune disease traits features a notable degree of overlap (60), as do several subtypes of dementia (61).Our study suggests that PRS-associated T cell genes may reveal similar polygenic risk-mediated phenotypes across several related disease traits.For autoimmune disease traits, this overlap could reflect common T cell autoreactivity mechanisms regardless of the antigenic target, such as upregulation of genes related to inflammation or proliferation.If this is the case, therapeutic strategies targeting PRSassociated T cell genes could show efficacy in a range of autoimmune conditions.Such an approach could also be effective for neurological diseases, where immune-targeted therapies that reduce pathology in one disease context, such as AD, could be repurposed for ALS or epilepsy.
Our pathway analysis results hinted at the importance of profiling genotype-phenotype correlations in disease-relevant tissue. Several functionally intriguing pathways included genes known to affect migration and homing in T cells. In several disease states, T cell gene expression and clonality can change dramatically upon entry of T cells into target tissue. For example, in Parkinson's and Lewy body dementia patients, T cells isolated from CSF presented a distinct transcriptomic signature from peripheral blood T cells, including upregulation of the chemokine receptor CXCR4 (3). Often, genetically regulated gene expression changes in T cells are specific to particular T cell subtypes (6,8-11,62) or activation states in vitro (8,9,63), suggesting that genotype-phenotype correlations specific to tissue microenvironment may exist as well. Our use of peripheral T cells in an AD patient cohort, as opposed to CNS-derived T cells, represents a limitation to the generalizability of our findings in a neurodegenerative disease context.
The use of many contributing SNPs as input to the PRS often results in more generalized findings, compared to the single SNP-single gene approach of eQTL studies. This was certainly the case in our results, as only two PRS-associated genes remained significant after Bonferroni multiple testing correction at q = 0.05. The GWAS base data used for PRS calculation also generally does not account for the reality of disease subtypes or phenotypic variability between patients, which is especially prevalent in autoimmune diseases. While some diseases show a similar landscape of genetic risk across phenotypic subtypes (34), future GWAS should examine these subgroups more closely, to identify individual variants or trends in polygenic risk associated with specific manifestations of disease progression. Our low sample size in comparison to most genotype-phenotype correlation studies likely limits well-powered detection of PRS-associated genes after multiple testing correction. Future studies seeking to robustly detect PRS-associated genes, especially in multiple tissues or cell types, should likely have a sample size of several hundred or more. Newer options for reduced-cost genotyping and RNA-sequencing should make this approach feasible.
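To make the multiple-testing burden concrete, the following is a minimal sketch of a Bonferroni correction applied to per-gene PRS association p-values at a threshold of 0.05. The gene labels, the p-values, and the `bonferroni_significant` helper are hypothetical and are not taken from this study's analysis.

```python
def bonferroni_significant(pvalues, alpha=0.05):
    """Return {gene: adjusted_p} for genes that pass Bonferroni correction.

    Each raw p-value is multiplied by the number of tests and capped at 1.0;
    a gene is kept when its adjusted p-value is at most alpha.
    """
    n_tests = len(pvalues)
    adjusted = {gene: min(1.0, p * n_tests) for gene, p in pvalues.items()}
    return {gene: p for gene, p in adjusted.items() if p <= alpha}


# Hypothetical raw p-values for four placeholder genes (illustration only).
example_pvalues = {
    "GENE_A": 1e-7,
    "GENE_B": 0.004,
    "GENE_C": 0.03,
    "GENE_D": 0.6,
}

if __name__ == "__main__":
    hits = bonferroni_significant(example_pvalues)
    for gene, adj_p in sorted(hits.items()):
        print(f"{gene}: adjusted p = {adj_p:.3g}")
```

Because the adjustment scales with the number of genes tested, a transcriptome-wide analysis of, say, 20,000 genes would require a raw p-value below 0.05 / 20,000 = 2.5e-6 for a gene to pass. This illustrates why a small cohort retains few genes after correction.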
Our use of CD45RO as a marker to differentiate naïve from memory T cells also represents a limitation to the interpretability of our findings, as T cell memory populations that re-express CD45RA (41) would be labeled as naïve by our sorting strategy. These memory T cells, known as TEMRA, have particular importance in neurodegenerative disease contexts (2). CD4+ T cell populations also have several subtypes distinguished by specific transcription factors and secreted cytokines, including Th1, Th2, Th17, regulatory T cells, and others. In several of the diseases included here, especially autoimmune diseases, pathology is mediated far more by some CD4+ T cell subtypes than others. Other conditions involve a deficiency in one or more of these subtypes, such as regulatory T cells in ALS (64). It is likely that our sorting strategy misses some PRS-mediated gene expression changes that would only be seen in specific CD4+ T cell subtypes. Importantly, the critical contribution of B cells to the pathology of several autoimmune diseases is not reflected in our choice of immune cell types, and should be examined. Future studies seeking to identify PRS-associated genes in multiple cell types should be aware of sources of nuance in cell subtype markers, or use methods such as single-cell RNA-sequencing or cellular indexing of transcriptomes and epitopes (CITE-seq) to comprehensively profile markers for robust cell type identification.
Future studies in other cohorts that collect both genomic and gene expression data, especially from specific tissues or cell types, will be vital for comparing eQTL with polygenic risk-mediated transcriptomic phenotypes. These studies should be sufficiently large to generate well-powered results for PRS calculation and correlation of PRS with other phenotypes. Existing datasets from projects such as GTEx (65), which has already extensively profiled tissue-specific eQTL, could also be mined for polygenic risk-mediated gene expression changes in a variety of disease settings. The number of studies collecting genome sequencing or genotyping data alongside one or more quantitative traits will continue to expand, for AD and other disease cohorts. The use of PRSs in genotype-phenotype correlation studies will be invaluable for diseases such as AD, where the contribution of T cells is coming to light.

Data availability statement
All PRS-associated genes generated in this study are found in Supplementary Table 1. The original gene expression count matrix is in Supplementary Table 6 of this publication, and on Zenodo (DOI: 10.5281/zenodo.10822958). The datasets containing individual-level genomic data, including genetic variants and PRSs, are not readily available because of ethical and privacy restrictions, including restrictions due to participant identifiability. Requests to access the datasets should be directed to Badri Vardarajan, bnv2103@cumc.columbia.edu, or on synapse.org with accession number syn17008936. Code used in this publication can be found at https://github.com/ddressman91/Frontiers_PRS_Tcells.

correction. SIZE refers to the number of genes in the pathway; t.value.sign refers to whether the pathway is over-represented (+) or under-represented (-) by genes positively correlated with the PRS for a given trait.

SUPPLEMENTARY TABLE 5
Quantities used to obtain the heatmap, including absolute counts of PRS-associated genes in CD4+ and CD8+ subtypes for each trait, percentage of all PRS-associated genes for a given trait, and percent differences in PRS-associated gene count between CD4+ and CD8+ subtypes for a given trait.

TABLE 1
Summary of demographics for patients with data used for PRS-associated gene calculation.