PheWAS and cross-disorder analysis reveal genetic architecture, pleiotropic loci and phenotypic correlations across 11 autoimmune disorders

Introduction Autoimmune disorders (ADs) are a group of about 80 disorders that occur when self-attacking autoantibodies are produced due to failure in the self-tolerance mechanisms. ADs are polygenic disorders and associations with genes both in the human leukocyte antigen (HLA) region and outside of it have been described. Previous studies have shown that they are highly comorbid with shared genetic risk factors, while epidemiological studies revealed associations between various lifestyle and health-related phenotypes and ADs. Methods Here, for the first time, we performed a comparative polygenic risk score (PRS) - Phenome Wide Association Study (PheWAS) for 11 different ADs (Juvenile Idiopathic Arthritis, Primary Sclerosing Cholangitis, Celiac Disease, Multiple Sclerosis, Rheumatoid Arthritis, Psoriasis, Myasthenia Gravis, Type 1 Diabetes, Systemic Lupus Erythematosus, Vitiligo Late Onset, Vitiligo Early Onset) and 3,254 phenotypes available in the UK Biobank that include a wide range of socio-demographic, lifestyle and health-related outcomes. Additionally, we investigated the genetic relationships of the studied ADs, calculating their genetic correlation and conducting cross-disorder GWAS meta-analyses for the observed AD clusters. Results In total, we identified 508 phenotypes significantly associated with at least one AD PRS. 272 phenotypes were significantly associated after excluding variants in the HLA region from the PRS estimation. Through genetic correlation and genetic factor analyses, we identified four genetic factors that run across studied ADs. Cross-trait meta-analyses within each factor revealed pleiotropic genome-wide significant loci. Discussion Overall, our study confirms the association of different factors with genetic susceptibility for ADs and reveals novel observations that need to be further explored.


Introduction
Autoimmune disorders (ADs) are a group of about 80 (1) disorders that occur when self-attacking autoantibodies are produced due to failure in the self-tolerance mechanisms (2).The estimated overall prevalence is 3% in the United States (3), and recent studies report an increasing trend (4)(5)(6).Additionally, ADs are often comorbid and cluster within families (7,8).The majority of ADs are polygenic and previous studies revealed associations with genes in the human leukocyte antigen (HLA) region (9,10).However, multiple additional associations with genes outside of the HLA are found with various ADs, and many times they are implicated in more than one disorder (10).The genetic correlation across multiple ADs has not been fully explored (11,12).So far, cross-disorder GWAS meta-analyses have only focused on a few ADs, usually three to seven at a time (12)(13)(14)(15), while others have only focused on pairwise meta-analyses (16,17).Given the wide comorbidities observed in epidemiological studies and the evidence for sharing common genetic background across multiple ADs, a systematic large-scale analysis is warranted.
In ADs, like other complex disorders, environmental factors are also involved in disease development along with genetic predisposition.Multiple studies have reported associations between viral infections and specific autoimmune diseases (18).For instance, a recent study (19) is suggesting that infection of Epstein-Barr virus could be the leading cause of Multiple Sclerosis.Additional associations between ADs and environmental factors such as smoking, and UV exposure have also been reported (20,21).Epidemiological studies have reported a high comorbidity across different ADs (8) as well as links to other traits, including psychotic disorders (22), allergies (23), and obesity (24).
Given the complex genetic background of ADs, Polygenic Risk Scores (PRS) which are an estimate of an individual's genetic predisposition for a trait, are an important tool to help understand disease correlations.They are usually calculated as the total of the risk alleles an individual carries, weighted by their effect sizes as measured in a previous genome-wide association study (GWAS) (25).This genetic risk can then become the basis of a Phenome-wide association study (PheWAS), with a goal to explore whether risk variants identified by a GWAS or disease PRS are associated with a wide variety of phenotypes (26).Biobanks that combine genetic data with Electronic Health Records (EHR) are essential for the PheWAS approach, as they are the source of the phenotypes used in the analysis (27).Since the PheWAS is a hypothesis-free analysis, it can be used to generate new hypotheses about novel associations that might have not been uncovered through hypothesis-driven approaches.
Here, for the first time, using summary statistics data of 11 different ADs and genetic and phenotypic data from the UK Biobank, we performed a PRS-PheWAS, interrogating associations of AD PRS with a wide range of socio-demographic, lifestyle, and health related phenotypes.Additionally, we investigated the genetic relationships across the studied ADs and conducted cross-trait GWAS meta-analyses for the identified AD sub-groups.Our findings present an overview of the phenotypic and genetic architecture and relationships of ADs.

Study population
The UK Biobank is a large-scale, population-based, prospective cohort that recruited between 2006 and 2010 over 500,000 participants from the UK aged 40-69 years old.The participants provided blood, urine, and saliva samples for biochemical tests and genotyping, as well as self-reported information which was then linked to their healthrelated records.The phenotypic and genetic data we used in this study were obtained from UK Biobank under application number #61553.
The initial UK Biobank dataset included 488,377 individuals genotyped on the Affymetrix UK BiLEVE Axiom array or the Affymetrix UK Biobank Axiom array.We performed standard quality control on individuals and genetic markers (info>=0.9,maf>=0.01,geno<=0.02,hwe >= 10 -6 ) with PLINK 1.9 (28).Initially, participants with withdrawn consent, sex mismatch, sex aneuploidy, self-reported non-white British ancestry, and with kinship coefficient <0.0625 (third-degree relatedness (29)) were excluded.Additional Principal Component Analysis (PCA) with 1000 Genomes data as reference was performed using TeraPCA (30) to exclude individuals with non-European ancestry.The final dataset included 330,841 individuals and 7,634,371 SNPs.53.98% of the selected participants are females, the average age is 56.8 (sd=8) years.Table S1 provides a breakdown of the participants' age and the percentage of the selected autoimmune diagnoses present in the UK Biobank.

Polygenic Risk Scores
Publicly available and in-house GWAS summary statistics for 11 ADs performed on datasets of European ancestry and no UK Biobank participants were collected.For the PRS calculations we used PRSice2 (31).The independent SNPs with p-values<10 -5 , after clumping using a window of 500kb and an r 2 threshold of 0.1, were included in the PRS calculations and the score was yielded as the weighted, standardized sum of the effect (score-std option).Additionally, we normalized the PRS for age, sex, genotyping batch, and the first ten PCs.We repeated the PRS calculations excluding the extended HLA region (hg19, chr6 25-33 Mb).Table 1 shows the studied autoimmune datasets and the number of SNPs included in the PRS calculations.PRS performance was evaluated using Nagelkerke's pseudo-R2 metric for each AD.We used the AD summary statistics as the base and UK Biobank participants as the target data.We defined the individuals with the ICD10 code diagnosis for the studied AD as cases, and the UK Biobank individuals with no reported ICD10 diagnoses were defined as controls.Age, sex, genotyping batch and first ten PCs were included as covariates.

Phenotypes
We included 3,254 phenotypes from UK Biobank that were assigned to seven broad categories: Biomarkers, Cognition and Mental Health, Disease Diagnoses, Health and Medical History, Physical Measures, Lifestyle, and Sociodemographics.Specifically for the Disease Diagnoses category, we included only the ICD10 codes and used the R PheWAS tool (32) to map similar diagnoses into one phecode.The breakdown of data fields in each category is shown in the Supplementary Materials (Figures S1, S2).

PheWAS
For the PheWAS analyses, we used the tool PHESANT (33) to test the association of each disease PRS with each UK Biobank outcome.PHESANT, which is described in detail in (33), is commonly used in PheWAS analyses and automatically removes the instances with missing values from the UK Biobank Data-Codings.Age, sex, the first 10 principal components to correct for population stratification, and the genotyping batch were included as covariates in all regression models.To account for multiple testing, we used the R function p.adjust to calculate the FDR adjusted pvalues and set the significance threshold at pFDR<0.05.

Cross-Disorder GWAS Meta-analysis
Pairwise genetic correlation analyses were performed for all 11 ADs after removing the extended HLA region (hg19, chr6 25-33 Mb) using LDSC (34).Only SNPs present in the HapMap3 reference panel were included in analyses and we used precalculated LD scores from 1000 Genomes European data.Datasets with less than 200,000 SNPs overlap with the LDSC reference data or heritability z-score <1.5 [as defined in (35)], were excluded from downstream analyses, namely CEL, PSO, and JIA datasets were removed.
To further explore the architecture and correlations of the studied disorders, we performed exploratory factor analysis (EFA) on the genetic correlation matrix using the R tool GenomicSEM   (36).We further used a confirmatory factor analysis (CFA) to validate our model.For groups of disorders within each of the factors, we performed a cross-disorder GWAS meta-analysis with ReACt (37) and corrected for sample overlap between the datasets.In order to identify potentially pleiotropic SNPs, in each metaanalysis we estimated the posterior probability (m-value) using METASOFT (38) to identify SNPs with high m-values (m-value>0.9) for all studies in the meta-analysis.Then, using the pleiotropic SNPs, we identified the LD independent regions (r 2 <0.1) from the index SNPs with p<5×10 -8 .We used the default LD clumping window (250kb) and mapped into the regions the genes located no more than 20kb away.As reference for the LD estimation, we used the European samples from 1000 Genomes.Additionally, we merged into one overlapping genomic regions using bedtools (39).All genes that mapped to the identified LD independent regions for each meta-analysis after clumping, were submitted to g:Profiler (40) to perform functional enrichment analysis for Gene Ontology terms (GO : BP, GO : CC, GO : MF, released 2021-12-15), Reactome (REAC, released 2022-1-3) and Kyoto Encyclopedia of Genes and Genomes (KEGG FTP, released 2021-12-27).For all experiments we performed the recommended multiple hypothesis correction (g:SCS) method with the significance threshold of p = 0.05.We repeated the analysis after excluding the electronic GO annotations (Inferred from Electronic Annotation [IEA]) to have higher confidence in the enrichment analysis.

Individual disorder PheWAS
First, we investigated the potential association of AD genetic risk to other phenotypes, including socioeconomic factors, lifestyle, biomarkers, disease diagnoses, health history and mental health, performing PRS-PheWAS.We used the LD-independent SNPs with p<10 -5 to calculate PRS for each of the studied ADs in each individual, and tested the association of the normalized -for age, sex, the first 10 principal components, and genotyping batchautoimmune PRS, with 3,254 phenotypes in 330,841 UK Biobank samples (Table S1).The PheWAS analysis was adjusted for age, sex, the first 10 principal components, and genotyping batch.We found a large number of associations with each disorder which differed depending on whether the HLA region was included in the analysis (Figure 1, Table 2, Figures S3, S4 and Tables S2, S3).For SLE PRS with HLA region included in the analysis, we found the highest number of associations to different phenotypes (n=263).On the other hand, analysis for SLE PRS without the HLA region included, was associated with only 38 phenotypes.Interestingly, for CEL, T1D and RA, more PRS associations to phenotypes were actually found when the HLA region was excluded from the calculations.For Psoriasis, genetic risk was found associated with other phenotypes only when HLA was included in the genetic risk calculations (significant association with 79 phenotypes).
In the following, we describe in detail patterns that emerge across all studied disorders and highlight significant results for phenotype associations to genetic risk with at least three ADs.

Disease diagnoses
For six of the studied ADs (CEL, RA, MS, SLE, T1D, VITE), we observed a significant positive association of PRS to the same disease diagnosis (Table S3).These results indicate a good predictive power of the respective PRS.We should note that for PSC, the disease diagnosis phenotype was not available in the dataset.Additionally, we estimated the Nagelkerke's pseudo-R2 for all ADs (Table S4).
Celiac disease was found significantly associated with genetic risk for all 11 ADs that we studied.We observed that higher PRS for RA, VITE, VITL, JIA and PSO is associated with lower risk for the "Celiac disease" diagnosis phenotype.On the other hand, higher PRS for MS, MG, PSC, SLE, T1D and CEL was associated with Number of significant phenotypes associated with autoimmune polygenic risk scores (p<10 -5 ).The different colors represent the general UK Biobank categories.The "HLA excluded" bar shows the number of significant associations with the phenotypes when HLA was excluded from the AD PRS calculations.The "HLA included" bar shows the number of significant associations with the phenotypes when HLA was included in the AD PRS calculations.The "Shared" bar shows the number of significant associations with the phenotypes for both HLA included or excluded AD PRSs.
higher risk with the "Celiac disease" diagnosis in the UK Biobank.The association with CEL, T1D, PSC, and RA remained significant even after excluding the HLA, although with an opposite effect direction for the last one (Figure 2; Table S2).
"Ulcerative colitis" diagnosis was the second digestive phenotype most commonly found associated with the autoimmune PRS, and high MS, RA (no-HLA) , JIA (HLA) , PSC (no-HLA), and CEL (no-HLA) PRS were associated with higher risk for the diagnosis (Figure 2, Table S2).
In the endocrine diagnoses, most autoimmune PRS were associated with "Hypothyroidism" followed by "Type 1 diabetes".RA, VITE, PSC, SLE, T1D and MG were associated with higher risk of "Hypothyroidism", even after HLA was excluded.JIA, VITL, MS and CEL association with Hypothyroidism was significant only after HLA was excluded (Figure 2; Table S2).
In dermatologic diagnoses, the autoimmune "Sicca syndrome" was the most associated phenotype with the autoimmune PRS.We observed a positive association of the "Sicca syndrome" diagnosis with SLE, CEL, RA (no-HLA) , MS (HLA) , PSC (HLA) , and MG (HLA) PRS.In contrast, there was a negative association with PRS VITE, PSO, and VITL (Figure 2; Table S2).
In the neoplasms category, high PRS for VITL and VITE was associated with lower risk for skin cancer outcomes, including Non-Hodgkins lymphoma, other non-epithelial cancer of skin and melanomas of skin (Figure 2; Table S2).

Cognition and mental health
For PSC and SLE PRS, we found the largest overlap (n=20) of traits associated in the same direction.These associations included lower risk for phenotypes such as addictions, depression, and "low/worse" mental health, while they were positively associated with phenotypes describing higher cognitive function (Figure 3; Table S2).For PRS of MS, and MG, we also found an association with lower risk for phenotypes describing poor mental health (Figure 3; Table S2).On the contrary, higher PRS for VITL was associated with phenotypes describing poor mental health and depression (n=10), and had a negative association with phenotypes describing cognitive function (n=4) (Figure 3; Table S2).PSO and VITE associated with higher risk with phenotypes describing poor mental health and anxiety (Figure 3; Table S2).The table shows the strongest associated phenotypes with each AD PRS with and without HLA, the beta, the 95% CI and the FDR adjusted p-value.

Lifestyle
In this category the trait "Never eat eggs, dairy, wheat, sugar: Wheat products" was associated with PRS for ten ADs when HLA was included in analysis, suggesting susceptibility to food allergies; VITL, VITE, PSO, JIA, RA are negatively associated with the phenotype, while PSC, SLE, MG, CEL (irrespectively of HLA) and T1D were positively associated with the phenotype (Figure 4; Table S2).
Again, same as for the previous category of traits, PSC and SLE PRS had the largest overlap of associated phenotypes (n=20) in the same effect direction.They were negatively associated with phenotypes related to dietary habits (higher intake of dried fruit, salad/raw vegetable, non-oily fish), cannabis usage, exercise, smoking status (Figure 4; Table S2).
Additionally, VITL and VITE PRS (irrespectively of HLA) were positively associated with darker skin color, and negatively Significant PRS-PheWAS for at least three AD PRS with phenotypes in the Disease Diagnoses UK Biobank category, using the normalized PRS.The shown phenotypes were significantly associated, after FDR adjustment, with at least three AD PRS irrespectively of the HLA status.The colors of cells indicate the standardized effect sizes (b) for the regression between AD PRS with HLA and each phenotype.The one star "☆" shows the significant results only with the "HLA included" AD PRS.The two stars "☆☆" show the significant associations with both "HLA included or excluded" AD PRS with the same effect direction.The star and the upper facing triangle "☆▵" show the significant associations with both "HLA included or excluded" AD PRS but with opposite effect directions.The upper facing triangle "▵" shows the significant associations only with "HLA excluded" AD PRS that the effect direction is the same as the color indicates.The down-facing triangle "▿" shows the significant associations only with "HLA excluded" AD PRS that the effect direction is the opposite of what the color indicates.To group the disease diagnoses phenotypes, we used the R PheWAS tool and collapsed similar ICD-10 codes into one phecode.We used the hclust R function to perform the hierarchical clustering of the autoimmune disorders shown in the dendrogram using all standardized effect sizes for the disease diagnoses phenotypes.associated with higher risk of "ease of skin tanning", "childhood sunburn occasions", "use of sun/uv protection" and "facial aging" (Figure 4; Table S2).We observed the opposite associations between PSC and SLE PRS and these sun exposure phenotypes (Figure 4; Table S2).

Health and medical history
In this category the self-reported phenotype "Diagnosed with coeliac disease or gluten sensitivity" was significantly associated with 11 autoimmune PRS (Figure S5; Table S2).We observed a positive association with PSC, SLE, T1D, CEL, MG, and a negative association with RA, JIA, VITL, VITE, PSO, these results are consistent with similar phenotypes, such as the "Celiac disease" diagnosis and the "Never eat eggs, dairy, wheat, sugar: Wheat products" phenotype in the Lifestyle category.
Additionally, high PRS T1D, VITL, and VITE were associated with lower risk for "Basal cell carcinoma" phenotype under the Cancer register sub-category.Specifically, for VITE and VITL we observed a negative association with self-reported basal cell carcinoma (Figure S5; Table S2).
We also observed significant associations of autoimmune PRS with phenotypes in the Sociodemographics (Figure S6), Biomarkers (Figures S7-S9) and Physical Measures (Figure S10) categories, without any patterns emerging across disorders.Results are shown in the supplement.

AD Genetic Architecture and Cross-Disorder GWAS Meta-analysis
Driven by the known comorbidity across AD [based on epidemiological studies (8)] and the overlap in phenotypic associations with autoimmune PRS that we described above, we proceeded to perform cross-disorder genetic correlation and GWAS summary statistics meta-analyses to explore the genetic relationship and genetic architecture of ADs and identify potentially pleiotropic loci.Such pleiotropic loci would drive pathophysiology across multiple ADs.
Initially, we performed analysis for all 11 ADs (Figure S11), however, given the limited SNP overlap of our datasets for CEL, Significant PRS-PheWAS for at least three AD PRS with phenotypes in the Cognition and Mental Health UK Biobank category, using the normalized PRS.The shown phenotypes were significantly associated, after FDR adjustment, with at least three AD PRS irrespectively of the HLA status.The colors of cells indicate the standardized effect sizes (b) for the regression between AD PRS with HLA and each phenotype.The one star "☆" shows the significant results only with the "HLA included" AD PRS.The upper facing triangle "▵" shows the significant associations only with "HLA excluded" AD PRS that the effect direction is the same as the color indicates.To group the phenotypes, we used the categories provided by the UK Biobank.We used the hclust R function to perform the hierarchical clustering of the autoimmune disorders showing in the dendrogram using all standardized effect sizes for the Cognition and Mental Health phenotypes.
PSO and JIA we excluded them from further analyses.After correction for multiple testing, we observed significant positive correlations of RA with T1D (rg no-HLA :0.52), SLE (rg no-HLA :0.51), and MG (rg no-HLA :0.47).VITL and VITE were also significantly correlated (rg no-HLA :0.64).Additional autoimmune correlations with p<0.05 are shown in Figures 5A, B, including the pairwise correlations after excluding HLA.
Since the pairwise genetic correlation analysis showed a complicated correlation pattern among the studied disorders, we performed exploratory factor analysis (EFA) followed by a confirmatory factor analysis (CFA) to dissect the AD relationships.We used the four-factor model identified in EFA and included the disorders with loadings greater than |0.3| in each factor.The CFA analysis in GenomicSEM showed a good fit of the model to the data (c 2 (12) =16.1;AIC =64.1;CFI = 0.98; SRMR = 0.07).The first factor included VITE, VITL and MG.MG, RA and SLE were included in the second factor, while the third factor consisted of T1D, PSC and MG (with a negative loading).Lastly, factor four consisted of PSC and MS (Figure 5C).
In the cross-disorder meta-analysis on the first factor, that includes VITL, VITE and MG, we identified nine significant pleiotropic (m-value>0.9 in all studies) LD independent regions (Table 3); two of them mapping on the Cytotoxic T-Lymphocyte Associated Protein 4 (CTLA4) -Inducible T Cell Costimulator (ICOS) and Fli-1 Proto-Oncogene, ETS Transcription Factor (FLI1) genes, were not significant in the individual GWAS studies included here.However in the GWAS catalog (41) FLI1 has a significant association with Vitiligo, when the onset age is not taken into account (42), and CTLA4 is found associated with different GWASs (not studied here) for both Myasthenia gravis (43) and Vitiligo (42).CTLA4 was also significant in the gene-based analysis (44) of the data we used in this meta-analysis.The gene set enrichment analysis including the genes in the significant and pleiotropic regions identified four significantly enriched GO : BP terms; bone cell development, immune system development, myeloid cell development, and immune system process (Figure 6A; Table S5).
When we performed the meta-analysis of MG, RA and SLE, we identified 17 genome-wide significant pleiotropic loci.Three of these loci mapping to Protein Tyrosine Phosphatase Receptor Type C (PTPRC), Interleukin 12 Receptor Subunit Beta 2 (IL12RB2) and LINC00824 were not genome-wide significant in the GWAS studies we analyzed (focusing on European ancestry), however, they were reported as significant associations in GWAS of higher power that Significant PRS-PheWAS for at least three AD PRS with phenotypes in the Lifestyle UK Biobank category, using the normalized PRS.The shown phenotypes were significantly associated, after FDR adjustment, with at least three AD PRS irrespectively of the HLA status.The colors of cells indicate the standardized effect sizes (b) for the regression between AD PRS with HLA and each phenotype.The one star "☆" shows the significant results only with the "HLA included" AD PRS.The two stars "☆☆" show the significant associations with both "HLA included or excluded" AD PRS with the same effect direction.The star and the upper facing triangle "☆▵" show the significant associations with both "HLA included or excluded" AD PRS but with opposite effect directions.The upper facing triangle "▵" shows the significant associations only with "HLA excluded" AD PRS that the effect direction is the same as the color indicates.To group the phenotypes, we used the categories provided by the UK Biobank.We used the hclust R function to perform the hierarchical clustering of the autoimmune disorders shown in the dendrogram using all standardized effect sizes for the Lifestyle category phenotypes.
were multi-ethnic (Table S6).The gene set enrichment analysis of genome-wide significant and pleiotropic regions identified 22 significantly enriched terms.Among them, six (DN2 thymocyte differentiation, regulation of MAP kinase activity, stress granule assembly, B cell proliferation, side of membrane, GRB7 events in ERBB2 signaling) were significant even after excluding the IEA GO terms (see Methods) (Figure 6B; Table S7).
In the meta-analysis of MG, T1D and PSC, we observed seven pleiotropic and genome-wide significant loci.One of them, found on chr4:10,709,726-10,726,520 (closest gene Cytokine Dependent Hematopoietic Cell Linker (CLNK), 23Kb downstream) has not been previously found to be associated with either of the three disorders (Table S8).The gene-set enrichment analysis identified six significantly enriched terms after multiple testing correction, with "RUNX1 and FOXP3 control the development of regulatory T lymphocytes (Tregs)" and "T cell receptor signaling pathway" as the two top terms (Table S9).These two were the only significant terms when we repeated the analysis after excluding the IEA GO terms (see Methods) (Figure 6C; Table S9).
Finally, for the cross-disorder meta-analysis of PSC and MS, we identified two genome-wide significant and pleiotropic loci, mapping to the previously associated Interleukin 2 Receptor Subunit Alpha (IL2RA) and BTB Domain And CNC Homolog 2 (BACH2) genes (Table S10).The gene-set enrichment analysis Frontiers in Immunology frontiersin.orgidentified three significant terms, primary adaptive immune response, primary adaptive immune response involving T cells and B cells, and interleukin-2 receptor complex.All of them remained significant even after we excluded the IEA GO terms (Table S11).

PheWAS findings shared across ADs in the same genetic factor
Finally, we explored whether the ADs belonging to each of the four factors that were identified by EFA, also share associations with phenotypes detected in the PheWAS.For MG, VITE and VITL which make up the first identified factor, we detected 29 shared phenotypes across all three ADs (Figure S12).However, there was no phenotype with the same effect direction for all three ADs.For MG, RA and SLE, which make up the second factor, we detected 26 shared phenotypes, across five categories (Figure S13).The Hypothyroidism disease diagnosis and the health related phenotype of "other serious medical conditions/disability diagnosed by doctor", were the two phenotypes, associated with all three ADs with the same effect direction, for the same HLA status.For the ADs of the third factor (PSC, T1D and MG), we identified 16 shared phenotypes in four categories (Figure S14).Nine of them had the same effect direction for all three AD for the same HLA status.Lastly, for factor four (PSC and MS), we identified 76 shared associations, 43 of them had the same effect direction for both ADs (Figure S15).

Discussion
We report results on the first PRS-PheWAS analysis exploring the association of genetic risk for 11 autoimmune disorders and 3,254 phenotypes on 330,841 individuals of European ancestry from the UK Biobank.Additionally, we explored the genetic relationship between the studied ADs seeking to dissect the genetic architecture of these highly correlated and often comorbid phenotypes.
We were able to recover previously identified associations between ADs based on epidemiological or genetic studies.For instance, a study in a Taiwanese population showed higher risk of first-degree relatives with Sicca to develop other autoimmune disorders including SLE, MS, MG, and RA (45); and we also observed here a positive association between the PRS of these four ADs and Sicca syndrome outcome.Other autoimmunerelated diagnosis outcomes associated with higher risk for the studied ADs, included hypothyroidism and Graves disease.A link between those disorders and RA, Vitiligo, SLE, T1D, CEL, and MG, is also supported by the literature (46)(47)(48)(49)(50). Additionally, as reported in previous studies (51-53), we also observed a negative association between Vitiligo risk and skin cancer.
Interestingly, we observed many associations with environmental and lifestyle factors.Diet and specifically the consumption of non-wheat products was the outcome that we found to be associated with the risk for most of the studied ADs pointing to gluten intolerance and food allergies.We observed a significant positive association between PSC, SLE, CEL, MG, T1D, and RA (when HLA was excluded from PRS calculations) and not The column SNP contains the top SNP in each locus.The columns P, OR and SE correspond to the top SNP in each locus.The autoimmune disorder specific Top SNP column contains the top genome-wide significant SNP in the locus that was available in the input dataset.The autoimmune disorder specific P column contains the lowest p-value in the locus that was available in the input dataset.consuming wheat, while the association was negative for VITE, VITL, PSO, JIA and RA.We also observed the same pattern of AD associations with the Celiac disease diagnosis phenotype, which could be indicative of the connection between gluten intolerance and Celiac disease.There is prior evidence suggesting that a glutenfree diet could be beneficial not only for patients with Celiac disease but also for T1D, RA, MS, autoimmune hepatitis, and PSO (54).
Smoking was another factor that we found significantly associated with SLE and PSC genetic risk.Indeed, it has been previously suggested that smoking is associated with higher risk for double-stranded DNA seropositivity, a marker used for SLE diagnosis, in SLE patients (55), while for PSC, there is some evidence to suggest that smoking is associated with lower risk for developing the disease (56, 57), although not always consistently supported (58).Interestingly, a previous study found that severe sunburn incidents and higher tanning ability in women are associated with higher risk of developing Vitiligo (59), however, we actually observed the opposite association for Vitiligo genetic risk, perhaps indicating different behavior towards sun exposure based on genetic risk.We also observed a negative association between CEL genetic risk and the weight of the first child.Previous studies have shown that women with undiagnosed or untreated celiac disease have higher risk to deliver a baby with reduced birthweight (60).
The link between autoimmune disorders and mental health has been previously described.For instance, exposure to stress-related disorders was found to be associated with higher risk for ADs (61), and both positive and negative associations of ADs with psychotic disorders have been summarized elsewhere (22).In our study, we observed that risk for VITL and PSO was positively associated with self-reported outcomes describing poor mental health, which is in line with previous works (22,(61)(62)(63).We also observed that risk for SLE and PSC was associated with better mental health outcomes.Epidemiological studies have reported higher psychological distress in patients with SLE and PSC (61, 64-67).However, in a study exploring the genetic correlation between immune and psychiatric related phenotypes using GWAS summary statistics, SLE was found to be significantly positively correlated only with Schizophrenia and no other psychiatric phenotypes (68).Additionally, a study using Mendelian Randomization between SLE and depression showed SLE genetic variants mildly reduce the odds of depression, suggesting that the observed association between SLE and depression might not be attributed to genetic factors (69).Thus, further analyses could be useful to explore the gap between the Network plots of the enrichment analysis for the cross-disorder meta-analyses.(A) Results of the significantly enriched terms from the genes identified in the VITE-VITL-MG meta-analysis.Results are also shown in Supplementary Table 5. (B) Results of the significantly enriched terms (after excluding the IEA terms) from the genes identified in the SLE-MG-RA meta-analysis.The full results are also shown in the Supplementary Table 7. (C) Results of the significantly enriched terms from the genes identified in the T1D-MG-PSC meta-analysis.The full results are also shown in the Supplementary Table 9. Enriched gene sets that remained significant after excluding the IEA GO terms are shown in dark green.
associations between SLE and mental health phenotypes observed in epidemiological studies, but not using genetic data.
Exploring closer the phenotypes that are associated with ADs belonging to the same genetic factor (as identified by EFA), we found that for factor three (PSC, T1D and MG) there is a clear pattern of associations with same direction of effect with phenotypes belonging in several disease diagnoses, health and medical history, and lifestyle categories as well as biomarkers.For two different factors, two and three, we observed the same direction of association with the Hypothyroidism diagnosis phenotype.Specifically for the ADs of factor two we detected a positive association with additional AD diagnoses, which we discussed in more details earlier here.Interestingly, for disorders in the first factor (VITE, VITL and MG), we did not observe any phenotype associated with the same effect direction with all three ADs.This suggests that although Vitiligo and MG are genetically correlated, their PRSs are associated with the opposite direction with the studied phenotypes.The only exception to this was the association that we observed with the Hypothyroidism disease diagnosis where VITE and MG PRS show association in the same direction.On the other hand, for PSC and MS (factor four), we observed positive associations with other ADs such as Sicca syndrome, Celiac disease, but not with Hypothyroidsm.Overall, the shared phenotypes in each factor reveal patterns whose link to ADs warrants further exploration.
It is well demonstrated that ADs are often comorbid and share both HLA and non-HLA genetic loci (8)(9)(10)15).In a recent study (11), where the genetic correlation between 13 (7 of them are also studied here) autoimmune and inflammatory disorders was also explored, the authors observed correlations across ADs and similar patterns to what we also found.Furthermore, we also provide here a more detailed analysis to understand the genetic architecture of ADs including EFA to reveal subgroups of disorders and crossdisorder GWAS to reveal pleiotropic loci that could underlie multiple disorders and drive comorbidities.Indeed, in line with the existing notion of shared genetic background across ADs, we detected numerous genome-wide significant and pleiotropic loci in each meta-analysis.All except one had already been previously associated with at least one of the ADs included in the metaanalysis, or were associated with the traits in studies of different ancestries or larger sample GWAS which we could not analyze here because summary statistics data were not available.Importantly, we identify one novel genome-wide significant and pleiotropic locus in the meta-analysis of T1D-MG-PSC.This is a previously unknown locus that could play a role in the etiology of all three disorders and is found 23Kb downstream of CLNK gene that encodes Clnk, an adapter of the SLP76 family, is involved in the regulation of immunoreceptor signaling (70).
This study comes with both strengths and limitations.The PheWAS analysis allowed us to detect significant associations between AD risk and multiple phenotypes, even after excluding the HLA region.Additionally, we were able to detect pleiotropic loci in the autoimmune subgroups that are involved in immune-related processes as the gene-set enrichment analysis revealed.However, there are limitations in this study that should be considered when interpreting the results.For the PRS calculations, although we used the largest AD summary statistics data available, there were differences in power regarding their sample size and number of SNPs.Also, as the number of UK Biobank participants with AD diagnoses is limited, we were not able to calculate the optimal pvalue threshold for SNPs to be included in PRS calculations, but rather set as threshold the p-value 10 -5 .
In conclusion, in this study we observed ADs PRS to be associated with multiple health-related and environmental factors, even after excluding the HLA region, and explored the genetic relationships of the selected ADs by estimating their genetic correlation and identifying pleiotropic genetic regions that underlie genetic risk across multiple ADs.Overall, our analyses indicate potential factors associated with genetic risk for ADs, some of which have been reported previously, and novel observations that need further exploration.These results suggest that the assessment of additional exposures related to lifestyle, mental and physical health risks by clinicians, could be beneficial for individuals with higher risk for autoimmune disorders.
reviewers.Any product that may be evaluated in this or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

5
FIGURE 5 Genetic correlation and factor analysis for 8 autoimmune disorders.The figure shows the analyses of the 8 autoimmune disorders with enough overlap (>200.000SNPs) with HapMap3 data provided by LDSC after excluding the HLA locus (hg19, chr6 25-33 Mb).(A) Heatmap of the pairwise LDSC genome wide genetic correlations of the 8 autoimmune disorders after excluding the SNPs in the HLA region.The red color reflects more positive correlation coefficients while blue reflects more negative coefficients, and the numbers within each cell are the correlation coefficients.The correlations with p<0.05 are denoted with one asterisk (*), while two asterisks (**) show the correlations that are significant after Bonferroni correction.(B) Network representation of the genetic correlation between the autoimmune disorders with p<0.05.The numbers show the correlation coefficient and the stronger the line color shows a higher coefficient.(C) Path graph of the confirmatory factor model estimated using the Genomic SEM.Four factors were identified.The factor loadings for each trait are depicted by arrows between the trait and the factor, with the standardized loading value and the standard error in the parentheses.Correlation between factors is indicated by arrows between them.Residual variance for each trait is indicated by the two-headed arrow connecting the variable to itself.

TABLE 1 Autoimmune Disease datasets used in this study.
We included the summary statistics only from the European ancestry individuals in this study.**For the PRS calculations we used the summary statistics after excluding the UK Biobank samples, while for the rest of the analyses we included the full dataset described in the study.The number of SNPs in the PRS calculations corresponds to the independent SNPs with p<10 -5 . *

TABLE 2
The most significant associations (p FDR <0.05) of each AD PRS and the UK Biobank phenotypes.