Gene-Based Tests of a Genome-Wide Association Study Dataset Highlight Novel Multiple Sclerosis Risk Genes

Multiple sclerosis (MS) is an autoimmune disorder influenced by genetic and environmental factors. Many studies have provided insights into genetic factors’ contribution to MS via large-scale genome-wide association study (GWAS) datasets. However, genetic variants identified to date do not adequately explain genetic risks for MS. This study hypothesized that novel MS risk genes could be identified by analyzing the MS-GWAS dataset using gene-based tests. We analyzed a GWAS dataset consisting of 9,772 MS cases and 17,376 healthy controls of European descent. We performed gene-based tests of 464,357 autosomal single nucleotide polymorphisms (SNPs) using two methods (PLINK and VEGAS2) and identified 28 shared genes satisfied p-value < 4.56 × 10–6. In further gene expression analysis, ten of the 28 genes were significantly differentially expressed in the MS case-control gene expression omnibus (GEO) database. GALC and HLA-DOB showed the most prominent differences in gene expression (two- and three-fold, respectively) between MS patients and healthy controls. In conclusion, our results reveal more information about MS hereditary characteristics and provide a basis for further studies.


INTRODUCTION
Multiple sclerosis (MS) is a neurodegenerative, inflammatory, demyelinating, and autoimmune disease of the central nervous system (CNS) that causes widespread tissue lesions and dysfunction (McFarland and Martin, 2007). As a result, the communication between neurons is disrupted, as demonstrated by extensive signs and symptoms (Filippi and Rocca, 2005). The etiology of MS involves genetic as well as environmental factors (Canto and Oksenberg, 2018). However, genetic factors have more vital roles in MS pathogenesis than environmental ones (Dendrou et al., 2015). Genome-wide association studies (GWAS) have recently revealed genetic risk factors for MS (International Multiple Sclerosis Genetics Consortium [IMSGC], 2019) and confirmed 233 loci, including 201 non-MHC and 32 MHC variants.
Although GWAS have helped comprehend the genetic basis of human diseases, there are still some limitations, including the unprecedented potential to produce false-positive results, lack of information on gene function, insufficient sample size, lack of well-defined case and control groups, and insensitivity to rare, and structural variants (Pearson and Manolio, 2008). Gene-based tests have been considered as a complement to GWAS (Wang et al., 2011). First, they can be more potent than individual SNP-based GWAS when multiple causal interactions affect the phenotype of interest (Ma et al., 2013). Second, genebased tests combine all SNPs into a single test and correct linkage disequilibrium (LD) (Ma et al., 2013). Also, genebased approaches reduce false positives that arise from multiple testing in GWAS (Liu et al., 2010). Finally, gene-based analysis prioritizes GWAS results and determines the association between diseases and functional genetic analysis (e.g., a shared biological pathway) (Wang et al., 2011).
Gene-based tests of large-scale MS-GWAS datasets provide strong support for the identification of novel MS susceptibility loci. This study performed gene-based tests of an MS-GWAS dataset that comprised 9,772 MS cases along with 17,376 control subjects using two methods. We also analyzed two MS case-control gene expression datasets to investigate further the differential expression of MS risk genes identified by both methods.

MS-GWAS Dataset
We used a large-scale MS-GWAS dataset from the International Multiple Sclerosis Genetics Consortium (IMSGC), which consisted of 9,772 MS cases along with 17,376 control cases of European descent collected by 23 research teams from 15 countries (Sawcer et al., 2011). After subjecting the data to the novel SNP-based quality control procedure, 464,357 autosomal SNPs were available for further analysis (Sawcer et al., 2011).

Gene-Based Testing of MS-GWAS Dataset Using PLINK
Based on the association analysis of the MS-GWAS dataset, a gene-based test was carried out with the PLINK software (SET SCREEN TEST) 1 (Purcell et al., 2007). The combined chi-square statistic for each SNP of a given gene was calculated using the following formula: N represents the number of markers (tests), and p i (i = 1 . . . N) are the corresponding p-values under the null hypothesis. x 0 2 follows a chi-squared distribution with 2N degrees of freedom 1 http://pngu.mgh.harvard.edu/purcell/plink/ (df). If the tests (for each SNP) are not independent, the statistic x 0 2 has a mean m = 2N and variance (σ 2 ), where p i and p j (i, j = 1, . . ., N) are the p-values for each test, and the covariance (cov) is calculated as: The non-negative correlation coefficients p ij between the two variables were approximated by the correlation between two SNPs i and j. The overall significance of multiple nonindependent tests is determined using the formula: where x 2 follows the central chi-squared distribution with 8N 2 /σ 2 df. This method was applied to analyze the MS-GWAS dataset using the LD information from the HapMap CEU population. The set screen test integrated all SNPs' p-values using an approximate Fisher's test, an asymptotically optimal method to obtain the overall significance. Under the same null hypothesis, a set of p-values determined by independent tests were collected and calculated (i.e., each SNP that was not related to the disease).

Gene-Based Testing of MS-GWAS Dataset Using VEGAS2
We also used the VEGAS2 software to perform a gene-based test (Liu et al., 2010;Mishra and Macgregor, 2015). In VEGAS2, there are five options regarding gene boundaries for the SNP option: SNPs within 0kbloc, 10kbloc, 20kbloc, 50kbloc, or 0kbldbin (Liu et al., 2010). The n SNPs' p-values are first converted to upper tail χ 2 statistics with one df for each gene definition. If SNPs are in the linkage equilibrium, VEGAS2 calculates a gene-based test statistic with a χ 2 distribution with n df under the null hypothesis. This software can also produce more relevant SNPs where the top SNP is in high LD (Mishra and Macgregor, 2015). We selected SNPs within a gene and any SNPs outside of the gene with r 2 > 0.8 for SNPs within the gene (0kbldbin) in order to decrease the specificity of the result for a given gene (Mishra and Macgregor, 2015).

MS Case-Control Expression Analyses
We further analyzed the differential expression of shared MS risk genes from the NCBI gene expression omnibus (GEO) database 2 . Our inclusion criteria were human tissue samples, expression profiling by the array, case-control study and sample size ≥ 20. Two GEO datasets were selected for our study, GSE21942 and GSE43591 (Kemppinen et al., 2011;Jernås et al., 2013). In GEO series GSE21942, researchers recruited 12 MS female patients and 15 unrelated female controls, and all of them were Caucasian (Kemppinen et al., 2011). Peripheral blood mononuclear cells (PBMCs) isolated from whole blood were tested on the Affymetrix Gene Chip Human Genome U133 Plus 2.0 Array (Kemppinen et al., 2011). In GEO series GSE43591, ten relapsing-remitting Caucasian MS patients and 10 matched healthy controls were recruited . Researchers isolated PBMCs from whole blood and sorted T cells with CD3+ positive magnetic beads. The transcriptional level was tested by Human Genome HG-U133 plus 2.0 arrays . We utilized GEO2R provided by the GEO website (GEO query along with limma R packages) to compute differential expression between case-control samples (Barrett et al., 2013). Then the transcript with the smallest p-value was selected among multiple transcript probes. After multiple testing correction (Benjamini-Hochberg method), the gene expression (transcript probes) with an adjusted p-value of <0.05 were considered to be significant.

MS Patients and Healthy Controls
Five patients diagnosed with relapsing-remitting MS as per the McDonald Criteria of MS were enrolled from Tianjin Medical University General Hospital. The exclusion criteria were as follows: (1) co-presence of other CNS disorders; (2) diagnosis of tumors, recent infection, or systemic hematologic diseases; (3) concomitant use of antineoplastic or immunemodulating therapies before blood sampling. Five healthy volunteers were also enrolled in this study as the control group. Tianjin Medical University General Hospital's Ethics Committee approved this study, and informed consent was obtained from each participant.

Cell Isolation and FACS Sorting
Peripheral blood samples were obtained from all MS patients at the acute phase of the disease as well as healthy volunteers. PBMCs were isolated from blood samples using Lymphocyte Separation Medium (Solarbio) followed by Ficoll density gradient centrifugation at 2,000 × g for 20 min. A single-cell suspension was then prepared for FACS sorting. The following antibodies labeled with FITC, PE, and APC were used in this experiment: human reactive CD3, CD4, and CD19 (BioLegend). FACS sorting was conducted using the FACSAria Cell Sorter (BD Biosciences).

Real-Time Quantitative PCR (RT-PCR)
TRIzol reagent (Invitrogen) was employed to isolate total RNA from sorted B lymphocytes according to the manual. The concentration of total RNA was quantified using a NanoDrop 1000 (Thermo Fisher Scientific). The total RNA was converted to cDNA with a Trans-Script First-Strand cDNA Synthesis SuperMix Kit (TransGen Biotech). The gene amplification was performed using the FastStart Universal SYBR Green Master (Roche). qPCR was run on the Opticon 2 Real-Time PCR Detection System (Bio-Rad). The primers used in this experiment were: HLA-DOB, F: CAGCTAAGGGCTCAGAAAGGAT and R: CTACTCATCACTACTTCAGGCTCCA; β-actin, F: AGCACAATGAAGATCAAGATCAT and R: ACTCGTCA TACTCCTGCTTGC.

The mRNA Expression of HLA-DOB Was Different Between MS Patients and Healthy Subjects
B lymphocytes were collected from the PBMCs of MS patients, as well as healthy controls and then sorted by FACS (Figure 1A). HLA-DOB's mRNA expression was upregulated in MS patients compared to healthy controls ( Figure 1B; p < 0.05).

DISCUSSION
Although more than 200 MS susceptibility loci have been identified, a better understanding of MS genetic risk factors is still needed. The gene-based tests are widely used complemental methods to the traditional SNP-GWAS. Herein, we selected an MS-GWAS dataset from IMSGC that contained 464,357 autosomal SNPs. We performed two gene-based tests of the MS-GWAS dataset using PLINK and VEGAS2. In total, we identified 28 significant shared genes with p-value of <4.56 × 10 −6 NSNP, number of single nucleotide polymorphisms. In the VEGAS2, the p-value from 1.00E-06 to 0 was interpreted as p < 10E-06.
using the Bonferroni correction. Ten of them had significant differential expression at least one series of MS GEO datasets with an adjusted p-value of <0.05. In our finding, ten shared MS risk genes have significantly differential expression. Among these genes, GALC and HLA-DOB showed two to threefold upregulation/downregulation in MS patients compared to controls. Moreover, TNFRSF1A and TNFSF14 genes both had significantly differential expression in PBMCs and PBTCs transcriptional data.
In gene expression analyses, GALC expressed differentially with a logFc of 1.21. Nearly 2.3-fold upregulation of GALC expression in MS patients suggests that it plays a vital role in MS's pathogenesis. The GALC gene provides instructions for encoding galactosylceramidase, a lysosomal enzyme that hydrolyzes galactolipids, such as galactosylceramide and psychosine (Lee et al., 2007). GALC SNPs are now known to be a significant cause of Krabbe disease (KD) (Wenger et al., 2000). KD is occasionally misdiagnosed as MS due to shared pathological features of myelin lesions (Wenger et al., 1997). The GALC gene is considered a susceptibility locus for MS. By using custom ImmunoChip arrays and replication analysis of MS-GWAS datasets, IMSGC identified 48 novel risk variants for MS, including GALC (rs74796499), with a joint p-value of 2.4E-20 (Beecham et al., 2013). In a study investigating cell-mediated immune mechanisms in MS, a variant of GALC (rs2119704) showed a combined p-value of 2.20E-10 (P dis = 3.50E-10, P rep = 2.50E-02) (Sawcer et al., 2011).
The lack of GALC may result in neurodegeneration independent of inflammation (Sawcer et al., 2011). Scott-Hewitt et al. explored the mechanism underlying mutant GALC-associated neurodegeneration in animals model of demyelination, in which wild-type (GALC +/+ ) and GALC ± mice were exposed to cuprizone (Kipp et al., 2009). GALC −/− mice were more defective compared to other genotypes (Scott-Hewitt et al., 2017). No significant behavioral or histological differences (e.g., in corpus callosal myelin) were observed between the two genotypes of animals at baseline. However, following demyelinating injury, GALC ± mice exhibited reduced remyelination and impaired myelin debris clearance, which might be caused by microglia dysfunction (Scott-Hewitt et al., 2017).
HLA-DO is a heterodimer formed by HLA-DOα and HLA-DOβ, encoded by HLA-DOA and HLA-DOB genes. HLA-DO PBMCs, peripheral blood mononuclear cells; PBTCs, peripheral blood T cells; logFc, log2-fold change; probe ID, transcript probes. The bold values of gene expression mean |logFc| > 1. Frontiers in Neuroscience | www.frontiersin.org is the human equivalent of murine H2-O, a non-classical MHC class II-like protein encoded in the class II region of the MHC (Trowsdale and Kelly, 1985). HLA-DO is only expressed in B cells and thymic epithelial cells (Douek and Altmann, 1997). B cells have been recognized as a contributor to the development of MS (Li et al., 2018). This study found that the HLA-DOB gene showed a twofold upregulated expression level in MS patients in contrast with healthy controls in PBMCs. To further verify the expression level of the HLA-DOB gene in B cells, we sorted B lymphocytes to compare the expression level of HLA-DOB in MS patients and healthy subjects. Moreover, the transcriptional level of the HLA-DOB gene was upregulated in MS patients in contrast with controls. HLA-DO is strongly associated with HLA-DM and DO-DM complexes. It is a critical regulator in intracellular transport and essential for the survival in lysosomal MHC class II compartments surrounded by highly acidic B cells (Liljedahl et al., 1996;van Ham et al., 2000). Nagarajan et al. (2002) showed that HLA-DO regulated peptide loading and antigen presentation in B cells. Early findings showed that DO inhibited DM to prevent DM from removing the invariant chain peptide CLIP (Denzin et al., 1997). However, some studies reported that DO-KO mice did not show reduced MHCII-CLIP levels (Perraudeau et al., 2000;Brocke et al., 2003). Also, DO, together with DM, would allow a more finetuned epitope selection (Poluektov et al., 2013). The HLA-DOB gene has also been identified as a genetic variant in many diseases, such as KD (Shendre et al., 2014), type 1 diabetes mellitus (Qiu et al., 2015), and rheumatoid arthritis (Jiang et al., 2016). Further investigations are needed to explore HLA-DOB's role in MS, especially in B cell antigenpresenting function.
Increasing evidence suggests that TNFSF14 may be an MS susceptibility gene. TNFSF14, also known as LIGHT, was identified as one of the risk loci for MS (rs1077667 G , p = 9.4 × 10 −14 ) (Sawcer et al., 2011). It may regulate the function of T cells via microRNA . Soluble LIGHT plays a protective role via inhibiting inflammation of the GG genotype of rs1077667. MS patients with this genotype showed the lowest LIGHT serum level (p = 0.02) compared to control subjects .

CONCLUSION
In summary, in an MS GWAS dataset, we performed genebased tests to get the intersection of MS risk genes, following the gene expression analysis that illustrated the gene characteristics in transcription level. Notably, the genes involved in antigen presentation and lysosomal function showed great differential expression in MS patients' samples compared to controls. Furthermore, our finding provided support for these candidate genes in the participation of MS mechanisms.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: www.ncbi.nlm.nih.gov/geo https://pubmed. ncbi.nlm.nih.gov/21833088/.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethical Committee of Tianjin Neurological Institute. The patients/participants provided their written informed consent to participate in this study.