Novel Variants Identified in Multiple Sclerosis Patients From Southern China

Background: Multiple sclerosis (MS) is an autoimmune and demyelinating disease. Genome-wide association studies have shown that MS is associated with many genetic variants in some human leucocyte antigen genes and other immune-related genes, however, those studies were mostly specific to Caucasian populations. We attempt to address whether the same associations are also true for Asian populations by conducting whole-exome sequencing on MS patients from southern China. Methods: Genomic DNA was extracted from the peripheral blood mononucleocytes of 8 MS patients and 26 healthy controls and followed by exome sequencing. Results: In total, 41,227 variants were found to have moderate to high impact on their protein products. After filtering per allele frequencies according to known database, 17 variants with the allele frequency <1% or variants with undetermined frequency were identified to be unreported and have significantly different frequencies between the MS patients and healthy controls. After validation via Sanger sequencing, one rare variant located in exon 7 of TRIOBP (Chr22: 37723520G>T, Ala322Ser, rs201693690) was found to be a novel missense variant. Conclusion: MS in southern China may have association with unique genetic variants, our data suggest TRIOBP as a potential novel risk gene.


INTRODUCTION
Multiple sclerosis (MS) is one of the chronic inflammatory demyelinating diseases of the central nervous system (1). Although the causes of MS are still largely unclear, it is thought to be the consequence of the interaction of genetic and environmental factors (1)(2)(3). Epidemiological studies have shown that MS is much less prevalent in East Asia than Europe and North America (4-7), which may be attributed to different genetic bases underlying the susceptibility to MS in various populations.
For the past several decades, many genetic variants have been found to be associated with MS in the western populations (8)(9)(10). However, lots of risk variants could not be validated in Asian populations (2,11). Furthermore, not only common variants but rare variants in protein-coding regions also play critical roles in the development of complex diseases (12,13). In this study, we determined the genetic differences between MS patients and healthy controls in southern China using whole-exome sequencing (WES) to find MS-associated variants in this area.

Patients and Samples
The study was reviewed and approved by both ethic committees of Sun Yat-sen Memorial Hospital of Sun Yat-sen University and University of Macau. Patients with MS were diagnosed according to 2010 revision of the McDonald criteria (14) and were recruited from Jan. 1, 2016 to Dec. 31,2016 in Sun Yat-sen Memorial Hospital. Healthy volunteers who were ageand gender-matched to the MS patients and had no history of autoimmune disorders, tumors, and other chronic illness such as hypertension and diabetes were also recruited. All MS patients and healthy volunteers were southern Chinese and included into the subsequent analysis after they had completed a written informed consent. Patients with underdetermined diagnoses, previous history of transplantation, previous plasmapheresis or stem cell therapy were excluded from the study. Peripheral whole blood was collected for preparing DNA libraries. Genomic DNA was extracted from peripheral blood mononucleocytes of each individual patient or healthy donor with a DNA extraction kit (Tiangen Biotech). Then, genomic DNA was sonicated into fragments of around 200 bp for preparing DNA libraries. DNA libraries for Illumina sequencing were constructed using the NEB Ultra II DNA Prep Kit (NEB) followed by quality verification via DNA High-Sensitivity Bioanalyzer assay (Agilent).

Whole-Exome Sequencing and Confirmation via Sanger Sequencing
Exome capture was performed with the TruSeq Rapid Exome Prep kit (Illumina). Multiple individual DNA libraries with different Illumina sequencing indexes were pooled together for exome capture and subsequent sequencing. Exon-enriched DNA library pools were sequenced by Illumina HiSeq 2500 platform (15). FastQC (16) tool was used to check the quality of raw fastq data generated by sequencing. The raw sequencing reads were aligned to the Ensembl (release 84) Human Genome Build 38 (GRCh38.p5) (17) using the Burrows-Wheeler Alignment tool (18). Reads which mapped to multiple locations were marked as duplicate using Picard tool (BroadInstitute) 1 for future filtering during variant calling. Local realignment of reads around indels and base quality score recalibration was performed using the IndelRealigner and BaseRecalibrator modules of the Genome Analysis Toolkit (GATK) (19,20). Finally, variants were called using HaplotypeCaller module and variant quality scores were recalibrated with VariantRecalibrator module of GATK.
True variants were identified only when the sequencing read depth of the mutation was equal to or more than 10 reads. SnpEff (21) was used for the general variant annotation purpose, and SnpSift (22) was used to add additional annotation from the 1000 Genomes Project (1KGP) (23), Exome Aggregation

Statistical Analysis
Variant frequencies were described as proportions, and comparisons of frequencies between different groups were analyzed using Fisher's exact tests. Two-tailed P < 0.05 was used as statistically significance.

Characteristics of Samples
Eight MS patients and 26 healthy controls were included in the study. All the eight patients met the McDonald criteria (2010 revision) for MS, six of them were diagnosed as relapsingremitting MS (RRMS) who had more than one clinical attack and had one or more typical lesions in two or more typical MS-affecting areas of the central nervous system (CNS), and the other two patients diagnosed as clinically isolated syndrome with more than one objective lesion in two or more typical MSaffecting areas and oligoclonal bands in cerebral spinal fluid. All the patients and healthy volunteers came from several provinces in southern China including Guangdong, Hunan, Guangxi and Fujian, which are within 1,000-km away from the hospital. All the healthy controls had no relative relationship with the patients.

Rare Variants Revealed via Whole-Exome Sequencing
On average, 2 × 37,614,484 pair-end reads (100 bp of read length) were generated after raw sequencing data filtering, mapping, and realignments, and per-base coverage of target exome regions was around 167 × per person. About 87.4% of exomes had sequencing depth > 1×, and 58.2% of exomes had sequencing depth > 10×. After annotations for all variants according to SnpEff database, a total of 41,227 variants which may have moderate or high impact on their translation products were found in the MS patients and healthy controls. We then filtered out common variants and identified 18,030 rare variants (with allele frequency <1% referring to both 1KGP/ExAC and East Asian populations of 1KGP/ExAC) or variants with undetermined allele frequency. Among the 18,030 variants, 17 variants (located in 15 genes) were identified with allele frequencies significantly different between the MS patients and healthy controls (P < 0.05) (Table 1, Figures 1, 2).

A MS-Associated Missense in TRIOBP
Among the 17 variants (possibly associated with MS) identified in our study, 6 were known rare variants. We validated the variants by Sanger sequencing (Figure 3), and found that one of the variants in the gene TRIOBP had significantly different allele frequency between the MS patients and healthy controls (P < 0.05).
The variant is a missense (Chr22: 37723520G>T, Ala322Ser, rs201693690) located in the exon 7 of TRIOBP, causing an amino acid substitution (Ala322Ser). Among the eight MS patients, three had heterogeneous variants in Chr22: 37723520 (G/T). It has much higher allele frequency in the MS patients than the healthy controls (18.75 vs. 0%, P = 0.0112). This variant has been documented in the ExAC study, and found to be associated with deafness and schizophrenia. The frequency of allele T in Chr22:37723520 is only 0.047% in the total population of the ExAC database, indicating that it is a rare variant in human genome (Figure 2).

DISCUSSION
In this study, we recruited 8 MS patients and 26 healthy controls in southern China. Due to the low morbidity of MS in southern China, we could not recruit more patients within the two-year study period. Following WES, we found  17 rare variants or variants with unknown allele frequency had significantly different allele frequencies between the MS patients and healthy controls, including a missense in TRIOBP (Chr22: 37723520G>T, Ala322Ser, rs201693690). TRIOBP may be a novel risk gene among southern Chinese.
The association between gene polymorphisms and MS has been investigated for decades. The most studied polymorphisms with MS are the human leukocyte antigen (HLA) genes. It has been found that MS risk is strongly associated with HLA-DRB1 * 15:01 and the related haplotype HLA-DRB1 * 15:01-DQA1 * 01:02-HLA-DQB1 * 06:02 in Caucasian populations such as Europeans and North Americans (2,26). However, it is difficult to associate other HLA-related risk alleles with MS because strong linkage disequilibrium is present in the HLA-DRB1-DQA1-DQB1 regions (2). It is also difficult to confirm risk alleles in genes unrelated to HLA since their genetic effects may be much smaller than that of HLA-DRB1 * 15:01 (2). Furthermore, the associations of HLA-DR allele polymorphisms with MS found in Caucasian populations do not consistently apply to the Asian populations including Chinese. For example, HLA-DRB1 * 15:01 is not strongly associated with MS in Asian populations (11). The association of HLA-DRB1 * 15:01 with MS risk was weaker among northern Chinese than Caucasians, whereas no association of HLA-DRB1 * 15:01 with MS risk was found in southern Chinese (27). These findings indicate that polymorphisms in HLAunrelated genes might be associated with MS among southern Chinese.
Recently, several genome-wide association studies (GWAS) with large numbers of samples had identified 57 non-HLA SNPs to be associated with MS (10). However, the populations involved in the study were mainly Caucasian populations. Another GWAS study from an Australian group showed that genetic loci for Epstein-Barr virus nuclear antigen-1 were positively associated with the risk of MS (28). However, the association cannot be found in Asian populations (29). These results suggest that MS-related genetic heterogeneity exists in different ethnic populations, and new studies on Asian populations are needed to identify genetic bases related to the development of MS in Asia. We performed the exome-sequencing study to discover whether any MS-associated variants differ between southern Chinese and Caucasians.
WES is one of the powerful next-generation sequencing techniques to explore global genetic variants in Mendelian inheritance disorders and many other complex diseases (30)(31)(32). It allows discovery of rare variants in coding sequences that may cause mutations in their protein products and subsequent disease phenotypes (31,32) or contribute to heritability of complex traits. In this study, WES has revealed that the frequency of the rare variant in TRIOBP is significantly different between the MS patients and healthy controls, and its allele frequency is much higher than that of reference in known databases (18.75 vs. 0.047% in ExAC). And subsequent Sanger sequencing validated the variant in our samples. Although our sample size is very small, the much higher allele frequency in the MS patients than the healthy controls suggests that this variant may be associated with the risk of MS development in southern China.
This variant, which has not been identified in any MS GWAS studies, may play an important role in the development of MS among southern Chinese. TRIOBP is a gene encoding TRIO and F-actin binding protein, and locates in chromosome 22. The encoded protein interacts with trio, which regulates actin cytoskeleton organization, cell migration and cell growth, and it also stabilizes F-actin structures (33,34). Previous studies have found that mutations or variants of TRIOBP cause genetic sensorineural hearing impairments (35,36), and abnormal TRIOBP protein aggregation leads to chronic mental illness (37, 38). However, the mechanism for how TRIOBP affects immune or inflammation is unknown.
Our exome-sequencing also identified several variants in the other genes which have higher allele frequencies in the MS patients than the normal controls. However, there is no referable information for the allele frequency for some of the variants (e.g., CKAP4, NUMBL, and VPS13B). The other variants such as the missense in HLA-DQA2 (Chr6: 32745313T>A, rs752313403), which had much higher allele frequency in the MS patients than the healthy controls, could not be validated through Sanger sequencing. Thus, we cannot include it as a risk variant.
Furthermore, other etiologic factors can trigger MS or increase the susceptibility to MS. For example, it has been proposed that HLA2TA mRNA level can be reduced by the active replication of human herpesvirus 6 in MS patients (39,40). It is intriguing to determine whether these external etiologic factors are differentially distributed among various populations. Simultaneous testing of both genetic and environmental etiologic factors may further elucidate the correlations.
In conclusion, this study on MS patients from southern China has identified a missense rare variant in TRIOBP (Chr22: 37723520G>T, Ala322Ser, rs201693690) that may be associated with MS. Further study is necessary to verify the above findings in a larger sample size, and animal models are needed to confirm the role of this and other potential variants.

AUTHOR CONTRIBUTIONS
HW contributes to recruit study participants, collect samples, prepare exome sequencing samples, and write manuscript. LP contributes to analyze exome sequencing samples. XR contributes to recruit study participants and collect samples. EL contributes to prepare exome sequencing samples. KW contributes to prepare exome sequencing samples and analyze exome sequencing samples. YP contributes to supervise whole project, design the study, recruit study participants, collect samples, and write manuscript. R-HX contributes to supervise whole project, design the study, prepare exome sequencing samples, and write manuscript.