The Clinical and Polynucleotide Repeat Expansion Analysis of ATXN2, NOP56, AR and C9orf72 in Patients With ALS From Mainland China

Background: Repeat expansions, including those in C9orf72 and ATXN2, have been implicated in amyotrophic lateral sclerosis (ALS). However, there have been few studies on the association of AR and NOP56 repeat expansions with ALS, especially in China. Accordingly, we aimed to evaluate the frequency of C9orf72 and ATXN2 repeat mutations and to investigate whether NOP56 and AR repeat expansions are risk factors for ALS.
Methods: In this study, 736 ALS patients and several hundred healthy controls were recruited. Polymerase chain reaction (PCR) and repeat-primed PCR (RP-PCR) were performed to determine the repeat lengths in C9orf72, ATXN2, AR, and NOP56.
Results: GGGGCC repeat expansions in C9orf72 were observed in six ALS patients (0.8%, 6/736) but not in any of the controls (0/365). The patients with pathogenic GGGGCC repeats showed shorter median survival times than those with a normal genotype (p = 0.006). Regarding ATXN2 CAG repeats, we found that intermediate repeat lengths (29–34 copies) were associated with ALS (p = 0.033), and there was no difference in clinical characteristics between the groups with and without intermediate repeats (p > 0.05). Meanwhile, we observed no association between the repeat size in AR or NOP56 and ALS (p > 0.05).
Conclusions: Our results demonstrated that pathogenic repeats in C9orf72 are rare in China, while intermediate CAG repeats in ATXN2 are more frequent but have no effect on disease phenotypes; the repeat size in AR and NOP56 may not be a risk factor for ALS.


INTRODUCTION
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that affects upper and lower motor neurons and results in muscular weakness and atrophy (1). To date, mutations in more than 50 genes have been associated with the pathogenesis of ALS (2). Notably, nucleotide repeat expansion mutations in some of these genes play a pivotal role in the pathogenic process. For example, the pathogenic GGGGCC hexanucleotide repeat expansion (HRE) in the chromosome 9 open reading frame 72 (C9orf72) gene was identified as a common causative factor for ALS in Caucasian populations, accounting for 23.5-47% of familial ALS (3,4) and 4.1-21.0% of sporadic ALS cases (5). However, this mutation is very rare in mainland China and other Asian countries (6-8), with a frequency of only 0-0.3%. As early as 2010, intermediate-length CAG repeats (27-33 copies) in the ATXN2 gene were reported as a genetic risk factor for ALS (9). Subsequently, several studies have validated this result in other ethnicities (10-14).
Spinal and bulbar muscular atrophy (SBMA), a neuromuscular disease that shares several clinical features with ALS, is caused by CAG trinucleotide repeat expansions (>38 copies) in the androgen receptor (AR) gene (15). A functional study showed that AR was downregulated in the spinal cord of male SOD1 G93A mice, suggesting that AR may play an important role in the pathogenesis of ALS (16). Another nucleotide repeat expansion disease, spinocerebellar ataxia type 36 (SCA36), caused by a GGCCTG hexanucleotide repeat expansion in the NOP56 gene, can also present with an ALS phenotype (17,18). Subsequent studies revealed motor neuron involvement during the course of SCA36, including reduced expression of NOP56, TDP-43, and FUS in the large motor neurons of an ALS mouse model, which occurred before the apparent onset of the disease (19). Nonetheless, few studies have been conducted to date on the association between ALS and the size of nucleotide repeats within AR or NOP56.
Therefore, in this study, we aimed to determine (i) the frequency of C9orf72 and ATXN2 mutations; (ii) whether NOP56 and AR are risk genes for ALS; and (iii) the potential associations between phenotypes and the size of nucleotide repeats within the C9orf72, ATXN2, NOP56, and AR genes.

Population
ALS patients were enrolled from the Department of Neurology, Xiangya Hospital, Central South University (CSU), in either an outpatient or an inpatient setting from April 30, 2013, to November 30, 2020. All patients were examined by at least two experienced senior neurologists and diagnosed with clinically definite, probable, or probable laboratory-supported ALS according to the revised El Escorial criteria (20). Among

Genetic Analysis
Genomic DNA from all participants was extracted from peripheral blood using a standard extraction method.
To detect the size of repeat expansions in the C9orf72 and NOP56 genes, we first applied the polymerase chain reaction (PCR) method as described previously (8,18). The target sequences covering the hexanucleotide repeats of the C9orf72 and NOP56 genes were amplified with different pairs of fluorescently labeled primers. The fluorescent PCR product was analyzed using an ABI-Prism 3730 Genetic Analyzer, and the data were examined using GeneMapper software (Applied Biosystems, Vernon Hills, Illinois, USA). Next, if the result showed a homozygous peak, we reanalyzed it using repeat-primed PCR (8,18). Expansions of the hexanucleotide repeat showed a typical sawtooth pattern.
Genotyping of ATXN2 and AR was performed by PCR amplification of CAG tracts in combination with capillary electrophoresis; the analysis was performed using GeneMarker software as described previously (21,22).

Statistical Analysis
For CAG repeats in ATXN2, we used Fisher's exact test to assess the association between intermediate CAG repeats and ALS. To identify the relationship between the repeat size in AR and NOP56 and ALS, we used a one-sided Mann-Whitney U test and outlier analysis, the same statistical methods used by the tool ExpansionHunter DeNovo (23).
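As an illustration of the carrier-versus-non-carrier comparison described above, the following is a minimal sketch of a two-sided Fisher's exact test written with the Python standard library only. It is not the authors' code (the analyses in this study were run in SPSS), and the function name is an assumption. Because the ATXN2 carrier counts in controls are not stated in the text, the example uses the C9orf72 counts from the abstract (6 of 736 patients and 0 of 365 controls); the paper does not report a p-value for that particular comparison.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. patients, controls); columns are carrier / non-carrier.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table that is at least as unlikely as the
    # observed one (small relative tolerance for floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Illustrative call with the C9orf72 counts from the abstract:
# 6 carriers among 736 patients, 0 carriers among 365 controls.
p_c9orf72 = fisher_exact_two_sided(6, 730, 0, 365)
print(f"two-sided p = {p_c9orf72:.3f}")
```

The same function can be applied to any 2x2 carrier table once the four cell counts are known.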
In ALS patients, the chi-square test, Spearman's test or Fisher's exact test (for categorical variables) and the Mann-Whitney U test, Pearson's test or Student's t-test (for continuous variables) were used to assess the association between polynucleotide repeat expansion and clinical characteristics, such as age at onset (AAO), sex, site of onset, family history, Medical Research Council (MRC) score, Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) score, and disease progression rate (DPR), calculated as DPR = (48 - ALSFRS-R score at time of diagnosis)/disease duration in months (24).
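The DPR definition above can be written as a short function. This is a minimal sketch: the function and parameter names are assumptions, and the input values in the example are hypothetical rather than taken from the cohort.

```python
def disease_progression_rate(alsfrs_r_at_diagnosis, disease_duration_months):
    """DPR = (48 - ALSFRS-R score at diagnosis) / disease duration in months."""
    if not 0 <= alsfrs_r_at_diagnosis <= 48:
        raise ValueError("ALSFRS-R scores range from 0 to 48")
    if disease_duration_months <= 0:
        raise ValueError("disease duration must be positive")
    return (48 - alsfrs_r_at_diagnosis) / disease_duration_months


# Hypothetical patient: ALSFRS-R of 36 at diagnosis, 12 months after onset.
example_dpr = disease_progression_rate(36, 12)
print(example_dpr)  # 1.0 point lost per month
```

A higher DPR means that more ALSFRS-R points were lost per month between symptom onset and diagnosis.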
Kaplan-Meier univariate analysis was used to determine the effect of polynucleotide repeat expansion on survival time.
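For readers unfamiliar with the method, the following is a minimal standard-library sketch of the Kaplan-Meier product-limit estimator and the median survival time read from it. It is not the SPSS procedure used in this study, it does not include the between-group significance test, and the follow-up times and event indicators are hypothetical. The function names are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient (e.g. months from onset).
    events: 1 if the endpoint was observed, 0 if the patient was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data (months) with two censored patients.
km_curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
km_median = median_survival(km_curve)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why patients lost to follow-up can still be included in the survival analysis.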
Statistical analyses were performed using IBM Statistical Package for the Social Sciences (SPSS) version 22. Differences with P < 0.05 were considered statistically significant.

Clinical Data
A total of 736 ALS patients (679 with sporadic ALS and 57 with familial ALS) were recruited, including 486 males and 250 females. In our ALS cohort, the mean AAO was 52.44 ± 11.81 years; the percentage of patients exhibiting spinal onset (77.0%, 567/736) was higher than the percentage of patients with bulbar onset (23.0%, 169/736). Among 569 patients with ALSFRS-R data available, the mean score was 37.64 ± 7.95, and the mean DPR was 0.96 ± 0.97; among 663 patients with MRC scores available, the mean score was 107.97 ± 17.60.

Genetic Results and Clinical Features of ALS Patients With C9orf72 Mutations
The distribution of GGGGCC HRE length is shown in Figure 1A. Among 736 ALS patients, the GGGGCC HRE length ranged from two to more than 60 copies. The most frequent allele size was two repeats, accounting for 37.3% of cases. Notably, we identified HREs over 30 repeats long in six ALS patients, accounting for 0.8%, but not in any of the controls (Table 1). Among the 365 healthy controls, the HRE length ranged from two to 15 copies; as in the patient group, the most common repeat length was two copies, accounting for 42.7% of cases. No intermediate-length HREs (24-30 copies) were identified in ALS patients or controls.
The clinical features of the patients with HRE length ≥30 copies are shown in Table 2. When comparing the clinical characteristics of ALS patients with different ranges of HRE length, we found that patients with HRE length ≥30 copies had a shorter median survival time than those with <30 copies (p = 0.006). Although there was no significant difference in sex, AAO, ALSFRS-R, site of onset, family history, MRC or DPR (p > 0.05), we noted that ALS patients with HRE length ≥30 copies tended to have a younger mean AAO (46.25 ± 2.22 years) than those with <30 copies (52.51 ± 11.75 years). Among the six ALS patients with ≥30 copies, two patients had a family history. One of the families had been reported in our previous study (8). In the other family (Figure 2A), the proband was a 44-year-old man who initially presented with progressive muscle weakness in the bilateral upper limbs. Thereafter, he progressively manifested muscle weakness and atrophy in all four extremities, along with dysphagia and dysarthria. On admission 12 months after onset, neurological examination revealed fasciculations, muscle weakness and hyperreflexia in the affected limbs. Electromyography (EMG) showed abundant and diffuse ongoing denervation and chronic reinnervation
Table 2 footnotes: The patient has been reported in a previous study. ‡ The patient had died at the time of the study. § Right peroneal nerve motor conduction did not produce a positive waveform. ¶ Lost to follow-up.

Genetic Results and Clinical Features of ALS Patients With Intermediate CAG Repeats in ATXN2
As shown in Figure 1B, the CAG repeat length ranged from 17 to 37 in 736 ALS patients and from 17 to 26 in 201 controls. The most frequent allele size was 22 copies, accounting for 91.0% and 90.5% of ALS patients and controls, respectively. Herein, according to previous reports and a meta-analysis of ATXN2 repeat lengths and ALS risk (31), we applied cutoff values [...] (Table 3, Supplementary Table 1). When comparing clinical characteristics between patients with intermediate repeat lengths and those with normal repeat lengths, we found no significant difference in sex, AAO, ALSFRS-R, site of onset, MRC, family history or DPR (p > 0.05). Using the Kaplan-Meier method, we did not find a significant difference in survival time between the two groups (p > 0.05).
We found two ALS patients carrying pathogenic CAG repeat expansions. Patient A0154, who carried 37 repeats, was a 52-year-old female with a family history of SCA (Figure 2B). She initially developed gait ataxia at 43 years old. Six years later, she gradually showed muscle weakness and atrophy of the bilateral lower limbs. At the age of 51 years, a neurological examination revealed obvious ataxia; the Romberg sign, finger-to-nose test, rapid alternating movements, and heel-to-shin test were positive. She also showed upper motor neuron (UMN) signs such as a positive palmomental reflex and positive Babinski sign, as well as lower motor neuron (LMN) signs such as fasciculations, muscle atrophy and decreased pharyngeal reflex. EMG showed abundant and diffuse ongoing denervation and chronic reinnervation changes. Notably, the patient also clearly exhibited memory impairment. Her Mini-Mental State Examination (MMSE) score was 18, and her Edinburgh Cognitive and Behavioral ALS Screen (ECAS) score was 42, suggesting cognitive impairment (executive dysfunction), memory impairment, and behavioral impairment (apathy, disinhibition, and loss of sympathy). Brain magnetic resonance imaging (MRI) revealed atrophy of the pons, cerebellum, and temporal lobe (Figure 3A). The patient was clinically diagnosed with SCA combined with probable ALS and behavioral-variant FTD (bvFTD). In contrast, her mother presented with symptoms of ataxia at 53 years old, without any symptoms of ALS or FTD, and died 20 years after onset.
Patient A002447, a 64-year-old male with a repeat length of 37, was from a family with no history of neurodegenerative diseases. He presented with dysarthria at the age of 62 years and then gradually developed weakness and atrophy in the right upper limb, which spread to the lower limbs within 6 months. Deep tendon reflexes were brisk, and the palmomental reflex was present. His MMSE (26/30), Montreal Cognitive Assessment (MoCA; 16/30) and ECAS (50/136) scores revealed memory impairment and language dysfunction (difficulties in word finding and word comprehension). His brain MRI showed cortical atrophy, especially in the frontal and temporal lobes (Figure 3B). EMG showed abundant and diffuse ongoing denervation and chronic reinnervation changes. Based on these findings, he was clinically diagnosed with probable ALS together with semantic dementia.

Genetic and Clinical Features of ALS Patients With Different CAG Repeats in the AR Gene
The sizes of CAG repeats in AR ranged from 9 to 39, with 22 repeats as the most common allele (17.9%) in 736 ALS patients; among the 225 healthy controls, the sizes ranged from 11 to 34, with 22 repeats as the most common allele (20.4%) (Figure 1C). Among male patients and controls, the repeat size of the long allele did not differ significantly between the ALS and healthy control groups (n = 610, Mann-Whitney U test, p = 0.315). Likewise, we found no association between the repeat size of the long allele and the risk of ALS among female patients and controls (n = 351, Mann-Whitney U test, p = 0.077). In addition, based on the outlier analysis, we applied 24 and 28 repeats as the cutoff values for males and females, respectively. CAG repeats in AR were not associated with ALS in either sex (chi-square test, p > 0.05) (Table 4).
Next, regarding the association between CAG repeat size in AR and the ALS phenotypes, we found that repeat size did not affect the AAO, location of onset, family history, ALSFRS-R, MRC, DPR or survival time of ALS patients (p > 0.05).
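The long-allele comparisons reported in this section rely on the Mann-Whitney U test. The sketch below shows one way such a one-sided comparison can be computed with the Python standard library, using midranks for ties and a tie-corrected normal approximation without continuity correction. It is an illustrative assumption, not the SPSS procedure used in this study, so p-values from statistical packages may differ slightly. The function name and the repeat-length lists are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u_greater(x, y):
    """One-sided test that values in x tend to be larger than values in y.

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                    # size of this group of tied values
        mid_rank = (i + 1 + j) / 2   # average of the 1-based ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # Every value is identical, so there is no evidence of a shift.
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    # Upper-tail probability of the standard normal distribution.
    return u, 0.5 * erfc(z / sqrt(2))


# Hypothetical long-allele repeat counts for two small groups.
u_stat, p_one_sided = mann_whitney_u_greater([22, 24, 25, 27], [20, 21, 22, 23])
```

With cohorts of several hundred participants and many tied repeat lengths, the tie correction in the variance term matters, because repeat counts are integers concentrated around a few common alleles.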

Genetic and Clinical Features of ALS Patients With Different GGCCTG Repeat Sizes in the NOP56 Gene
The sizes of the GGCCTG repeats in NOP56 ranged from three to 11 in 736 ALS patients and from three to nine in 229 controls (Figure 1D). First, we used the Mann-Whitney U test to assess whether the repeat size of the long allele was associated with ALS. However, the repeat size of the long allele did not differ significantly between ALS patients and healthy controls (n = 965, Mann-Whitney U test, p = 0.069). Meanwhile, we applied nine as the cutoff value according to the outlier analysis, and a repeat size above this value was not a risk factor for ALS (chi-square test, p > 0.05) (Table 5). In addition, we found no significant differences in sex, AAO, family history, ALSFRS-R, site of onset, MRC or DPR between these two groups (p > 0.05). Moreover, we did not find any correlation between repeat length and survival time (p > 0.05).

DISCUSSION
In this study, we systematically evaluated polynucleotide repeats in C9orf72, ATXN2, AR, and NOP56 in a large Chinese ALS cohort and healthy controls. We found that six of 736 ALS patients (0.8%) carried a GGGGCC HRE in C9orf72. This frequency was similar to the rates observed in other Asian countries (0-4.7%) (25, 26) but was much lower than those in European populations (4.1-47%) (27). In addition, many studies have shown that patients carrying pathogenic HREs differ phenotypically from those with normal HREs (4, 28-30). Patients carrying pathogenic HREs showed a higher rate of bulbar onset, earlier AAO, shorter survival, faster forced vital capacity (FVC) decline, and a higher incidence of comorbid FTD and/or family history of dementia than those with normal genotypes (3,32,33). In our cohort, however, the only difference we observed was that patients with pathogenic HREs had a shorter median survival time, which was consistent with a previous study in China (6). Our result may be related to the low frequency of C9orf72 HRE mutations; larger samples are needed to elucidate the potential associations between HREs in C9orf72 and disease phenotypes. With regard to the clinical phenotypes of the six ALS patients with pathogenic HREs in C9orf72, only two presented with memory impairment and abnormal behavior, while the remaining four showed pure ALS without memory impairment, which emphasizes the need to test for HREs in C9orf72 in ALS patients with pure motor neuron signs.
To our knowledge, the normal CAG repeat size in ATXN2 is 15-29 (34), and different ranges of CAG repeat numbers contribute to different phenotypes (Figure 4), such as SCA2 (repeats >34), ALS (repeats: 29-34) and Parkinsonism (repeats: 34-49). To date, many studies have proposed that intermediate CAG repeats in ATXN2 are associated with ALS (22, 31, 35-37).
However, it is challenging to identify the risk range for ATXN2 CAG repeats in ALS, and few studies have reported the same risk range. [...] (39). In this study, we identified a cutoff value of 29 and found that intermediate repeats ranging from 29 to 34 were significantly associated with ALS. In addition, two studies have suggested that shorter intermediate repeats may be a protective factor for ALS (31,39), but there is still no consensus.
In SCA2, a strong inverse correlation between AAO and repeat length was observed (40). Other studies have indicated that patients with CAG repeat length >24 have an earlier AAO, but this was not significant in Chinese individuals (12,34,41). In our study, the patients with expanded CAG repeat lengths showed different manifestations. We found no association between clinical features and intermediate CAG repeats, which was in line with previous reports (31, 36, 41-45). We therefore propose that intermediate-length ATXN2 repeats may be involved in ALS pathogenesis, but whether they influence the phenotypes of ALS remains to be determined.
In addition, we reviewed the clinical features of patients carrying intermediate CAG repeats (Supplementary Table 2). In contrast to patients with pathogenic CAG repeats (Supplementary Table 3), most patients with intermediate repeats in previous studies showed pure sporadic ALS or ALS with cognitive impairment, and few presented with ataxia (13, 31, 45-47). Moreover, in previous studies, patients with intermediate CAG repeats had an increased risk of comorbid cognitive impairment or an FTD phenotype (43,48). However, this phenomenon was not found in our 14 ALS patients with intermediate repeats. This discrepancy may arise from a difference in ethnic backgrounds. Additionally, some ALS patients were reported to carry full pathogenic CAG repeats (>34), and we also found two such ALS patients in our cohort. In summary, patients' clinical manifestations vary greatly (Supplementary Table 3): most patients carrying pathogenic repeats have a family history of SCA2 or ALS; a small proportion of patients show pure ALS without ataxia; and a few patients first present with ataxia and gradually develop the ALS phenotype a few years after onset.
The genetic and clinical overlaps between ALS and SCA2 may arise from the common pathogenic mechanisms of these two diseases. Many studies have found that CAG repeats in ATXN2 specifically alter the secondary structure of the RNA, leading to a toxic gain of function at the RNA level, which is also implicated in the pathogenesis of ALS (31). However, previous studies showed that ALS patients have CAA interruptions in the CAG repeat region (34), whereas SCA2 patients have pure CAG repeat regions (49). Pure CAG repeats form slippery hairpins, whereas the CAA interruptions cause the ATXN2 transcript to fold into branched hairpins, which may hamper the interaction with double-stranded RNA-binding proteins (34). These data may partly explain why ATXN2 CAG repeats cause two different phenotypes. Meanwhile, TDP-43-positive inclusion bodies are the typical pathological manifestation of ALS. Many studies have found that ATXN2 can directly interact with TDP-43 and that expanded CAG repeats in ATXN2 can promote cytoplasmic mislocalization of TDP-43 by enhancing C-terminal cleavage (9,50). Further functional studies are needed to elucidate the pathogenic mechanisms of intermediate and pathogenic CAG repeats in the ATXN2 gene in ALS.
Recently, a study showed that AR antagonists could accelerate disease onset in male SOD1 G93A ALS mice, leading to exacerbated muscle pathology (16), which indicates that AR may be implicated in the pathogenesis of ALS. However, we did not find an association between CAG repeats in AR and ALS or its phenotypes in our cohort. Consistent with our results, many previous studies did not find an association between CAG repeats in AR and ALS either, even in other tissues, such as spinal cord and brain. Accordingly, it seems plausible to speculate that the length of CAG repeats in AR may not be a risk factor for ALS. More robust independent studies are still warranted to confirm this hypothesis.
Similar to ATXN2, NOP56 is the causative gene of another SCA subtype, SCA36. Clinically, some SCA36 patients manifest the ALS phenotype, especially in cases reported in Japan. Genetically, some ALS patients carry pathogenic GGCCTG repeats in NOP56 (18), suggesting an overlap of genetic and clinical manifestations between ALS and SCA36 (51,52). Additionally, one study showed a progressive reduction in the NOP56 protein level in the large motor neurons of the SOD1 G93A ALS mouse model (19). However, our results showed that the size of the GGCCTG repeat in NOP56 was not associated with ALS when a cutoff value of ≥9 was used, and there was no significant difference in the distribution of GGCCTG repeats between patients and controls, which was consistent with the results of other studies (53,54). In general, although ALS and SCA36 have similar clinical phenotypes, repeat expansions in NOP56 might not be a risk factor for ALS. Thus, further understanding the normal functions of these genes will provide insight into their role in disease.
In conclusion, HREs in C9orf72 are rare in ALS patients in mainland China, and GGCCTG repeats in NOP56 and CAG repeats in AR may not be associated with an increased risk of ALS. Meanwhile, intermediate-length repeat expansions in ATXN2 are more frequent in China, suggesting shared underlying etiological factors among neurodegenerative diseases. Our study has limitations. First, the number of controls was less than half the number of ALS patients, which may affect the results of the statistical analysis. Second, we did not have a single homogeneous control group. Third, we did not evaluate the CAA interruptions in the CAG repeat region, which may be important for clarifying the clinical manifestations of the two patients carrying 37 repeats.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee and the Expert Committee of Xiangya Hospital, Central South University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XH and JW designed and conceptualized the study, analyzed and interpreted the data, and drafted and revised the manuscript. WL and PL played a major role in the acquisition of data and analyzed bioinformatic data. ZL, YY, and JN played a major role in the acquisition of data. LS and BT designed and conceptualized the study, analyzed the clinical data for diagnosis and differential diagnosis, and revised the manuscript. All authors contributed to the article and approved the submitted version.