Distribution of the C9orf72 hexanucleotide repeat expansion in healthy subjects: a multicenter study promoted by the Italian IRCCS network of neuroscience and neurorehabilitation

Introduction: High repeat expansion (HRE) alleles in C9orf72 have been linked to both amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD); however, ranges for intermediate allelic expansions have not yet been defined, and the clinical interpretation of molecular data lacks a defined genotype–phenotype association. In this study, we report results from a large multicenter epidemiological study describing the distribution of C9orf72 repeats in healthy elderly individuals from the Italian population. Methods: A total of 967 samples were collected from neurologically evaluated healthy individuals over 70 years of age at the 13 institutes participating in the RIN (IRCCS Network of Neuroscience and Neurorehabilitation) in Italy. All samples were genotyped with the AmplideX PCR/CE C9orf72 Kit (Asuragen, Inc.) using standardized protocols validated through blind proficiency testing. Results: All samples carried hexanucleotide G4C2 repeat alleles in the normal range, none exceeding 25 repeats. In particular, 93.7% of samples had ≤10 repeats, 99.9% had ≤20 repeats, and 100% had ≤25 repeats. Conclusion: This study describes the distribution of hexanucleotide G4C2 repeat alleles in a healthy Italian population, defining the allele sizes associated with a neurologically healthy phenotype. Moreover, it provides an effective model of federation between institutes, highlighting the importance of sharing genomic data and standardizing analysis techniques to promote translational research. The data derived from this study may improve genetic counseling and inform future studies on ALS/FTD.


Introduction
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by motor impairment with progressive paralysis and cognitive and/or behavioral changes. Death, usually from respiratory insufficiency, typically occurs approximately 3 years after onset. Familial forms account for approximately 10% of cases and typically show autosomal dominant inheritance with reduced penetrance, while the remaining cases are sporadic. Nevertheless, approximately 10% of patients with apparently sporadic ALS carry a mutation in one of the genes associated with familial ALS (SOD1, FUS, TDP-43, and C9orf72) (1,2). Frontotemporal dementia (FTD) is a group of neurodegenerative diseases characterized by progressive impairment of behavior, language, and cognitive functions. Onset typically occurs before 65 years of age, with death within 7 years. A positive family history is reported in approximately 25-30% of cases, and the frequency of the genetic mutations underlying FTD varies across populations (3).
Scientific research in recent decades has improved the genetic characterization of both ALS and FTD. Clinical studies supported by neuropathological and genetic evidence have demonstrated an etiopathological continuum between these disorders, which share pathogenic mechanisms and genetic signatures. In 2011, two independent studies (4,5) identified the hexanucleotide G4C2 expansion in the non-coding region between exons 1a and 1b of the C9orf72 gene as the key molecular player in the complex ALS/FTD phenotype (6). Subsequently, several studies confirmed this association, showing that carriers of a high-repeat expansion (HRE) allele in C9orf72 develop ALS and/or FTD with variable clinical expression and age-dependent penetrance (7). Genotyping of the hexanucleotide repeat expansion in C9orf72 is recommended in patients with a positive family history of ALS, FTD, or both (7). Interestingly, re-evaluation of sporadic ALS and FTD patients demonstrated that many carried a hexanucleotide repeat expansion in C9orf72 (1,4,5). It is therefore possible that many familial cases remain unrecognized, perhaps because of Gompertzian inter-disease competition (8) or other environmental and clinical factors. Indeed, approximately 35% of C9orf72 patients have an atypical presentation mimicking other neurodegenerative disorders (Parkinson's disease, Huntington's disease, Lewy body dementia, Alzheimer's disease, and parkinsonism), which can lead to misdiagnosis (3,9). Notably, in certain European populations, the C9orf72 hexanucleotide repeat expansion has been identified as the most common cause of phenocopies resembling Huntington's disease (HD) (10). Accordingly, the German Neurological Society guidelines for the differential diagnosis of chorea officially recognize C9orf72 expansions as a primary cause of HD phenocopies (11). Furthermore, C9orf72-associated phenotypes can vary markedly, even within the same family, manifesting as diverse neurodegenerative presentations (12).
Beyond the clinical difficulties in recognizing C9orf72-associated phenotypes, the lack of defined cutoff ranges for the hexanucleotide repeat further complicates the diagnosis and interpretation of genotypes. At present, there is no universally shared consensus on the thresholds separating normal from pathological C9orf72 alleles. Research studies and commercial laboratories have reported differing reference ranges for normal, intermediate, and expanded/pathological C9orf72 hexanucleotide alleles (13,14). The reported upper limit for normal alleles varies from fewer than 20 to fewer than 30 repeat units (7), while the reported lower limit for pathological alleles ranges from more than 23 to more than 45 repeat units (7,15,16). Consequently, an individual with a C9orf72 allele of 23 to 30 repeats could be classified as carrying either a wild-type or a pathological allele, depending on the threshold adopted by the laboratory. This lack of standardization in C9orf72 testing thresholds may cause considerable confusion and misinterpretation. Moreover, even in the era of next-generation sequencing, accurately sizing and interpreting repetitive genetic variants remains a significant challenge. This complexity stems partly from the reliance on less common analytical methods, many of which are based on in-house protocols that vary considerably across diagnostic centers. Although the concept of method harmonization is straightforward and necessary, practical experience reveals numerous challenges in standardizing the technological, methodological, and interpretative steps across centers. As a result, different analytical methods are used to estimate expanded G4C2 repeats, often with limited confidence (13,14). This uncertainty is exacerbated by the high GC content, large size, and somatic instability of the repeat, the repetitive nature of the flanking sequences, and the presence of sequence variations at the 3′ end of the region (17,18).
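The practical consequence of these divergent cutoffs can be illustrated with a short Python sketch. The two threshold schemes below are hypothetical: they simply take the extremes of the published ranges quoted above (a normal upper limit of <20 or <30 repeats and a pathological lower limit of >23 or >45 repeats) and do not represent the cutoffs of any specific laboratory. The names `ThresholdScheme`, `LAB_A`, `LAB_B`, and `discordant_sizes` are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class ThresholdScheme:
    """Hypothetical laboratory cutoffs, in G4C2 repeat units."""
    name: str
    normal_below: int        # fewer repeats than this -> "normal"
    pathological_above: int  # more repeats than this -> "pathological"

    def classify(self, repeats: int) -> str:
        if repeats < self.normal_below:
            return "normal"
        if repeats > self.pathological_above:
            return "pathological"
        return "intermediate"


# Illustrative schemes at the two extremes of the published ranges (assumed).
LAB_A = ThresholdScheme("lab A", normal_below=20, pathological_above=23)
LAB_B = ThresholdScheme("lab B", normal_below=30, pathological_above=45)


def discordant_sizes(schemes: Sequence[ThresholdScheme], sizes: Iterable[int]) -> List[int]:
    """Repeat sizes that receive different labels under the given schemes."""
    return [n for n in sizes if len({s.classify(n) for s in schemes}) > 1]


if __name__ == "__main__":
    for n in (8, 22, 25, 28, 50):
        print(n, LAB_A.classify(n), LAB_B.classify(n))
```

Under these assumed schemes, every allele from 20 to 45 repeats receives a different label from the two laboratories, including the 23-30 repeat window discussed above.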
To standardize genetic testing for C9orf72 and the interpretation of its results, the Italian IRCCS Network of Neuroscience and Neurorehabilitation (RIN) conducted a multicenter study. RIN, the largest federation of Scientific Institutes for Research, Hospitalization, and Healthcare (IRCCS) in Italy focused on neuroscience, offers nationwide access to medical genetic data for translational research in compliance with the EU General Data Protection Regulation (GDPR). Established in 2017 by the Italian Ministry of Health, RIN aims to foster collaboration among IRCCS centers, facilitate the sharing of clinical and scientific data, and coordinate the development of protocols and algorithms for translational purposes. The network supports scientific and technological research in the prevention, diagnosis, treatment, and rehabilitation of neurodegenerative disorders, including neurological, neuropsychiatric, and related conditions. Given the wide range of neurological phenotypes, effective research requires segmentation into specific thematic areas. To this end, RIN has established Virtual Institutes of Pathology (VIPs), each dedicated to a particular disease or disease group (such as dementias, movement disorders, immunological disorders, motor neuron diseases, epilepsies, cerebrovascular disorders, neuro-oncology, and rare neurological disorders). RIN's structure comprises these VIPs, which involve diagnostic and research centers active in patient management, and cross-disciplinary technological platforms (including neuroimaging, genomics, telerehabilitation, and bioinformatics) for centralized and standardized analyses (Figure 1). This study presents the outcomes of this national effort to harmonize and standardize the typing of C9orf72 expansions. Additionally, we describe the distribution of hexanucleotide G4C2 repeat alleles in a healthy Italian population, identifying the alleles associated with a healthy phenotype and thereby aiding the clinical interpretation of results.

Sample size evaluation
We performed a one-sample proportion test with continuity correction to estimate the 95% confidence interval (CI) for the population proportion of carriers of intermediate hexanucleotide repeat expansions (CHRE) (19-21). The CI was derived from the point estimate for carriers of alleles with more than 24 and fewer than 30 repeats in the non-neurodegenerative European population. We extracted carrier frequencies for each CHRE repeat range. Because the standard error and margin of error cannot be calculated with zero successes, repeat ranges above 30 could not be included in this analysis. The inputs for the one-sample proportion test were 607 carriers in the 2-23 repeat range and 6 carriers in the 24-30 repeat range (22-24).
Our findings indicate that, at a 95% confidence level, the true frequency of carriers of intermediate HRE alleles is likely between 0.003 and 0.02, with a margin of error (ME) of 0.01. Consequently, to achieve an ME of 0.05 when surveying the distribution of carriers of 24-30 repeat alleles, we calculated the required sample size using Equation (1).
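The confidence interval described above can be sketched in Python. This is a minimal sketch, assuming the test is the Wilson score interval with continuity correction (the interval reported, for example, by R's `prop.test`) and assuming the 6 carriers of 24-30 repeat alleles are counted out of 607 + 6 = 613 carriers in total. The function name `prop_ci_continuity` is hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def prop_ci_continuity(x: int, n: int, z: float = Z_95) -> Tuple[float, float, float]:
    """Point estimate and Wilson score CI with continuity correction for x/n."""
    p = x / n
    z22n = z * z / (2 * n)

    def bound(pc: float, sign: int) -> float:
        # Score-interval bound evaluated at the continuity-corrected estimate pc.
        spread = z * math.sqrt(pc * (1 - pc) / n + z22n / (2 * n))
        return (pc + z22n + sign * spread) / (1 + 2 * z22n)

    upper_pc = p + 0.5 / n  # shift up by half an observation for the upper bound
    lower_pc = p - 0.5 / n  # shift down by half an observation for the lower bound
    upper = 1.0 if upper_pc >= 1 else bound(upper_pc, +1)
    lower = 0.0 if lower_pc <= 0 else bound(lower_pc, -1)
    return p, lower, upper


if __name__ == "__main__":
    # Assumed inputs: 6 carriers of 24-30 repeat alleles out of 607 + 6 = 613.
    print(prop_ci_continuity(6, 613))
```

With these assumed inputs the interval comes out at roughly 0.004 to 0.022, close to the range reported above.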
Equation (1): Sample size estimation based on a proportion estimate at a 95% confidence level (25).
The sample size required to obtain an ME of 0.05 is 1,430 subjects. Given the enrolled sample size of 967 subjects, we calculated the margin of error by entering the data from Equations (1) and (2) into Equation (3), obtaining an ME of 0.063.
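Because the equations themselves are not reproduced in the text, the following Python sketch uses the textbook formula for the sample size needed to estimate a proportion, n = z²·p(1 − p)/E², and its inverse for the margin of error achieved with a fixed n. Treat it as an assumption about the general form of Equation (1), not as the authors' exact computation. The inputs behind the reported 1,430 subjects and 0.063 ME are not fully specified, so the sketch does not attempt to reproduce those figures. Function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def sample_size_for_proportion(p: float, margin: float, z: float = Z_95) -> int:
    """Textbook n = z^2 * p * (1 - p) / E^2, rounded up to a whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def achieved_margin(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error E = z * sqrt(p * (1 - p) / n) for a fixed sample size n."""
    return z * math.sqrt(p * (1 - p) / n)
```

For example, the conservative choice p = 0.5 with E = 0.05 gives the familiar 385 subjects, and for any fixed p the achieved margin shrinks as the enrolled sample grows.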

Subject enrollment
Nine hundred and sixty-seven elderly subjects (all over 70 years of age; median age 78 years, SD 6.18; male-to-female ratio 1:1.34) were recruited by neurologists during routine clinical activity. The inclusion criteria were age over 70 years and no family history of ALS or FTD. The exclusion criteria were any neurodegenerative disorder or C9orf72-associated phenotype. Participants were unrelated to each other. Written informed consent was obtained from all participants. The study was approved by the ethics committees of all participating centers.

Proficiency test
To ensure the validity of the results, all participating institutes were required to complete proficiency testing (PT) before processing the study samples. PT is crucial for quality assessment, especially in multicenter studies, because it benchmarks the performance of each participant. Each center received five blind samples with known genotypes. The PT was considered passed if a center correctly reported the sizes of the hexanucleotide G4C2 alleles, after which the center was admitted to the analytical phase of the study. If inconsistencies were observed, the center received specific training on the use of the testing kit and the interpretation software and then repeated the PT with a different set of samples. Notably, the blind samples included a DNA sample with an indel variation in the hexanucleotide G4C2 allele (an interruption of the G4C2 repeat).
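The pass/fail logic of the PT can be expressed as a small Python sketch. It assumes, hypothetically, that a center passes only if every blind sample is reported and both allele calls match the known genotype exactly, with alleles beyond the sizing range recorded as a category such as ">145". The names and data layout are illustrative and are not part of the kit software.

```python
from typing import Dict, List, Tuple

# A genotype is two allele calls, e.g. ("2", "8") or (">145", ">145").
Genotype = Tuple[str, str]


def _normalize(genotype: Genotype) -> Tuple[str, ...]:
    # Compare genotypes regardless of the order in which the alleles are listed.
    return tuple(sorted(genotype))


def failed_samples(expected: Dict[str, Genotype], reported: Dict[str, Genotype]) -> List[str]:
    """Return blind-sample IDs that are missing or whose allele calls differ."""
    return sorted(
        sample
        for sample, genotype in expected.items()
        if sample not in reported or _normalize(reported[sample]) != _normalize(genotype)
    )


def passes_proficiency(expected: Dict[str, Genotype], reported: Dict[str, Genotype]) -> bool:
    """Hypothetical rule: the PT is passed only if no sample fails."""
    return not failed_samples(expected, reported)


if __name__ == "__main__":
    panel = {"PT1": ("2", "8"), "PT2": (">145", ">145")}
    # A center that mis-sizes a homozygous expanded genotype.
    center = {"PT1": ("8", "2"), "PT2": ("2", ">145")}
    print(failed_samples(panel, center), passes_proficiency(panel, center))
```

Exact matching is the strictest possible rule; a real scheme might tolerate a small sizing error for long alleles, which would replace the equality check.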

Repeatability test
To confirm the stability of the AmplideX PCR/CE C9orf72 Kit (Asuragen, Inc.) over time, each participating institution conducted a repeatability study on a subset of five blind samples with known genotypes, supplied to every center. These samples were tested at three time points: T0 (initial testing), T1 (after 2 weeks), and T3 (after 4 weeks).

Genetic analysis
Genomic DNA was purified from peripheral blood samples using automated procedures (depending on the instrumentation available in each IRCCS laboratory). The quality of the extracted DNA was checked by spectrophotometry. The minimum quality requirements were OD260/280 and OD260/230 ratios ≥1.7 and a DNA concentration of 10-40 ng/μL.
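The minimum DNA quality criteria above translate directly into an acceptance check. A minimal Python sketch, assuming the concentration range is inclusive at both ends; the `DnaQc` record and `meets_minimum_quality` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnaQc:
    """Spectrophotometric QC readings for one extracted DNA sample (hypothetical record)."""
    od_260_280: float
    od_260_230: float
    conc_ng_per_ul: float


def meets_minimum_quality(qc: DnaQc) -> bool:
    """Both purity ratios >= 1.7 and concentration within 10-40 ng/uL (assumed inclusive)."""
    return (
        qc.od_260_280 >= 1.7
        and qc.od_260_230 >= 1.7
        and 10.0 <= qc.conc_ng_per_ul <= 40.0
    )
```

Samples failing the check would be re-extracted or diluted before genotyping; that handling is not described in the text.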
Genomic DNA was genotyped using the AmplideX PCR/CE C9orf72 Kit (Asuragen, Inc.). The assay combines amplification in a three-primer G4C2 repeat-primed PCR (RP-PCR) configuration (21) with fragment sizing on a capillary electrophoresis (CE) instrument. PCR product (2 μL) mixed with formamide and ROX 1000 Size Ladder (Asuragen, Inc., Austin, TX) was run on a CE instrument (depending on the instrumentation available in each IRCCS laboratory). Amplicons were detected, and sized with GeneMapper software, according to the manufacturer's protocols (26). Positive controls (C9orf72
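Converting a CE fragment size into a repeat count rests on the fact that each G4C2 unit adds 6 bp to the amplicon. The Python sketch below assumes a simple linear model with a hypothetical flanking length (`flank_bp`). The kit's actual size-to-repeat conversion is defined in the manufacturer's protocol and may include calibration terms not shown here, so this only illustrates the principle.

```python
REPEAT_UNIT_BP = 6  # each G4C2 unit (GGGGCC) is 6 bp long


def repeats_from_size(amplicon_bp: float, flank_bp: float) -> int:
    """Estimate the G4C2 repeat number from a gene-specific amplicon size.

    flank_bp is a HYPOTHETICAL non-repeat length of the amplicon (primers plus
    flanking sequence); the kit's own protocol defines the real conversion.
    """
    if amplicon_bp < flank_bp:
        raise ValueError("amplicon is shorter than the assumed flanking length")
    # Round to the nearest whole repeat to absorb small CE sizing deviations.
    return round((amplicon_bp - flank_bp) / REPEAT_UNIT_BP)


if __name__ == "__main__":
    # Illustrative numbers only: a 100 bp flank plus 8 repeats (48 bp).
    print(repeats_from_size(148.0, flank_bp=100.0))
```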

Results
To standardize and validate the analytical methods, each participating institution was provided with the same lot of the commercially available AmplideX PCR/CE C9orf72 Kit (Asuragen, Inc.) and was required to complete a proficiency test designed and distributed by Asuragen. Eleven of the 13 centers correctly genotyped the blinded samples. Two centers, however, failed to correctly size a genotype homozygous for an expanded hexanucleotide G4C2 allele (>145 repeats). These centers therefore underwent specific training on the use of the AmplideX PCR/CE C9orf72 Kit and on allele sizing by capillary electrophoresis, after which both retook and passed the proficiency test.
Once each center had passed the proficiency test, a repeatability test was initiated, and all blind samples were typed at different time points (T0, T1, and T3). Repeatability was complete, with no variation across serial measurements (repeatability coefficient = 0), confirming the assay's reliability in characterizing C9orf72 alleles. Furthermore, one of the blind samples carried an indel variation in the hexanucleotide G4C2 allele (an interruption of the G4C2 repeat). All centers genotyped this sample without any discrepancy in repeat sizing attributable to the indel, demonstrating that the AmplideX PCR/CE C9orf72 Kit sizes alleles accurately even in the presence of interruptions in the G4C2 repeat.
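The repeatability coefficient quoted above can be computed with the Bland-Altman definition, RC = 1.96·√2·s_w ≈ 2.77·s_w, where s_w is the within-sample standard deviation of repeated measurements. The text does not state which definition was used, so this Python sketch assumes the Bland-Altman form. With identical calls at T0, T1, and T3, every within-sample variance is zero and so is RC.

```python
import math
import statistics
from typing import Dict, Sequence


def repeatability_coefficient(measurements: Dict[str, Sequence[float]]) -> float:
    """Bland-Altman repeatability coefficient, RC = 1.96 * sqrt(2) * s_w.

    measurements maps each blind sample to its repeated repeat-size calls
    (at least two per sample). s_w is pooled as the square root of the mean
    within-sample variance, which assumes the same number of replicates per sample.
    """
    variances = [statistics.variance(calls) for calls in measurements.values()]
    within_sd = math.sqrt(sum(variances) / len(variances))
    return 1.96 * math.sqrt(2) * within_sd


if __name__ == "__main__":
    # Illustrative calls at T0, T1, T3 for two blind samples with no variation.
    print(repeatability_coefficient({"s1": [5, 5, 5], "s2": [8, 8, 8]}))
```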
A total of 967 subjects were enrolled across 13 centers (subjects per center are detailed in Supplementary Table S1). Allele and genotype frequencies were characterized in this cohort of healthy elderly Italian subjects. All subjects carried hexanucleotide G4C2 alleles with 25 or fewer repeats. The distribution of repeat lengths did not differ significantly between centers (Kruskal-Wallis χ² = 9.99, p = 0.616; α = 0.05). The distribution of alleles is presented in Table 1 and Figure 2.
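The between-center comparison can be sketched in Python with the Kruskal-Wallis H statistic, including the usual correction for ties (relevant here because most alleles share a few repeat sizes). With 13 centers the statistic has 12 degrees of freedom; because 12 is even, the chi-square upper-tail probability has a closed form, which avoids external dependencies (`scipy.stats.kruskal` would be the usual library alternative). Whether the test was run on alleles or on subjects is not stated, so the grouping of values is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H with the standard correction for tied values."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Average rank for each distinct value: tied values share the mean of their ranks.
    avg_rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(avg_rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    if ties == n ** 3 - n:
        raise ValueError("all values are identical; H is undefined")
    return h / (1 - ties / (n ** 3 - n))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

As a consistency check, `chi2_sf_even_df(9.99, 12)` evaluates to about 0.617, in line with the reported p = 0.616 for 13 centers.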
Overall, 93.7% of samples had ≤10 repeats, 99.9% had ≤20 repeats, and 100% had ≤25 repeats (Table 1). The distribution of repeat lengths peaked at 2, 5, and 8 repeats, with frequencies of 48.7%, 13.5%, and 14.9%, respectively. The longest allele identified had 25 repeats and was observed once, a frequency of 0.05%. Alleles with ≥20 repeats had a frequency of 0.31%, slightly lower than the frequencies reported in other European populations (23,27,28) (Table 2).
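The allele-level and sample-level percentages above use different denominators: each of the 967 subjects contributes two alleles (1,934 in total), so a single 25-repeat allele corresponds to 1/1,934 ≈ 0.05%, and 0.31% corresponds to about six alleles with ≥20 repeats. A minimal Python sketch of both calculations follows, with hypothetical function names, genotypes given as pairs of repeat numbers, and the assumption that a sample counts as "≤N repeats" when its longer allele is ≤N.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Genotype = Tuple[int, int]  # repeat numbers of the two alleles of one subject


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[int, float]:
    """Relative frequency of each repeat size; every subject contributes two alleles."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {size: count / total for size, count in sorted(counts.items())}


def frequency_at_least(freqs: Dict[int, float], threshold: int) -> float:
    """Allele-level frequency of sizes >= threshold (e.g., the >=20 repeat class)."""
    return sum(f for size, f in freqs.items() if size >= threshold)


def fraction_samples_at_most(genotypes: Sequence[Genotype], threshold: int) -> float:
    """Sample-level fraction whose longer allele is <= threshold (assumed definition)."""
    return sum(1 for genotype in genotypes if max(genotype) <= threshold) / len(genotypes)
```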

Discussion and conclusion
Since the C9orf72 expansion was first identified in 2011, clinical and research evidence has repeatedly supported introducing genetic testing for C9orf72 into clinical practice. This year, these recommendations were formally incorporated into ALS guidelines (29). Unfortunately, the extreme variability of the C9orf72 repeat makes the analysis hard to standardize. First, the size of the C9orf72 repeat expansion may vary across tissues (30). Moreover, allele frequencies vary strongly among populations (3). It therefore became evident that introducing C9orf72 genetic testing into clinical practice required standardizing both the technology and the interpretation of results.
To standardize genetic testing for C9orf72 and the interpretation of its results, the RIN launched a national survey among laboratories performing G4C2 repeat evaluation for diagnostic purposes. As previously observed in other countries (4,17), the survey highlighted interlaboratory heterogeneity in both analytical methods and the cutoff between normal and pathological alleles. While clearly pathological and clearly normal sizes are well recognized (over 60 and under 8 G4C2 repeats, respectively), defining the cutoff for intermediate sizes remains challenging, mainly because genetic data from the general, healthy population are lacking. As a result, the same intermediate allele can be interpreted as normal or pathological by different laboratories, leading to divergent disease evaluation and risk prediction in genetic counseling. Given the known variability of allele distribution across populations, the high frequency of C9orf72 HRE alleles (23), and the results of the interlaboratory survey, the RIN performed a multicenter study to estimate the distribution of G4C2 repeats in healthy elderly Italians (over 70 years of age) and to promote the standardization of methods across the institutes of the RIN network. Given the nearly complete penetrance of C9orf72-associated phenotypes by 80 years of age (37), selecting neurologically healthy individuals over 70 years of age seemed reasonable. Notably, factors such as family history, clinical presentation, and sex can slightly alter the median age at onset, which typically ranges from 57 to 60 years (37). In the current study, a total of 967 healthy elderly subjects from the Italian population were genotyped to characterize the distribution of C9orf72 alleles and define the frequency of G4C2 repeat alleles in the Italian healthy population. All analyzed samples had alleles with 25 or fewer repeats: 93.7% of samples had ≤10 repeats, 99.9% had ≤20 repeats, and all had ≤25 repeats (Table 1).
Comparison with allelic distributions in other populations showed similar peaks at 2, 5, and 8 repeats (Table 1). Additionally, the frequency of alleles with ≥20 repeats in our cohort is consistent with the north-to-south gradient in allelic distribution observed across Europe. Specifically, Caucasian control populations show frequencies of 0.38% to 0.52% for alleles with ≥20 repeats (27), with an exceptionally high frequency of 0.89% in the Finnish population. The lower frequency of 0.31% observed in our Italian cohort fits the higher incidence of hexanucleotide repeat expansions (HRE) reported in northern European regions (28). This consistency not only reinforces the geographic variation in HRE incidence but also underscores the importance of stringent inclusion criteria for control subjects in genetic epidemiological studies, particularly those involving age-related diseases.
This research establishes the range of C9orf72 alleles found in a healthy Italian population, identifying alleles with up to 25 repeats as associated with a normal phenotype. Combined with the alleles observed in patients, these findings will help delineate the thresholds for normal, intermediate, and pathological alleles in this population; the full benefit of the study will become evident when these data are compared with the C9orf72 allele distribution in patients. The main limitation of the study is that it is not a case-control study, so we still cannot define intermediate and pathological thresholds. Nevertheless, the sample selection (elderly subjects without any C9orf72-related phenotype) supports a precise definition of normal alleles, despite the extremely variable phenotypic presentation. Furthermore, the study underscores the benefits of collaboration among institutes, particularly in sharing genomic data to harmonize analytical methods and advance applied research. Initially, half of the Neuroscience Institutes were using in-house methods to size the hexanucleotide G4C2 expansion. By the end of the study, all participating institutes had adopted a uniform, commercially available kit, establishing a standardized national reference for interpreting normal allele thresholds. This approach, exemplified by the RIN network model, also serves as a proof of concept for the goals expected from the entry into force of the In Vitro Diagnostic Medical Devices Regulation (EU) 2017/746 (IVDR) at the European level. Furthermore, this achievement could contribute substantially to the European Reference Networks (ERNs) for rare diseases by offering a homogeneous assay for C9orf72 testing in the European Union; likewise, ENCALS (European Network to Cure ALS) could take the successful harmonization of technology achieved in this study as a reference for further initiatives seeking broad consensus on the analysis of the G4C2 repeat expansion in C9orf72.

Equation (3) variables: a = enrolled sample size, b = total number of subjects sampled in the literature, and k = number of HRE carriers in the range of interest (24-30 repeats) among the b subjects. Therefore, y is proportional to a. The expected proportion obtained from Equation (2) was used to compute the margin of error achieved with the 967 subjects sampled.

FIGURE 1
Organizational model of RIN.

TABLE 1
Allelic distribution of C9orf72 alleles in our cohort and in other cohorts. Project MinE and Alzheimer's Disease Neuroimaging Initiative (23).
10.3389/fneur.2024.1284459 Frontiers in Neurology 07 frontiersin.org