- ¹Department of Neurology and Institute of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- ²Department of Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- ³Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Introduction: Previous studies have established a substantial genetic component in motor neuron disease (MND) pathogenesis and identified several causative genes. However, a large proportion of MND cases remain genetically unexplained, and the overall contribution of rare, high-impact variants across the exome is not well characterized.
Methods: Leveraging whole-exome sequencing data from nearly half a million UK Biobank participants, we systematically investigated the association between high-confidence protein-truncating variants (HC PTVs) and MND risk in a Caucasian subset. Our large-scale gene-based association analysis utilized REGENIE software and LOFTEE-defined HC PTVs.
Results: After sensitivity analyses excluding carriers of known pathogenic variants and C9ORF72 repeat expansions, HC PTVs in 14 genes showed significant preliminary associations with an increased risk of MND. NEK1 has been previously implicated in ALS; the remaining 13 genes (BLVRB, KLHL32, RIMS2, DYDC2, DCBLD1, ANXA4, COMP, TRIM42, ANO4, NFX1, CFAP206, CKAP2L, and ANGPTL4) are novel candidate loci for the disease. Functional enrichment analyses showed that these genes are significantly enriched in the Gene Ontology cellular-component terms collagen-containing extracellular matrix and cilium. Furthermore, tissue specificity analysis highlighted a strong enrichment of these genes’ expression in brain regions, with the hypothalamus showing the highest specificity.
Discussion: These findings suggest a potential expansion of the known genetic landscape of MND, and highlight novel biological pathways implicated in its pathogenesis. This study underscores the power of large-scale population genetics in uncovering critical disease mechanisms and offers new avenues for mechanistic research and therapeutic development for MND, pending independent validation.
Introduction
Motor neuron disease (MND) represents a group of devastating, progressive neurodegenerative disorders characterized by the selective degeneration of upper and/or lower motor neurons (Foster and Salajegheh, 2019; Malkki, 2016; Babazadeh et al., 2023). This relentless progression ultimately leads to muscle weakness, atrophy, paralysis, and typically respiratory failure and death (Winhammar et al., 2005; Rocha et al., 2005). The most common and rapidly progressive form is amyotrophic lateral sclerosis (ALS), but the clinical spectrum also encompasses progressive muscular atrophy (PMA) and primary lateral sclerosis (PLS), underscoring the disease’s significant clinical and pathological heterogeneity (Feldman et al., 2022; Angilletta et al., 2023; Turner et al., 2020). Globally, MND imposes an immense burden on patients, their families, and healthcare systems, with a challenging prognosis and a critical lack of effective disease-modifying treatments currently available (Rizea et al., 2024; Niedermeyer et al., 2019; Owens, 2017).
Genetics plays an increasingly recognized and pivotal role in MND pathogenesis. While approximately 5 to 7% of MND cases are familial, a substantial genetic contribution is also evident in sporadic forms, highlighting a complex interplay of genetic predisposition and environmental factors (Foster and Salajegheh, 2019; de Boer et al., 2024; Benatar et al., 2009). The identification of key causative genes such as C9orf72 (Mizielinska et al., 2025), SOD1 (Berdyński et al., 2022), TARDBP (Balendra et al., 2025), and FUS (Mackenzie et al., 2010) has advanced our understanding of MND’s intricate pathological mechanisms, implicating diverse pathways including RNA processing, protein aggregation, and nucleocytoplasmic transport (Beers and Appel, 2019; Tolochko et al., 2025). Despite these significant advancements, a large proportion of MND cases still lack a defined genetic etiology (Malkki, 2016; Chaudhary et al., 2022). Previous genetic studies, while foundational, may have faced limitations due to insufficient sample sizes, inadequate power to detect rare variants, or less comprehensive analytical methodologies, leaving many genetic underpinnings of MND yet to be uncovered.
Whole exome sequencing (WES) has emerged as a powerful and cost-effective tool in genetic research, offering distinct advantages over traditional genome-wide association studies (GWAS) for detecting rare coding region variants. WES’s ability to identify high-impact, low-frequency variants makes it particularly suited for exploring novel disease-associated genes and genetic modifiers that might be missed by common variant approaches. This technology allows for a targeted yet comprehensive examination of the protein-coding regions of the genome, where a significant proportion of Mendelian disease-causing mutations reside.
The UK Biobank is an unparalleled resource for genetic association studies: a prospective cohort of approximately 500,000 participants with extensive phenotypic data, including detailed health records, and high-quality WES data (Allen et al., 2024; Backman et al., 2021; Van Hout et al., 2020). Its size substantially increases the statistical power to detect novel genetic associations, and its rich covariate data allow adjustment for confounding factors that could otherwise produce false positives. Crucially, the inclusion of over 800 individuals diagnosed with MND within this well-characterized cohort provides a unique opportunity to explore the genetic landscape of the disease.
Building upon these strengths, our study aimed to leverage the UK Biobank’s WES data to explore potential new genes associated with an increased risk of MND (Backman et al., 2021; Van Hout et al., 2020). Using gene-based association testing with REGENIE (Mbatchou et al., 2021) and focusing on high-confidence protein-truncating variants (HC PTVs) as defined by LOFTEE (Karczewski et al., 2020), we systematically screened for genetic predispositions within the Caucasian subset of this cohort. Here, we report the preliminary identification of 18 genes whose HC PTVs were significantly associated with an elevated risk of MND in the primary analysis, 14 of which remained significant after sensitivity analyses excluding carriers of known pathogenic variants and C9ORF72 repeat expansions. NEK1 is a previously implicated locus; the remaining 17 genes represent new candidate associations that warrant independent validation. Functional and tissue specificity analyses of the 14 robust candidate genes suggest the involvement of the collagen-containing extracellular matrix and the cilium, and highlight strong expression in brain regions, particularly the hypothalamus. These preliminary findings contribute to the genetic understanding of MND and provide new avenues for mechanistic research.
Methods
Ethics
The UK Biobank is a large-scale, prospective cohort study that comprehensively collects phenotypic and genetic data from approximately 500,000 participants, who were aged 38 to 72 years at the time of their enrollment (Bycroft et al., 2018). Ethical approval for the study was granted by the North West Multi-Centre Research Ethics Committee, and all participants provided written informed consent. This overarching ethical clearance allows researchers to utilize UK Biobank data without seeking separate approval for individual studies. Our specific investigation was conducted under UK Biobank application number 162635.
UK biobank data processing and quality control
We defined our MND cohort using the algorithmically defined outcomes provided by UK Biobank (Data-Fields 42028 and 42029). These outcomes drew on several sources: 51 self-reported cases, 372 cases with a primary hospital diagnosis, 119 cases with MND recorded as the primary cause of death on death certificates, 273 cases with a secondary hospital diagnosis, and 10 cases in which MND was a contributory cause of death.
For our analysis, we leveraged whole-exome sequencing (WES) data from 469,589 UK Biobank participants (Backman et al., 2021). We first excluded participants based on standard quality control (QC) criteria: excess heterozygosity, ≥5% autosomal variant missingness on the genotyping arrays, or absence from the phased-samples subset (Bycroft et al., 2018). WES data, aligned to GRCh38, were provided as population- or individual-level files and accessed via the UK Biobank Research Analysis Platform (RAP). In addition to the QC already applied by the data providers (Backman et al., 2021), we implemented several supplementary QC procedures. We used ‘bcftools v1.14 norm’ (Danecek et al., 2021) to split multi-allelic sites and to left-align and normalize indels. We then set to missing any genotype that failed our criteria: (1) a read depth below 7; (2) a genotype quality below 20; or (3) for heterozygous genotypes, a binomial test p-value ≤0.001 comparing alternative-allele with reference-allele read counts. For indel genotypes, we applied a more stringent depth criterion, requiring a read depth of at least 10 and a genotype quality of at least 20. Finally, any variant with more than 50% missing genotypes was excluded from all downstream analyses (Gardner et al., 2022).
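The genotype- and variant-level filters above can be sketched as follows. This is a minimal, illustrative Python sketch, not the authors' pipeline: the `Genotype` record and all helper names are hypothetical, the allele-balance check is implemented as an exact two-sided binomial test against an expected alternative-read fraction of 0.5, and applying that check to heterozygous indels as well as SNVs is an assumption.

```python
from dataclasses import dataclass
from math import comb
from typing import List, Optional


@dataclass
class Genotype:
    """Hypothetical per-sample genotype record (not a real library type)."""
    gt: int          # 0 = hom-ref, 1 = het, 2 = hom-alt
    dp: int          # read depth
    gq: int          # genotype quality
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the alternative allele
    is_indel: bool


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k alt reads out of n, expecting 0.5."""
    if n == 0:
        return 1.0
    total = 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / total  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


def genotype_passes(g: Genotype) -> bool:
    """Apply the depth, quality and allele-balance thresholds described above."""
    min_dp = 10 if g.is_indel else 7  # indels use the stricter depth cutoff
    if g.dp < min_dp or g.gq < 20:
        return False
    # Allele-balance check for heterozygous calls (assumed to apply to indels too).
    if g.gt == 1 and binom_two_sided_p(g.alt_reads, g.ref_reads + g.alt_reads) <= 0.001:
        return False
    return True


def qc_calls(genotypes: List[Genotype]) -> List[Optional[int]]:
    """Per-sample calls, with failing genotypes set to missing (None)."""
    return [g.gt if genotype_passes(g) else None for g in genotypes]


def keep_variant(calls: List[Optional[int]], max_missing: float = 0.5) -> bool:
    """Keep a variant only if at most 50% of its genotypes are missing."""
    n_missing = sum(c is None for c in calls)
    return n_missing / len(calls) <= max_missing


# Toy example with three samples at one site.
calls = qc_calls([
    Genotype(gt=1, dp=30, gq=99, ref_reads=15, alt_reads=15, is_indel=False),  # passes
    Genotype(gt=1, dp=30, gq=99, ref_reads=28, alt_reads=2, is_indel=False),   # allele imbalance
    Genotype(gt=0, dp=8, gq=40, ref_reads=8, alt_reads=0, is_indel=True),      # indel depth < 10
])
# calls == [1, None, None]; two of three genotypes are missing, so the variant is dropped.
```

In practice these filters would be applied to the VCF fields directly (for example FORMAT/DP, FORMAT/GQ and FORMAT/AD); the sketch only makes the order of the checks explicit.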
The remaining variants were annotated using Ensembl Variant Effect Predictor (VEP v104) (McLaren et al., 2016), with the ‘--everything’ flag enabled and additional plugins for REVEL (Ioannidis et al., 2016), CADD (Rentzsch et al., 2019), and LOFTEE (Karczewski et al., 2020). AlphaMissense scores were integrated by downloading data from the corresponding GitHub repository (Cheng et al., 2023). A single Ensembl transcript was prioritized for each variant using a hierarchical scheme: protein-coding transcripts were preferred, followed by MANE Select v0.97 transcripts (Morales et al., 2022), and finally the VEP canonical transcript. Variant consequences were ranked by severity as defined by VEP. Stop-gained, splice-site disrupting, and frameshift variants were aggregated into a single protein-truncating variant (PTV) category after filtering with LOFTEE to minimize false positives (Karczewski et al., 2020). Annotations for missense and synonymous variants were taken directly from VEP. Our primary analysis cohort consisted of individuals of ‘white European’ ancestry, excluding those who self-identified as belonging to other ancestries in a questionnaire and participants who had withdrawn consent from the study. This selection process resulted in a final cohort of 393,746 individuals.
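The transcript-selection hierarchy and the HC PTV definition can be illustrated with a short Python sketch. The record type, function names and transcript IDs are hypothetical placeholders; the sketch assumes VEP-style consequence strings in which multiple terms are joined by "&" and a LOFTEE `LoF` value of "HC" or "LC". It is not the authors' code, and it treats the hierarchy as a lexicographic ordering, which is one reasonable reading of the scheme described above.

```python
from dataclasses import dataclass
from typing import List, Optional

# VEP consequence terms grouped here into the PTV category.
PTV_TERMS = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


@dataclass
class TranscriptAnnotation:
    """Hypothetical per-transcript annotation of one variant."""
    transcript_id: str
    biotype: str
    mane_select: bool
    canonical: bool
    consequence: str      # e.g. "stop_gained&splice_region_variant"
    lof: Optional[str]    # LOFTEE call: "HC", "LC" or None


def pick_transcript(annots: List[TranscriptAnnotation]) -> TranscriptAnnotation:
    """Prefer protein-coding, then MANE Select, then the VEP canonical transcript."""
    return max(annots, key=lambda a: (a.biotype == "protein_coding", a.mane_select, a.canonical))


def is_hc_ptv(a: TranscriptAnnotation) -> bool:
    """True if the consequence includes a PTV term and LOFTEE labels it high confidence."""
    terms = set(a.consequence.split("&"))
    return bool(terms & PTV_TERMS) and a.lof == "HC"


# Placeholder transcripts for one variant.
annots = [
    TranscriptAnnotation("ENST_A", "protein_coding", False, True, "missense_variant", None),
    TranscriptAnnotation("ENST_B", "protein_coding", True, False, "stop_gained", "HC"),
    TranscriptAnnotation("ENST_C", "retained_intron", False, False, "stop_gained", "LC"),
]
chosen = pick_transcript(annots)  # ENST_B: protein-coding and MANE Select
```

Because Python compares tuples element by element, the key `(protein_coding, mane_select, canonical)` encodes the priority order directly; ties keep the first transcript listed.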
To prepare for sensitivity analyses, we identified individuals carrying known MND- or ALS-associated pathogenic variants. We obtained a list of 627 reported pathogenic/likely pathogenic MND/ALS variants from the ClinVar database and found that these were carried by 1,634 individuals in our dataset, eight of whom were MND cases. We also estimated the C9ORF72 GGGGCC repeat size using ExpansionHunter (Dolzhenko et al., 2019) on individual-level whole-genome sequencing (WGS) data. The vast majority (>95%) of neurologically healthy individuals carry ≤11 hexanucleotide repeats in C9ORF72 (Rutherford et al., 2012). Because the pathological repeat-length threshold has not been definitively established, we used the arbitrary cutoff of 30 repeats applied in most studies (Balendra and Isaacs, 2018). This identified 849 individuals exceeding the threshold for exclusion, 67 of whom were MND cases. Six individuals harbored both a ClinVar-reported pathogenic/likely pathogenic mutation and more than 30 C9ORF72 GGGGCC repeats. To account for potential relatedness, we identified all possible first-degree relatives as pairs with a correlation coefficient exceeding 0.49 in the Genetic Relationship Matrix (GRM) for all samples with WES data.
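The exclusion logic for the sensitivity analysis amounts to a set union, which also shows why the six individuals with both a ClinVar variant and a C9ORF72 expansion are counted only once. In this hypothetical Python sketch the sample IDs are synthetic, ExpansionHunter output is assumed to have already been parsed into one pair of allele repeat sizes per sample, and the longer allele is compared with the 30-repeat cutoff (an assumption); the relatedness helper simply lists pairs whose GRM coefficient exceeds 0.49.

```python
from typing import Dict, Iterable, List, Set, Tuple


def c9orf72_expanded(repeats: Dict[str, Tuple[int, int]], cutoff: int = 30) -> Set[str]:
    """Samples whose longer C9ORF72 GGGGCC allele exceeds the cutoff."""
    return {sample for sample, alleles in repeats.items() if max(alleles) > cutoff}


def first_degree_pairs(
    grm: Iterable[Tuple[str, str, float]], threshold: float = 0.49
) -> List[Tuple[str, str]]:
    """Pairs whose GRM coefficient exceeds the first-degree threshold."""
    return [(a, b) for a, b, r in grm if r > threshold]


def sensitivity_exclusions(clinvar_carriers: Set[str], c9_expanded: Set[str]) -> Set[str]:
    """Union of the two groups; individuals in both are excluded only once."""
    return clinvar_carriers | c9_expanded


# Synthetic IDs mirroring the counts in the text: 1,634 ClinVar carriers,
# 849 expansion carriers, and 6 individuals in both groups.
clinvar = {f"s{i}" for i in range(1634)}
c9 = {f"s{i}" for i in range(1628, 1628 + 849)}
excluded = sensitivity_exclusions(clinvar, c9)
# len(excluded) == 1634 + 849 - 6
```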
Gene-burden testing in UK biobank
REGENIE v4.1 (Mbatchou et al., 2021) served as our primary tool for the gene-burden tests. We first fitted the null model (REGENIE step 1) using array genotypes with a minor allele count (MAC) greater than 100 for individuals with WES data. For gene-level testing (step 2), we collapsed qualifying variants within each gene into a single ‘mask’ genotype for association analysis, following REGENIE’s documentation.
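Collapsing variants into a gene-level mask can be pictured as taking, for each individual, the maximum alternative-allele dosage across the qualifying variants in a gene, which is analogous to REGENIE's default "max" rule for building masks. The Python sketch below is a hypothetical illustration of that idea on toy dosages; REGENIE itself builds masks from annotation, set-list and mask-definition files, and treating missing calls as reference (0) here is an assumption.

```python
from typing import Dict, List, Optional


def build_max_mask(
    dosages: Dict[str, List[Optional[int]]], gene_variants: List[str]
) -> List[int]:
    """Per-individual maximum ALT dosage across a gene's qualifying variants."""
    n_samples = len(next(iter(dosages.values())))
    mask = []
    for i in range(n_samples):
        # Missing calls (None) are treated as 0 in this sketch.
        mask.append(max((dosages[v][i] or 0) for v in gene_variants))
    return mask


def carrier_count(mask: List[int]) -> int:
    """Number of individuals carrying at least one qualifying allele."""
    return sum(1 for x in mask if x > 0)


# Toy dosages for four individuals; var3 belongs to a different gene.
toy = {
    "var1": [0, 1, 0, None],
    "var2": [0, 0, 2, 0],
    "var3": [1, 0, 0, 0],
}
gene_mask = build_max_mask(toy, ["var1", "var2"])
```

The resulting mask is then tested like a single variant, which is what gives burden tests their power when each individual variant is too rare to test on its own.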
We defined high-confidence (HC) PTVs as stop-gained, splice-site disrupting, and frameshift variants classified as high confidence by LOFTEE. We then generated masks for HC PTVs with a minor allele frequency (MAF) below 0.1%. Additionally, masks were created for missense variants using several established pathogenicity prediction thresholds: CADD (>20), REVEL (>0.7 and >0.5), and AlphaMissense (>0.9, >0.7, and >0.56). We then ran REGENIE with its default parameters, including age, sex, and the first 10 principal components (PCs) calculated by Bycroft et al. (2018) as covariates. We also performed SKAT, SKAT-O, and SKAT-ACAT tests in REGENIE (Zhao et al., 2024; Wu et al., 2011; Lee et al., 2012), using the same variant filters and covariates. For all gene-burden tests, the Bonferroni-adjusted significance threshold was set at 2.5 × 10⁻⁶ (0.05/20,000 genes). Odds ratios (ORs) were calculated using logistic regression.
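The variant masks and the genome-wide threshold described above can be summarized in a small Python sketch. The `Variant` record and mask names are hypothetical, the MAF is assumed to be computed from the cohort's non-missing genotypes, and applying the 0.1% MAF ceiling to every mask is an assumption (the text states it explicitly only for HC PTVs).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Variant:
    """Hypothetical annotated variant."""
    hc_ptv: bool
    missense: bool
    cadd: Optional[float] = None
    revel: Optional[float] = None
    alphamissense: Optional[float] = None


def minor_allele_frequency(dosages: List[Optional[int]]) -> float:
    """MAF from 0/1/2 ALT dosages, ignoring missing calls."""
    called = [d for d in dosages if d is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def missense_above(attr: str, cutoff: float) -> Callable[[Variant], bool]:
    """Rule: missense variant whose score `attr` exceeds the cutoff."""
    def rule(v: Variant) -> bool:
        score = getattr(v, attr)
        return v.missense and score is not None and score > cutoff
    return rule


MASKS: Dict[str, Callable[[Variant], bool]] = {
    "HC_PTV": lambda v: v.hc_ptv,
    "CADD>20": missense_above("cadd", 20),
    "REVEL>0.7": missense_above("revel", 0.7),
    "REVEL>0.5": missense_above("revel", 0.5),
    "AlphaMissense>0.9": missense_above("alphamissense", 0.9),
    "AlphaMissense>0.7": missense_above("alphamissense", 0.7),
    "AlphaMissense>0.56": missense_above("alphamissense", 0.56),
}

MAX_MAF = 0.001                    # 0.1%
BONFERRONI_ALPHA = 0.05 / 20_000   # about 2.5e-6 for roughly 20,000 genes


def masks_for(v: Variant, maf: float) -> List[str]:
    """Names of the masks a variant qualifies for."""
    if maf >= MAX_MAF:
        return []
    return [name for name, rule in MASKS.items() if rule(v)]


example = Variant(hc_ptv=False, missense=True, cadd=25.0, revel=0.6, alphamissense=0.8)
```

The example missense variant qualifies for the CADD, REVEL > 0.5 and the two lower AlphaMissense masks but not the stricter ones, illustrating how the nested thresholds produce overlapping masks of decreasing size.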
Results
We included 663 MND cases (age 60.83 ± 6.57 years, 44.19% female) and 393,083 controls (age 56.91 ± 8.00 years, 54.04% female) of European ancestry with available WES data in the primary analysis. Gene-based association analysis revealed that HC PTVs in 18 genes (BLVRB, KLHL32, NEK1, RIMS2, DYDC2, DCBLD1, ANXA4, SLC44A3, ATP10A, FRAS1, COMP, TRIM42, ANO4, NFX1, CFAP206, NLRP2, CKAP2L, and ANGPTL4) were significantly enriched in MND cases compared with controls (P_SKAT-O < 2.5 × 10⁻⁶, Table 1).
Table 1. Gene-based burden analysis of high-confidence protein-truncating variants in candidate motor neuron disease risk genes.
Among these, BLVRB PTVs showed the most significant SKAT-O association with MND (OR = 18.64, 95% CI = 7.72–45.02, P_GLM = 7.85 × 10⁻¹¹, P_SKAT-O = 1.41 × 10⁻¹⁵). NEK1, a well-established ALS gene (Mann et al., 2023; Noh et al., 2025; Rifai et al., 2025), had the highest PTV carrier frequency among MND cases: 9 of 663 patients (1.36%) versus 613 of 393,083 controls (0.16%; OR = 9.23, 95% CI = 5.29–16.10, P_GLM = 4.77 × 10⁻¹⁵, P_SKAT-O = 6.75 × 10⁻¹⁵). FRAS1 and NLRP2 PTVs ranked second in carrier frequency among MND cases. FRAS1 PTVs were observed in 7 of 663 patients (1.06%) versus 643 of 393,083 controls (0.16%; OR = 6.94, 95% CI = 3.52–13.69, P_GLM = 2.30 × 10⁻⁸, P_SKAT-O = 2.44 × 10⁻⁸). NLRP2 PTVs were found in 7 MND cases (1.06%) and 972 controls (0.25%; OR = 4.58, 95% CI = 2.29–9.19, P_GLM = 1.79 × 10⁻⁵, P_SKAT-O = 1.28 × 10⁻⁶). PTVs in DCBLD1, SLC44A3, TRIM42, and CFAP206 were each identified in 4 of 663 MND cases (0.60%). PTVs in the remaining 10 genes (KLHL32, RIMS2, DYDC2, ANXA4, ATP10A, COMP, ANO4, NFX1, CKAP2L, and ANGPTL4) were each identified in 3 of 663 MND cases (0.45%), with ORs ranging from 4.08 to 30.80. Genes with fewer than three PTV carriers were excluded from the analysis to ensure statistical robustness.
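As a rough cross-check of the carrier counts above, an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval can be computed from the 2 × 2 table of carriers and non-carriers. This Python sketch is illustrative only: the reported ORs come from covariate-adjusted logistic regression, so the crude estimate for NEK1 (about 8.8 from 9 of 663 cases and 613 of 393,083 controls) is expected to differ somewhat from the adjusted OR of 9.23. The function name is hypothetical, and it assumes no zero cells.

```python
import math
from typing import Tuple


def crude_odds_ratio(
    case_carriers: int, n_cases: int, control_carriers: int, n_controls: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Unadjusted OR and Woolf confidence interval from carrier counts (no zero cells)."""
    a = case_carriers                    # carrier cases
    b = n_cases - case_carriers          # non-carrier cases
    c = control_carriers                 # carrier controls
    d = n_controls - control_carriers    # non-carrier controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# NEK1 PTV carriers in the primary analysis (counts from the text above).
or_nek1, lo_nek1, hi_nek1 = crude_odds_ratio(9, 663, 613, 393_083)
case_freq_pct = 100 * 9 / 663   # carrier frequency among cases, in percent
```

The crude interval is wider than the reported one mainly because of the small number of carrier cases; with only 3 to 9 carriers per gene, the 1/a term dominates the standard error.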
In contrast, missense variants in the other 17 candidate genes showed no significant association with MND at any pathogenicity prediction threshold (AlphaMissense > 0.9, > 0.7, or > 0.56; REVEL > 0.7 or > 0.5; CADD > 20), including in sensitivity analyses (Supplementary Table S1). This suggests that, in these 17 genes, the risk signal is driven mainly by PTVs rather than missense variants. For NEK1, however, missense variants in several masks (AlphaMissense > 0.9, > 0.7, or > 0.56; REVEL > 0.7 or > 0.5) were significantly enriched in MND cases compared with controls (P_SKAT-O < 2.5 × 10⁻⁶, Supplementary Table S1), suggesting that damaging NEK1 missense variants also contribute to MND risk.
To further strengthen the robustness of our findings, we performed sensitivity analyses by excluding individuals with known confounding genetic factors. First, 1,634 individuals harboring previously reported pathogenic or likely pathogenic mutations for MND or amyotrophic lateral sclerosis (ALS), according to the ClinVar database, were excluded. Second, given that GGGGCC repeat expansion in the C9ORF72 gene is a frequent cause of ALS (Balendra and Isaacs, 2018; DeJesus-Hernandez et al., 2011; Todd and Paulson, 2013), we calculated repeat sizes using ExpansionHunter (Dolzhenko et al., 2019) for all individuals with available WGS data. An arbitrary cutoff of 30 GGGGCC repeats, commonly used in most studies (Balendra and Isaacs, 2018), was applied, leading to the exclusion of 849 individuals exceeding this threshold. Six individuals were found to harbor both ClinVar-reported pathogenic/likely pathogenic mutations and over 30 C9ORF72 GGGGCC repeats.
After excluding a total of 2,477 individuals (1,634 with reported ALS pathogenic/likely pathogenic mutations and 849 with a C9ORF72 GGGGCC repeat expansion, 6 of whom were in both groups), the sensitivity analysis included 588 MND cases (age 61.16 ± 6.50 years, 43.88% female) and 390,681 controls (age 56.91 ± 8.00 years, 54.04% female). PTVs in 14 of the 18 genes identified in the primary analysis remained significantly enriched in MND cases compared with controls (P_SKAT-O < 2.5 × 10⁻⁶, Table 1); SLC44A3, ATP10A, FRAS1, and NLRP2 did not. The sensitivity analysis also identified six additional genes (EPHX1, SELENOV, CDT1, ACTN3, USP16, and MAJIN) whose PTVs were significantly enriched in MND cases (P_SKAT-O < 2.5 × 10⁻⁶, Table 1) but had not reached significance in the primary analysis; PTVs in each of these genes were carried by three to four MND cases. NEK1 PTVs remained significantly enriched: 4 of 588 MND patients (0.68%) versus 314 of 390,681 controls (0.08%; OR = 9.53, 95% CI = 4.05–22.42, P_GLM = 2.38 × 10⁻⁷, P_SKAT-O = 4.90 × 10⁻⁸). Consistent with the primary analysis, missense variants in the 20 candidate genes from the sensitivity analysis showed no significant association with MND, except for NEK1 (Supplementary Table S1), whose missense masks (AlphaMissense > 0.9, > 0.7, or > 0.56; REVEL > 0.7 or > 0.5) remained significantly enriched in MND cases (P_SKAT-O < 2.5 × 10⁻⁶, Supplementary Table S1).
Functional enrichment analysis (KEGG and GO) of the 14 candidate MND genes that were significant in both the primary and sensitivity analyses (Figure 1A) showed no significant KEGG pathway enrichment (e.g., riboflavin metabolism, adjusted p = 0.06146). Similarly, no GO Biological Process (BP) or Molecular Function (MF) term was significantly enriched. However, GO Cellular Component (CC) analysis revealed significant enrichment in collagen-containing extracellular matrix (GO:0062023, adjusted p = 0.01866) and cilium (GO:0005929, adjusted p = 0.01866). These findings point to extracellular matrix remodeling and ciliary dysfunction as potential novel mechanisms in MND pathogenesis.
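Over-representation of a GO term or KEGG pathway in a gene list is commonly assessed with a one-sided hypergeometric test followed by multiple-testing correction. The text does not name the enrichment tool, so the Python sketch below is a generic illustration rather than the authors' method; the background size, term size and overlap are hypothetical, and Bonferroni correction is shown as one option (Benjamini-Hochberg is also widely used).

```python
from math import comb
from typing import List


def hypergeom_upper_tail(k: int, n_background: int, n_term: int, n_query: int) -> float:
    """P(X >= k): chance of at least k query genes falling in the term's gene set."""
    total = comb(n_background, n_query)
    upper = min(n_term, n_query)
    return sum(
        comb(n_term, i) * comb(n_background - n_term, n_query - i)
        for i in range(k, upper + 1)
    ) / total


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical example: 14 query genes, a 400-gene term, a 20,000-gene
# background, and 3 query genes annotated to the term.
p_raw = hypergeom_upper_tail(k=3, n_background=20_000, n_term=400, n_query=14)
```

With only 14 input genes, a term needs just a few overlapping genes to reach nominal significance, which is one reason to treat such enrichments as hypothesis-generating.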
Figure 1. Functional enrichment and tissue-specific expression analyses of the 14 candidate MND genes identified through robust gene-burden testing. (A) Gene ontology (GO) and KEGG pathway enrichment analysis. The figure displays the significantly enriched GO cellular component (CC) terms (Bonferroni adjusted p < 0.05) for the 14 candidate MND genes (those remaining significant in both primary and sensitivity analyses: BLVRB, KLHL32, RIMS2, DYDC2, DCBLD1, ANXA4, COMP, TRIM42, ANO4, NFX1, CFAP206, CKAP2L, NEK1, and ANGPTL4). Significant enrichment was noted in the collagen-containing extracellular matrix (GO:0062023) and cilium (GO:0005929). These findings suggest the potential involvement of pathways related to structural integrity and primary ciliary signaling in MND pathogenesis. GO biological process (BP), molecular function (MF), and KEGG pathways did not show significant enrichment after correction. (B) Expression patterns across 54 tissues (GTEx v8). This heatmap illustrates the normalized expression level (TPM, transcripts per million) of the 14 candidate genes across 54 human tissue types profiled by the Genotype-Tissue Expression (GTEx) project. The color intensity reflects the relative expression level, with darker shades indicating higher expression. Genes are clustered based on their expression profiles, showing variable patterns. Notably, several genes (KLHL32, ANO4, RIMS2, DYDC2, and CFAP206) exhibit generally higher expression in various brain regions compared to non-brain tissues. (C) Tissue specificity analysis via differentially expressed genes (DEGs). This analysis evaluates the tissue specificity of the 14 candidate MND genes by assessing the enrichment of up-regulated differentially expressed genes (DEGs) in human tissues. The analysis revealed that up-regulated DEGs were predominantly enriched in brain regions. 
The total DEG analysis indicated that the brain hypothalamus exhibited the highest tissue specificity (Bonferroni adjusted p-value < 0.05, highlighted in red). This strong brain-specific expression pattern in the hypothalamus, a region critical for autonomic and metabolic regulation, suggests that genetic perturbations in these identified genes may have direct pathological consequences in the central nervous system.
To further understand the expression patterns of these 14 MND candidate genes, we explored their tissue expression using the Genotype-Tissue Expression Project (GTEx, https://gtexportal.org/home/) database (GTEx Consortium, 2013) (Figure 1B). Expression levels of COMP, ANXA4, CKAP2L, BLVRB, DCBLD1, and NFX1 were generally lower in brain regions than in non-brain regions, whereas KLHL32, ANO4, RIMS2, DYDC2, and CFAP206 (C6ORF165) showed higher expression in brain regions. We then investigated the tissue specificity of these 14 candidate MND genes through differentially expressed gene (DEG) analysis in human tissues (Figure 1C). Up-regulated DEGs were predominantly enriched in brain regions. Furthermore, the total DEG analysis indicated that the brain hypothalamus, a region known for its crucial roles in energy metabolism, autonomic function, and neuroinflammation, exhibited the highest tissue specificity (significantly enriched DEG sets, Bonferroni adjusted p-value < 0.05, highlighted in red in Figure 1C). This brain-specific expression pattern, particularly in a functionally crucial region like the hypothalamus, suggests that these candidate genes may be relevant to central nervous system (CNS) function and pathology and that their disruption could affect these regulatory pathways in MND.
Discussion
This study explores the genetic architecture of MND, leveraging the extensive UK Biobank cohort to identify preliminary genetic risk factors. Our primary analysis suggests that PTVs in 18 genes are significantly associated with an increased risk of MND, potentially expanding the known genetic landscape of the disease. The identification of NEK1, a previously established ALS-associated locus (Yao et al., 2021; Jiang et al., 2023), serves as a positive control, supporting the sensitivity of our analytical pipeline for detecting genuine disease-associated genes. The remaining 17 genes represent new candidate associations that warrant extensive replication and functional studies.
To strengthen the statistical evidence for these preliminary findings and to exclude major known confounders, we performed sensitivity analyses that excluded individuals with known pathogenic/likely pathogenic mutations for ALS/MND or C9ORF72 GGGGCC repeat expansions. Fourteen of the initial 18 genes remained significantly associated with MND risk. Although four genes lost significance, six additional genes (EPHX1, SELENOV, CDT1, ACTN3, USP16, and MAJIN) showed significant PTV enrichment in MND cases. Together, these results suggest a broad and heterogeneous genetic contribution to MND.
Our work is relevant to other population-based genetic studies in the UK Biobank. For instance, a recent study published in Brain (Gao et al., 2025) used the same cohort to estimate the high population risk of neurodegenerative disease among C9ORF72 hexanucleotide repeat expansion (HRE) carriers and identified the UNC13A genotype as a key genetic modifier. Our sensitivity analysis, which explicitly excluded C9ORF72 HRE carriers, reflects the need to control for this major genetic factor: the associations that persisted are not explained by the most common genetic cause of ALS/FTD, and they complement findings on common variants and modifiers.
Our functional enrichment analyses offer initial insight into the potential biological relevance of these newly identified genes. Although the KEGG, GO-BP, and GO-MF analyses did not yield significant enrichment, GO-CC analysis revealed significant enrichment in the collagen-containing extracellular matrix and cilium terms. The extracellular matrix (ECM) provides essential structural support (Theocharis et al., 2016), mediates cell–cell and cell-matrix interactions, and plays a crucial role in neuronal development, survival, and function (Fawcett et al., 2019). Alterations in ECM composition or integrity can impact neuronal migration, connectivity, and overall neuronal health, potentially contributing to neurodegeneration (Long and Huttner, 2019). Cilia, particularly primary cilia, are organelles found on most mammalian cells, including various types of neurons (Wang et al., 2024; Ma et al., 2022). They serve as crucial signaling hubs involved in diverse processes such as neuronal development, differentiation, and synaptic function (Tu et al., 2023). Growing evidence implicates ciliary dysfunction (ciliopathies) in a spectrum of neurological disorders (Khan et al., 2024). Our findings suggest that genetic variations impacting the integrity of the extracellular matrix or the proper function of cilia may predispose individuals to MND by compromising the supportive microenvironment or fundamental signaling pathways essential for motor neuron health and survival.
Furthermore, the tissue-specific expression of the candidate genes provides supporting evidence for their role in a CNS disorder. The DEG analysis showed a predominant enrichment of up-regulated DEGs in brain regions. Most notably, the total DEG analysis revealed that the brain hypothalamus exhibited the highest tissue specificity. This finding is particularly notable given the hypothalamus’s well-established roles in energy metabolism, autonomic function, and neuroinflammation, all of which are increasingly recognized as contributing factors to MND pathology (Chen et al., 2025; Miller and Spencer, 2014). This brain-specific expression pattern, particularly in a functionally crucial region like the hypothalamus, points towards a potential disruption of these fundamental regulatory pathways and provides biological plausibility for how genetic perturbations in these genes may contribute to the neurodegenerative processes observed in MND.
Our study benefits from the large sample size and comprehensive WES data of the UK Biobank, which provide substantial statistical power to detect rare-variant associations, and from adjustment for confounding factors with REGENIE (Mbatchou et al., 2021). The sensitivity analysis yielded 20 candidate genes (14 carried over from the primary analysis plus six additional genes), suggesting an expansion of the genetic landscape of the disease. By excluding known genetic causes, the sensitivity analyses increase confidence that these statistical associations are not driven by established pathogenic variants. However, several limitations must be acknowledged. Despite the size of the UK Biobank, our findings are preliminary and require independent replication in diverse, large-scale MND cohorts to confirm these associations and assess their generalizability. Furthermore, although ALS is the most prevalent form, we could not distinguish between MND subtypes (e.g., ALS, PMA, PLS) because of the nature of the algorithmically defined outcomes (ICD codes) available in the UK Biobank. Therefore, we cannot comment definitively on the relationship between the identified gene PTVs and any single MND subtype. Moreover, estimating the effect size and penetrance of rare variants remains challenging and requires further investigation. Finally, the lack of detailed individual-level phenotypic data (motor and cognitive function) for our cases prevents us from exploring genotype–phenotype correlations or the prognostic implications of the identified variants, which will be a key focus for future studies with clinical cohorts.
Building on these preliminary findings, future research should prioritize several key areas. Independent replication of these associations in diverse MND patient cohorts is paramount to establish their definitive role as genetic risk factors. Subsequently, in-depth functional studies are needed to dissect the molecular mechanisms by which these genes contribute to motor neuron degeneration. This will involve using advanced cellular models (e.g., patient-derived iPSC motor neurons, CRISPR/Cas9-edited cells) and relevant animal models to assess their impact on critical cellular processes such as ECM integrity and ciliary function. Exploring the precise impact of specific PTVs within these genes on protein function and cellular phenotypes will also be critical. From a translational perspective, these genetic insights open new avenues for developing therapeutic strategies. The identified pathways suggest potential targets for pharmacological interventions that could modify the ECM or ciliary function. Ultimately, integrating these genetic findings with other multi-omics data (e.g., RNA-seq, proteomics, single-cell sequencing) will provide a more holistic understanding of their roles in MND pathogenesis and pave the way for personalized medicine approaches.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found at: https://www.ukbiobank.ac.uk/.
Ethics statement
The studies involving humans were approved by North West Multi-Centre Research Ethics Committee. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
ZH: Investigation, Writing – review & editing, Validation, Methodology, Funding acquisition, Formal analysis, Writing – original draft, Visualization, Data curation. J-jW: Data curation, Visualization, Formal analysis, Validation, Software, Methodology, Writing – review & editing, Writing – original draft, Investigation. Q-qY: Writing – original draft, Writing – review & editing, Investigation, Formal analysis, Methodology, Validation, Visualization, Data curation. YF: Formal analysis, Project administration, Writing – original draft, Visualization, Data curation, Resources, Validation, Methodology, Software, Investigation, Supervision, Funding acquisition, Conceptualization, Writing – review & editing. JL: Project administration, Data curation, Validation, Formal analysis, Visualization, Methodology, Resources, Conceptualization, Writing – review & editing, Funding acquisition, Writing – original draft, Supervision, Investigation, Software.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This study was supported by the National Natural Science Foundation of China (grant 82401649 awarded to YF; grants 82230040, 82071415, 81873778 awarded to JL) and the China Postdoctoral Science Foundation (grant 2024M752018 to YF). Additional support was provided by the Huangpu District Health and Medical Research Project, Shanghai (grant HLM202205).
Acknowledgments
We express our sincere gratitude to the participants of the UK Biobank and recognize the invaluable contributions of the UK Biobank team in the collection, curation, and management of data that enabled this research.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnagi.2025.1735522/full#supplementary-material
References
Allen, N. E., Lacey, B., Lawlor, D. A., Pell, J. P., Gallacher, J., Smeeth, L., et al. (2024). Prospective study design and data analysis in UK Biobank. Sci. Transl. Med. 16:eadf4428. doi: 10.1126/scitranslmed.adf4428
Angilletta, I., Ferrante, R., Giansante, R., Lombardi, L., Babore, A., Dell’Elice, A., et al. (2023). Spinal muscular atrophy: an evolving scenario through new perspectives in diagnosis and advances in therapies. Int. J. Mol. Sci. 24:14873. doi: 10.3390/ijms241914873
Babazadeh, A., Rayner, S. L., Lee, A., and Chung, R. S. (2023). TDP-43 as a therapeutic target in neurodegenerative diseases: focusing on motor neuron disease and frontotemporal dementia. Ageing Res. Rev. 92:102085. doi: 10.1016/j.arr.2023.102085
Backman, J. D., Li, A. H., Marcketta, A., Sun, D., Mbatchou, J., Kessler, M. D., et al. (2021). Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634. doi: 10.1038/s41586-021-04103-z
Balendra, R., and Isaacs, A. M. (2018). C9orf72-mediated ALS and FTD: multiple pathways to disease. Nat. Rev. Neurol. 14, 544–558. doi: 10.1038/s41582-018-0047-2
Balendra, R., Sreedharan, J., Hallegger, M., Luisier, R., Lashuel, H. A., Gregory, J. M., et al. (2025). Amyotrophic lateral sclerosis caused by TARDBP mutations: from genetics to TDP-43 proteinopathy. Lancet Neurol. 24, 456–470. doi: 10.1016/s1474-4422(25)00109-7
Beers, D. R., and Appel, S. H. (2019). Immune dysregulation in amyotrophic lateral sclerosis: mechanisms and emerging therapies. Lancet Neurol. 18, 211–220. doi: 10.1016/s1474-4422(18)30394-6
Benatar, M., Kurent, J., and Moore, D. H. (2009). Treatment for familial amyotrophic lateral sclerosis/motor neuron disease. Cochrane Database Syst. Rev. 2009:CD006153. doi: 10.1002/14651858.CD006153.pub2
Berdyński, M., Miszta, P., Safranow, K., Andersen, P. M., Morita, M., Filipek, S., et al. (2022). SOD1 mutations associated with amyotrophic lateral sclerosis analysis of variant severity. Sci. Rep. 12:103. doi: 10.1038/s41598-021-03891-8
Bycroft, C., Freeman, C., Petkova, D., Band, G., Elliott, L. T., Sharp, K., et al. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209. doi: 10.1038/s41586-018-0579-z
Chaudhary, R., Agarwal, V., Rehman, M., Kaushik, A. S., and Mishra, V. (2022). Genetic architecture of motor neuron diseases. J. Neurol. Sci. 434:120099. doi: 10.1016/j.jns.2021.120099
Chen, J., Cai, M., and Zhan, C. (2025). Neuronal regulation of feeding and energy metabolism: a focus on the hypothalamus and brainstem. Neurosci. Bull. 41, 665–675. doi: 10.1007/s12264-024-01335-7
Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., et al. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381:eadg7492. doi: 10.1126/science.adg7492
Danecek, P., Bonfield, J. K., Liddle, J., Marshall, J., Ohan, V., Pollard, M. O., et al. (2021). Twelve years of SAMtools and BCFtools. Gigascience 10:giab008. doi: 10.1093/gigascience/giab008
de Boer, E. M. J., Demaegd, K. C., de Bie, C. I., Veldink, J. H., van den Berg, L. H., and van Es, M. A. (2024). Familial motor neuron disease: co-occurrence of PLS and ALS (-FTD). Amyotroph. Lateral Scler. Frontotemporal Degener. 25, 53–60. doi: 10.1080/21678421.2023.2255621
DeJesus-Hernandez, M., Mackenzie, I. R., Boeve, B. F., Boxer, A. L., Baker, M., Rutherford, N. J., et al. (2011). Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS. Neuron 72, 245–256. doi: 10.1016/j.neuron.2011.09.011
Dolzhenko, E., Deshpande, V., Schlesinger, F., Krusche, P., Petrovski, R., Chen, S., et al. (2019). ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756. doi: 10.1093/bioinformatics/btz431
Fawcett, J. W., Oohashi, T., and Pizzorusso, T. (2019). The roles of perineuronal nets and the perinodal extracellular matrix in neuronal function. Nat. Rev. Neurosci. 20, 451–465. doi: 10.1038/s41583-019-0196-3
Feldman, E. L., Goutman, S. A., Petri, S., Mazzini, L., Savelieff, M. G., Shaw, P. J., et al. (2022). Amyotrophic lateral sclerosis. Lancet 400, 1363–1380. doi: 10.1016/s0140-6736(22)01272-7
Foster, L. A., and Salajegheh, M. K. (2019). Motor neuron disease: pathophysiology, diagnosis, and management. Am. J. Med. 132, 32–37. doi: 10.1016/j.amjmed.2018.07.012
Gao, J., Douglas, A. G. L., Chalitsios, C. V., Scaber, J., Talbot, K., Turner, M. R., et al. (2025). Neurodegenerative disease in C9orf72 repeat expansion carriers: population risk and effect of UNC13A. Brain 148, 3865–3871. doi: 10.1093/brain/awaf269
Gardner, E. J., Kentistou, K. A., Stankovic, S., Lockhart, S., Wheeler, E., Day, F. R., et al. (2022). Damaging missense variants in IGF1R implicate a role for IGF-1 resistance in the etiology of type 2 diabetes. Cell Genom. 2:100208. doi: 10.1016/j.xgen.2022.100208
GTEx Consortium (2013). The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585. doi: 10.1038/ng.2653
Ioannidis, N. M., Rothstein, J. H., Pejaver, V., Middha, S., McDonnell, S. K., Baheti, S., et al. (2016). REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885. doi: 10.1016/j.ajhg.2016.08.016
Jiang, Q., Lin, J., Wei, Q., Li, C., Hou, Y., Zhang, L., et al. (2023). Genetic and clinical characteristics of ALS patients with NEK1 gene variants. Neurobiol. Aging 123, 191–199. doi: 10.1016/j.neurobiolaging.2022.11.001
Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., et al. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443. doi: 10.1038/s41586-020-2308-7
Khan, S. S., Jaimon, E., Lin, Y. E., Nikoloff, J., Tonelli, F., Alessi, D. R., et al. (2024). Loss of primary cilia and dopaminergic neuroprotection in pathogenic LRRK2-driven and idiopathic Parkinson's disease. Proc. Natl. Acad. Sci. USA 121:e2402206121. doi: 10.1073/pnas.2402206121
Lee, S., Wu, M. C., and Lin, X. (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13, 762–775. doi: 10.1093/biostatistics/kxs014
Long, K. R., and Huttner, W. B. (2019). How the extracellular matrix shapes neural development. Open Biol. 9:180216. doi: 10.1098/rsob.180216
Ma, R., Kutchy, N. A., Chen, L., Meigs, D. D., and Hu, G. (2022). Primary cilia and ciliary signaling pathways in aging and age-related brain disorders. Neurobiol. Dis. 163:105607. doi: 10.1016/j.nbd.2021.105607
Mackenzie, I. R., Rademakers, R., and Neumann, M. (2010). TDP-43 and FUS in amyotrophic lateral sclerosis and frontotemporal dementia. Lancet Neurol. 9, 995–1007. doi: 10.1016/s1474-4422(10)70195-2
Malkki, H. (2016). Motor neuron disease: new insights into genetic risk factors for amyotrophic lateral sclerosis. Nat. Rev. Neurol. 12:491. doi: 10.1038/nrneurol.2016.117
Mann, J. R., McKenna, E. D., Mawrie, D., Papakis, V., Alessandrini, F., Anderson, E. N., et al. (2023). Loss of function of the ALS-associated NEK1 kinase disrupts microtubule homeostasis and nuclear import. Sci. Adv. 9:eadi5548. doi: 10.1126/sciadv.adi5548
Mbatchou, J., Barnard, L., Backman, J., Marcketta, A., Kosmicki, J. A., Ziyatdinov, A., et al. (2021). Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103. doi: 10.1038/s41588-021-00870-7
McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R. S., Thormann, A., et al. (2016). The Ensembl variant effect predictor. Genome Biol. 17:122. doi: 10.1186/s13059-016-0974-4
Miller, A. A., and Spencer, S. J. (2014). Obesity and neuroinflammation: a pathway to cognitive impairment. Brain Behav. Immun. 42, 10–21. doi: 10.1016/j.bbi.2014.04.001
Mizielinska, S., Hautbergue, G. M., Gendron, T. F., van Blitterswijk, M., Hardiman, O., Ravits, J., et al. (2025). Amyotrophic lateral sclerosis caused by hexanucleotide repeat expansions in C9orf72: from genetics to therapeutics. Lancet Neurol. 24, 261–274. doi: 10.1016/s1474-4422(25)00026-2
Morales, J., Pujar, S., Loveland, J. E., Astashyn, A., Bennett, R., Berry, A., et al. (2022). A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315. doi: 10.1038/s41586-022-04558-8
Niedermeyer, S., Murn, M., and Choi, P. J. (2019). Respiratory failure in amyotrophic lateral sclerosis. Chest 155, 401–408. doi: 10.1016/j.chest.2018.06.035
Noh, M. Y., Oh, S. I., Kim, Y. E., Cha, S. J., Sung, W., Oh, K. W., et al. (2025). Mutations in NEK1 cause ciliary dysfunction as a novel pathogenic mechanism in amyotrophic lateral sclerosis. Mol. Neurodegener. 20:59. doi: 10.1186/s13024-025-00848-7
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J., and Kircher, M. (2019). CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894. doi: 10.1093/nar/gky1016
Rifai, O. M., Waldron, F. M., Sleibi, D., O'Shaughnessy, J., Leighton, D. J., and Gregory, J. M. (2025). Clinicopathological analysis of NEK1 variants in amyotrophic lateral sclerosis. Brain Pathol. 35:e13287. doi: 10.1111/bpa.13287
Rizea, R. E., Corlatescu, A. D., Costin, H. P., Dumitru, A., and Ciurea, A. V. (2024). Understanding amyotrophic lateral sclerosis: pathophysiology, diagnosis, and therapeutic advances. Int. J. Mol. Sci. 25:9966. doi: 10.3390/ijms25189966
Rocha, J. A., Reis, C., Simões, F., Fonseca, J., and Mendes Ribeiro, J. (2005). Diagnostic investigation and multidisciplinary management in motor neuron disease. J. Neurol. 252, 1435–1447. doi: 10.1007/s00415-005-0007-9
Rutherford, N. J., Heckman, M. G., DeJesus-Hernandez, M., Baker, M. C., Soto-Ortolaza, A. I., Rayaprolu, S., et al. (2012). Length of normal alleles of C9ORF72 GGGGCC repeat do not influence disease phenotype. Neurobiol. Aging 33:2950.e5–2950.e7. doi: 10.1016/j.neurobiolaging.2012.07.005
Theocharis, A. D., Skandalis, S. S., Gialeli, C., and Karamanos, N. K. (2016). Extracellular matrix structure. Adv. Drug Deliv. Rev. 97, 4–27. doi: 10.1016/j.addr.2015.11.001
Todd, P. K., and Paulson, H. L. (2013). C9orf72-associated FTD/ALS: when less is more. Neuron 80, 257–258. doi: 10.1016/j.neuron.2013.10.010
Tolochko, C., Shiryaeva, O., Alekseeva, T., and Dyachuk, V. (2025). Amyotrophic lateral sclerosis: pathophysiological mechanisms and treatment strategies (part 2). Int. J. Mol. Sci. 26:5240. doi: 10.3390/ijms26115240
Tu, H. Q., Li, S., Xu, Y.-L., Zhang, Y.-C., Li, P.-Y., Liang, L.-Y., et al. (2023). Rhythmic cilia changes support SCN neuron coherence in circadian clock. Science 380, 972–979. doi: 10.1126/science.abm1962
Turner, M. R., Barohn, R. J., Corcia, P., Fink, J. K., Harms, M. B., Kiernan, M. C., et al. (2020). Primary lateral sclerosis: consensus diagnostic criteria. J. Neurol. Neurosurg. Psychiatry 91, 373–377. doi: 10.1136/jnnp-2019-322541
Van Hout, C. V., Tachmazidou, I., Backman, J. D., Hoffman, J. D., Liu, D., Pandey, A. K., et al. (2020). Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756. doi: 10.1038/s41586-020-2853-0
Wang, L., Guo, Q., Acharya, S., Zheng, X., Huynh, V., Whitmore, B., et al. (2024). Primary cilia signaling in astrocytes mediates development and regional-specific functional specification. Nat. Neurosci. 27, 1708–1720. doi: 10.1038/s41593-024-01726-z
Winhammar, J. M., Rowe, D. B., Henderson, R. D., and Kiernan, M. C. (2005). Assessment of disease progression in motor neuron disease. Lancet Neurol. 4, 229–238. doi: 10.1016/s1474-4422(05)70042-9
Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93. doi: 10.1016/j.ajhg.2011.05.029
Yao, L., He, X., Cui, B., Zhao, F., and Zhou, C. (2021). NEK1 mutations and the risk of amyotrophic lateral sclerosis (ALS): a meta-analysis. Neurol. Sci. 42, 1277–1285. doi: 10.1007/s10072-020-05037-6
Keywords: motor neuron disease, protein-truncating variants, rare variants, risk factors, UK biobank
Citation: Hu Z, Wan J-j, Yan Q-q, Fan Y and Liu J (2026) Exploring rare coding variants in UK biobank: preliminary associations with motor neuron disease. Front. Aging Neurosci. 17:1735522. doi: 10.3389/fnagi.2025.1735522
Edited by:
Stefania Zampatti, IRCCS Santa Lucia Foundation, Italy
Reviewed by:
Nilo Riva, IRCCS Carlo Besta Neurological Institute Foundation, Italy
Silvia Corrochano, Health Research Institute of the Hospital Clínico San Carlos (IdISSC), Spain
Copyright © 2026 Hu, Wan, Yan, Fan and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jun Liu, jly0520@hotmail.com; Yu Fan, fanyu1218@126.com
†These authors have contributed equally to this work