Establishment and validation of a phenotype-driven predictive model for the diagnostic efficacy of trio-based whole exome sequencing (trio-WES) in children with genetic neurodevelopmental disorders (g-NDDs)

Wu, Ruohao; Luo, Xiangyang; Meng, Zhe; Tang, Wenting; Liang, Liyang

doi:10.3389/fneur.2025.1574021

ORIGINAL RESEARCH article

Front. Neurol., 15 October 2025

Sec. Neurogenetics

Volume 16 - 2025 | https://doi.org/10.3389/fneur.2025.1574021

This article is part of the Research TopicGenetics and Mechanisms of Neurodevelopmental DisordersView all 12 articles

Establishment and validation of a phenotype-driven predictive model for the diagnostic efficacy of trio-based whole exome sequencing (trio-WES) in children with genetic neurodevelopmental disorders (g-NDDs)

Ruohao Wu¹

Xiangyang Luo^1,2

Zhe Meng¹

Wenting Tang³^*

Liyang Liang¹^*

¹Department of Children’s Medical Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
²Weierkang Children’s Rehabilitation Center, Guangzhou, Guangdong, China
³Department of Research and Molecular Diagnostics, Sun Yat-sen University Cancer Center, Sun Yat-sen University, Guangzhou, Guangdong, China

Background: Genetic neurodevelopmental disorders (g-NDDs), a complex group of idiopathic syndromes with a heterogeneous genetic etiology, are defined as global developmental delay/intellectual disability (GDD/ID) with other common neurodevelopmental comorbidity (NDC), such as autism spectrum disorder (ASD). Although significant progress has been made in trio-based whole-exome sequencing (trio-WES) that enable the detection of exon-level variants, the diagnostic efficacy of using trio-WES for g-NDDs is still not satisfactory. Therefore, exploring key phenotypic variables for forecasting the diagnostic probability of trio-WES is extremely necessary to implement personalized diagnosis for children with g-NDDs.

Methods: A total of 265 g-NDDs children who received trio-WES at Sun Yat-sen Memorial Hospital between Sep 2016 and Dec 2022 were enrolled and clustered temporally into training and internal validation sets [163 cases (Oct 2016 ~ Dec 2022) and 102 cases (Sep 2016 ~ Sep 2018), respectively]. A total of 97 g-NDDs children who underwent trio-WES at Weierkang Children’s Rehabilitation Center between Jan 2023 and Dec 2023 were enrolled and served as an independent external validation set. Univariate and multivariate logistic regression were conducted in the training set to screen out independent diagnosis-related phenotypic signifiers and establish an alignment diagram model. The model was further validated in internal and external validation sets.

Results: Through univariate and multivariate analyses, independent diagnosis-related predictive signifiers, including GDD/ID severity, NDC complexity, ASD, and head circumference abnormality, were identified in the training cohort and used to construct a model. The model showed good discrimination power with an area under the ROC curves (AUC) in the training set of 0.821 (95% CI: 0.756–0.886), yielding a F1 score of 0.76. The model also showed powerful prediction in both the internal (AUC: 0.905 with 95% CI: 0.842–0.968 and F1 score: 0.77) and external validation sets (AUC: 0.919 with 95% CI: 0.858–0.979 and F1 score: 0.79).

Conclusion: We found the potential linear relationship between trio-WES-diagnostic rates and the phenotypic enrichments in g-NDDs patients for the first time, indicating the possibility of applying a logistic regression model based on phenotypic features to predict the personalized diagnostic rates of using trio-WES in children with g-NDDs.

Introduction

Genetic neurodevelopmental disorders (g-NDDs), a complex group of idiopathic syndromes that tends to have a heterogeneous genetic etiology and represents a complex group of neurological diseases with marked clinical variability and readily observable deficits from a very early age (often <6 months), affecting approximately 1% ~ 3% of children worldwide and exerting a substantial burden on society and patients’ families (1). They mainly comprise global developmental delay/intellectual disability (GDD/ID), autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), and epilepsy (EP) (2). Among these neurological disease conditions, GDD/ID is the major and essential part of g-NDDs (3); while other common neurological disease conditions, such as ASD, ADHD, and EP, can present as neurodevelopmental comorbidity (NDC), solely or jointly appearing in individuals with g-NDDs (4). Therapeutic intervention/treatment or disease management for g-NDDs is individualized, multidisciplinary, and mainly symptomatic treatment, such as rehabilitation therapy and pharmacotherapy (like encephalon glycoside and exogenous nerve growth factor).

Genetic alterations are considered to play a key role in the pathogenesis of g-NDDs. For instance, 30–50% of g-NDDs cases were reported to be caused by single-nucleotide variants (SNVs), copy-number variants (CNVs) and chromosomal abnormalities; Rett, fragile-X, and Down syndromes are considered as the three most common types of g-NDDs worldwide (5). Owing to the significant advancements in identifying genetic components of g-NDDs via next-generation sequencing technology, especially the popularization of trio-based (parental-offspring model) whole-exome sequencing (trio-WES) that enable the detection of exon-level variants, including SNVs and CNVs, genetic causes are being found more frequently than before in many g-NDDs patients (6). Nonetheless, there still have children with g-NDDs remain undiagnosed after undergoing trio-WES due to a proportion of variants located outside exons (e.g., intron, promoter or enhancer-level variants) underlying g-NDDs (7); the diagnostic yield of trio-WES for children with g-NDDs is still not satisfactory (8–10). Considering the unsatisfactory diagnostic efficacy of trio-WES in g-NDDs, it is crucial for clinicians to develop a targeted approach for the early identification of patients who can most likely be diagnosed by trio-WES, providing those patients with further assessment of related medical conditions earlier in the disease course. Moreover, owing to the exon-level sequencing feature of trio-WES, developing a simple-to-use tool for patients and their families to assist their decision-making in applying trio-WES in the diagnostic strategy at the pre-diagnosis stage is also critical, which can dramatically help facilitate individualized family medical planning.

Alignment diagrams are one of the common graphical calculators used to visualize the scoring system of a linear regression model (such as logistic and cox regression models) with an operationally intuitive manner (11). Owing to the rapid development of many R packages, such as “rms” and “rmda” packages (12), alignment diagrams can be directly generated in the R environment and represent a prevalent and major subtype within the broader category of graphical calculators, characterized by straight scales and solution via a straightedge alignment line (i.e., drawing a straight line between known values on two scales to read the unknown value on a third scale), and providing a high-level snapshot of regression accuracy. Alignment diagram models are widely applied in forecasting the risks or outcomes of many NDDs, such as ADHD (13), ASD (14), infant neurodevelopmental delays (15) and oppositional defiant disorder (16). However, to the best of our knowledge, no publications have reported the application of alignment diagrams to predict the diagnostic probability of trio-WES in g-NDDs children. Here, we conducted a double-center study with independent temporal (internal) and geographical (external) validations based on phenotype-driven methods, exploring key phenotypic variables for forecasting the diagnostic probability of trio-WES in children with g-NDDs. Phenotype-driven methods are important clinical genetic methods for identifying the key clinical phenotypic characteristics of rare monogenic disorders, allowing the identification of individuals with a high probability of carrying relevant exon-level variants detected by trio-WES (17, 18).

Materials and methods

Participants and patient selection criteria

As shown in Figure 1A, 871 g-NDDs children were admitted to the tertiary Children’s Medical Center of Sun Yat-sen Memorial Hospital (SYSMH) from September 2016 to December 2022. After performing a series of clinical information (excluding 346 patients without clear or complete clinical data, such as outpatient subjects), informed consent (excluding 239 patients due to their families decline to receive genetic tests), and routine genetic (G-band karyotyping and fragile-X analysis) screenings (excluding 21 patients who had apparent chromosomal disorders, such as Down syndrome and fragile-X syndrome, which had no need to use high-throughput sequencing, such as trio-WES, as their diagnostic strategies), a total of 265 g-NDDs patients with non-consanguineous relationships from SYSMH who want to get a specific genetic diagnosis via trio-WES were recruited for this retrospective study. They received trio-WES testing from September 1, 2016 to December 31, 2022. Among them, 163 individuals recruited between October 1, 2018 and December 31, 2021, served as training subjects, whereas another 102 individuals with non-consanguineous relationships recruited between September 1, 2016 and September 30, 2018, served as internal validation subjects (also called “temporal validation subjects”). Moreover, 115 individuals diagnosed with g-NDDs were admitted to Weierkang Children’s Rehabilitation Center (WCRC) between January 2023 and December 2023. After a series of screenings (excluding 15 patients without clear or complete clinical data and 3 patients whose families decline to undergo genetic tests), a total of 97 non-consanguineous individuals with g-NDDs were ultimately recruited. They received trio-WES testing from January 1, 2023 to December 31, 2023, and served as the external validation subjects (also called “geographical validation subjects”).

Figure 1

Flowchart (A) details the process of enrolling gNDDs children in studies: it starts with clinical information screening, informed consent, genetic screenings, and trio-WES testing. It divides participants into training and validation cohorts. The table (B) lists candidate variables and their definitions related to conditions like GDD/ID, ASD, ADHD, and more, with criteria for diagnosis.

Figure 1. Flowchart and candidate variables description of this double-center study. (A) Flowchart of this study. (B) A description summary of candidate phenotypic variables applied in current study. g-NDDs, genetic neurodevelopmental disorders; trio-WES, trio-based whole-exome sequencing; SYSMH, Sun Yat-Sen Memorial Hospital; WCRC, Weierkang Children’s Rehabilitation Center; GDD/ID, global developmental delay/intellectual disability; ASD, autism spectrum disorder; ADHD, attention deficit hyperactivity disorder; EP, epilepsy; NDC, neurodevelopmental comorbidity; HCA, head circumference abnormality; BM, brain malformation. Note: “+/−” indicates enrolled individuals who received trio-WES and had a positive genetic diagnosis (carrying a “likely pathogenic” or “pathogenic” variant) or a negative genetic diagnosis (carrying a “benign” or “uncertain significance” variant) on the basis of the guidelines of the American College of Medical Genetics.

The clinical (phenotypic) definitions of g-NDDs used in the current study included the following: (1) various severities of GDD/ID. The clinical diagnostic criteria for GDD/ID were based on the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition guidelines for GDD/ID (1). The severity assessment for GDD/ID was based on the classification criteria of the International Classification of Diseases Version 11 (19). (2) Comorbid with one or more type(s) of common neurodevelopmental comorbidity (NDC), including ASD, ADHD, and EP. Clinical diagnoses of ASD and ADHD were made on the basis of the related DSM-V guidelines for ASD and ADHD (1). The clinical diagnosis of EP was made on the basis of the International League Against Epilepsy guidelines (20). (3) Patients with identified non-genetic causes, such as hypoxic–ischemic encephalopathy, bilirubin encephalopathy and intrauterine infections, were excluded.

The inclusion criteria for this study were as follows: (1) g-NDDs individuals who were admitted to SYSMH (Sep 2016 ~ Dec 2022) or WCRC (Jan 2023 ~ Dec 2023). (2) Underwent systematic and standard examinations, including assessing clinical manifestations/detailed medical history (including GDD/ID condition, history of ASD, EP and ADHD, birth condition and family history), and receiving cardinal neurodevelopment-related auxiliary examinations (Gesell Developmental Scale for Infants/Wechsler Intelligence Scale for Children, electroencephalogram, cranial magnetic resonance imaging, auditory brainstem response/visual evoked potential tests, echocardiography/abdominal ultrasonography). (3) The patients’ families agreed to undergo genetic tests, including routine genetic screening (G-band karyotyping and fragile-X analysis) and trio-WES, and want to obtain a specific genetic diagnosis through trio-WES.

The exclusion criteria for this study were as follows: (1) g-NDDs patients whose medical records were incomplete or unclear or whose clinical data were missing. (2) Positive routine genetic test results indicated that there were apparent chromosomal disorders, such as Down syndrome or fragile-X syndrome, in those individuals, which had no need to use trio-WES as a diagnostic strategy in those patients. (3) Those who underwent trio-WES analysis but did not permit the use of their trio-WES results for publication.

Ethical compliance

The design and launch of this retrospective study were approved by the Ethical Committee of the Sun Yat-sen Memorial Hospital, Sun Yat-sen University (Approval Number: SYSKY-2025-244-01). Written informed consents for genetic investigation and participation in this study were obtained from the parents or guardians of all 362 enrolled individuals.

Methods for variant capture of trio-WES

The principle of variant capture process and quality control system of trio-WES have been described in previous studies (21–23) and the methods in this study can be briefly presented as follows: DNAs were extracted from the whole blood of the proband and their parents using a commercialized genomic extraction kit (Qiagen, Shanghai, China). Illumina TruSeq Exome Kit (Illumina, San Diego, CA, United States) was used for the DNA library construction and the generation of ~10GB exome sequencing data/individual. GeneRanger (Xunyin Biotech, Shanghai, China) was employed for the exome sequencing data analysis. Then, a series of the genome analysis tools were employed for the read alignment, indel region realignment, base quality recalibration, variant capture, and calling/transformation on the basis of the Genome Aggregation Database (gnomAD). Variant capture control system was set to a coverage depth >10 with a minor allele frequency <0.05%.

Criteria for identifying exon-level variant pathogenicity assessment and subject grouping

The pathogenicity of the detected exon-level SNVs via trio-WES was scored on the basis of the 2015 American College of Medical Genetics guidelines for SNVs classification (24), and those candidate SNVs were accordingly divided into “pathogenic/likely pathogenic SNVs” and “benign/uncertainly significant SNVs.” As described in previous research (5), gnomAD and the in-house SNV population frequency databases were employed for the assessment of SNVs allele frequency. Functional analysis of the pathogenicity of those variants was performed via bioinformatic tools. Specifically, in silico prediction of identified missense/frameshift/nonsense/deletion variants’ pathogenicity was conducted using local versions of Mutation Taster, PROVEAN, Polyphen-2, REVEL and SIFT. Moreover, in silico pathogenic prediction of detected splice variants was conducted using local versions of CADD and Human Splice Finder. Human Genomic Mutation Database and PubMed were employed to determine whether identified variants had been recorded previously. GeneReviews and OMIM databases were employed to obtain the genotype–phenotype profiles linked to identified SNVs. The pathogenicity of exon-level CNVs identified via trio-WES was rated on the basis of the 2019 American College of Medical Genetics guidelines for the interpretation of postnatal CNVs (25), and those candidate CNVs were accordingly classified as “pathogenic/likely pathogenic CNVs” and “benign/uncertainly significant CNVs.” All candidate SNVs/CNVs in this study were manually interpreted and assessed by two or more experienced clinical geneticists, following the American College of Medical Genetics guidelines mentioned above. ClinVar database¹ was employed to refer the report status (novel or previously reported) of all identified SNVs and CNVs.

On the basis of the outcome indicator from the trio-WES examination records of the hospital digital system, the enrolled individuals were then grouped into those with positive trio-WES-tested genetic diagnoses (i.e., those with pathogenic/likely pathogenic exon-level SNVs or CNVs in their trio-WES analysis reports) and those with negative trio-WES-tested genetic diagnoses (i.e., those with benign/uncertainly significant exon-level SNVs or CNVs in their trio-WES analysis reports and maybe the out-exon-level variants can explain their g-NDDs conditions).

Candidate variable collection and interpretation of collected variables

The demographic characteristics and phenotypic variables of all enrolled subjects were collected from medical records of the hospital digital system. This included (1) demographic characteristics, such as sex, onset age, and admission date. (2) candidate phenotypic variables: (i) GDD/ID severity; (ii) ASD; (iii) ADHD; (iv) EP; (v) NDC complexity; (vi) Head circumference abnormality (HCA); (vii) Brain structure malformation (BM); (viii) Visual impairment and ocular malformation; (ix) Hearing impairment and ear anomaly. A summary of those phenotypic variables and their detailed definitions were demonstrated in Figure 1B.

Model visualization and reliability and performance evaluation

Independent phenotypic variables were determined via univariate and multivariate logistic regression analysis using the data from the training cohort. Specifically, during the univariate and multivariate regression analyses of the training cohort, variables with clinical significance and significant differences (p < 0.05) were identified between groups with positive and negative trio-WES-based genetic diagnoses, and used for logistic linear model construction.

Firstly, we used Spearman’s correlation analysis to determine whether there were confirmed bi-interdependencies among the identified phenotypic variables. According to the classification of statistical power for correlation analysis established by Cohen (26), when the absolute value of Spearman’s correlation coefficient (| r |) < 0.5 for all pairwise comparisons among those variables, there were no confirmed bi-interdependencies existed among them. Then, we used collinearity analysis to further determine whether there were significant multiple interdependencies among those phenotypic factors. Specifically, the tolerance and variance inflation factor (VIF) were applied to assess the multiple interdependence among identified variables. When the value of tolerance was >0.2 and the VIF was <2 for each identified variable, there were no multiple interdependencies existed among those variables, and could be considered independent phenotypic variables for linear model construction (27). Then, we used the 10-fold-cross resampling with the 1,000-time bootstrap repeated sampling methods to evaluate the goodness-of-fit and reliability degrees of the linear regression model in training set. Specifically, the 10-fold-cross resampling approach is an established and robust resampling technique employed to assess the consistence performance and the reliability of constructed model, which involves partitioning the entire dataset into 10 mutually exclusive and approximately equal-sized folds. During iterative training, 9 folds are aggregated as the training subset; the remaining single fold serves as the validation subset. This process is repeated 10 times with each fold served exactly once as the validation subset. Meanwhile, the bootstrap resampling method is another classical approach that is drawn from the original training dataset with replacement and is repeated 1,000 times. The established performance metric, concordance index (C-index) of the two methods, is calculated to examine the reliability of constructed model; when the two methods’ C-index values were >0.7, indicating the regression model can be regarded as a good reliability linear model for explaining variance of the set outcomes (28). Finally, we generated an alignment diagram using R packages to visualize the scoring system of our constructed logistic regression model.

For model performance evaluation, the discriminatory power was first evaluated in the training set via the area under the curve (AUC) values of the receiver operating characteristic (ROC) curves. Then, the calibration curves with the Hosmer–Lemeshow (H-L) test were used to evaluate the goodness of fit between the predicted calculation and the observed data. Specifically, when the p-value of H-L test <0.05, the dotted line (representing model-predicted data) in calibration curve significantly differed from the solid line (representing actual observed data) in calibration curve, and the model did not fit well. Otherwise, the model fit well (p-value of H-L test >0.05). Moreover, the benefit of the clinical application of the model was also assessed via decision curve analysis (DCA) and a clinical impact curve (CIC).

Model performance validation

According to the maximal Youden index value (corresponding to the most optimal values of sensitivity and specificity of model) calculated in the training set, we further set the corresponding individual’s total score (cut-off score) for the training and internal/external validation cohorts, then based on this cut-off score, we further divided subjects in training and internal/external validation sets into a “model-predicted high probability diagnostic group” and a “model-predicted low probability diagnostic group.” The AUC values of the ROC curves, calibration curves with the H-L test and DCA/CIC were subsequently applied to verify the discriminative performance of model in internal and external validation sets. Finally, we calculated the model sensitivity, specificity, accuracy, precision and F1 score for the training and external validation sets and visualized those results (confusion matrix) via Sankey plots.

Statistical analysis

In this study, Microsoft Excel software was used to record individual data, and all the statistical analyses were conducted in the R environment (version 4.4.2, https://www.r-project.org/). As in previous studies in the R environment (12, 27, 29), the main R packages used for statistical analysis and figure plotting in the present research included “ggplot2,” “foreign,” “rms,” “rmda,” “caret,” “tidyverse” and “ggDCA.” The results with a p-value <0.05 were considered statistically significant throughout the study, if not specially noted.

Results

Patient characteristics

In total, 163, 102, and 97 pediatric and adolescent subjects with unexplained g-NDDs were enrolled in the training, internal validation, and external validation cohorts, respectively. A comparison of the baseline demographics and phenotypic characteristics of the subjects in the training and validation cohorts is shown in Table 1, and the detailed genotypic and phenotypic data of those individuals in the training and validation cohorts can be found in Supplementary Files 1–6, respectively.

Table 1

Table 1. Comparison of baseline demographics or phenotypic characteristics between training cohort and internal/external validation cohorts of trio-WES tested children with g-NDDs.

Among the 163 enrolled subjects in the training cohort, 93 (57.1%) had mild–moderate GDD/ID, and the remainder (70/163, 42.9%) had severe-profound GDD/ID. Most children (129/163, 79.1%) presented as simple g-NDDs without comorbid ASD, ADHD or EP. Meanwhile, parts of cases had ASD (99/163, 60.7%), ADHD (41/163, 25.2%), EP (62/163, 38.0%), HCA (44/163, 27.0%), BM (23/163, 14.1%), visual impairment/ocular malformation (8/163, 4.9%) and hearing impairment/ear anomaly (22/163, 13.%), respectively. 82 (50.3%) had a genetic diagnosis via trio-WES. Of the 82 trio-WES diagnosed children in the training cohort, 62 (75.6%) cases were diagnosed with SNV-mediated syndromes (a total of 72 detected SNVs with 35 being novel variant and 37 being previously reported) and 20 (24.4%) with CNV-mediated syndromes (all these 20 identified CNVs were novel variants).

Among the 102 and 97 enrolled cases in the internal and external validation cohorts, 42 (41.2%) individuals included in the internal validation cohort obtained a genetic diagnosis via trio-WES. Of the 42 trio-WES diagnosed children, 36 (85.7%) cases were diagnosed with SNV-mediated syndromes (a total of 40 detected SNVs with 14 being novel variant and 26 being previously reported) and 6 (14.3%) cases were diagnosed with CNV-mediated syndromes (all these 6 identified CNVs were novel variants). While, in external validation cohort, 42 (43.3%) enrolled subjects received a genetic diagnosis via trio-WES. Of the 42 trio-WES diagnosed children, 32 (76.2%) cases were diagnosed with SNV-mediated syndromes (a total of 34 detected SNVs with 16 being novel variant and 18 being previously reported) and 10 cases (23.8%) diagnosed with CNV-mediated syndromes (all these 10 identified CNVs were novel variants).

Independent predictive indicator selection and regression model construction

First, univariate logistic analysis for all included phenotypic variables presented in Table 1 was performed in the training set to determine the potential trio-WES diagnosis-related phenotypic markers. As demonstrated in Table 2, the results of univariate logistic analysis revealed that four candidate phenotypic markers (GDD/ID severity, NDC complexity, ASD, and HCA) had potential predictive value. Among them, severe-profound GDD/ID [OR (95% CI): 6.880 (3.414–13.865), p < 0.001], multiple NDCs [OR (95% CI): 4.237 (1.783–10.067), p < 0.01], ASD [OR (95% CI): 4.673 (2.360–9.253), p < 0.001], and HCA [OR (95% CI): 4.286 (1.977–9.292), p < 0.001] were associated with high possibilities of having genetic diagnosis via trio-WES in g-NDDs subjects.

Table 2

Table 2. Univariate and multivariate logistic regression for predicting diagnostic efficacy of using trio-WES in g-NDDs individuals in training set.

On the basis of the univariate regression analysis, the phenotypic factors with p < 0.05, including GDD/ID severity, NDC complexity, ASD, and HCA (Figure 1B), were candidate indicators for multivariate linear regression model construction. The four variables were firstly analyzed via Spearman’s correlation analysis. As shown in Figure 2A, the values of |r| for pairwise comparisons among those four variables were all <0.5, indicating that no confirmed bi-interdependencies existed among them. Then, the four indicators were subsequently analyzed via collinearity analysis. As demonstrated in Table 3, the tolerances of each phenotypic factor were all >0.2, and the values of the VIFs of each indicator were all <2, indicating that no multiple interdependencies existed among those four phenotypic variables.

Figure 2

Chart A displays bar graphs showing case numbers for different conditions: GDD/ID severity, NDC complexity, and ASD or HCA status. Chart B shows a line graph with C-index values across ten resamples. Chart C presents a histogram of C-index bootstrapped distribution with one thousand resamples, indicating a range between 0.65 and 0.90.

Figure 2. Spearman’s correlation analysis and Goodness-of-fit tests for evaluation of the reliability of the constructed logistic regression model. (A) Bar charts showing the pairwise comparisons of |r| values among those model-constructed variables across training cohort. (B) Point-fold line chart with 10-fold-cross resampling approach showing the model had good stability with excellent consistence in training set (C-index, 0.797 with 95% CI, 0.732–0.862). (C) Histogram with 1,000-time resampling bootstrap method revealing the model did not overfit and showed good reliability in training set (C-index, 0.800 with 95% CI, 0.707–0.893). |r|, Spearman’s correlation coefficient; GDD/ID, global developmental delay/intellectual disability; ASD, autism spectrum disorder; NDC, neurodevelopmental comorbidity; HCA, head circumference abnormality.

Table 3

Table 3. The collinearity diagnostic analysis of variables for predicting efficacy of using trio-WES in g-NDDs individuals in training set.

Those four phenotypic factors without interdependencies were incorporated into the multivariate logistic regression. As indicated in Table 2, our multivariate analysis results further revealed that GDD/ID severity [OR (95% CI): 4.865 (2.213–10.694), p < 0.001], NDC complexity [OR (95% CI): 3.731 (1.399–9.950), p = 0.009], ASD [OR (95% CI): 3.256 (1.479–7.168), p = 0.003] and HCA [OR (95% CI): 2.788 (1.148–6.774), p = 0.024] were independently associated with the diagnostic efficacy of trio-WES among g-NDDs subjects. Then, we used the 10-fold-cross validation and the bootstrap repeated sampling approach for evaluation the reliability of constructed model. As shown in Figure 2B, after conducting 10-fold-cross validation, the C-index value was 0.797 with a 95% CI of 0.732–0.862. In the meantime, after performing bootstrap method with 1,000-time resampling, the C-index also show robust value with 0.800 (95% CI: 0.707–0.893) (Figure 2C). All those results (both two approaches’ C-indices were >0.7) indicated that the constructed regression model owned good goodness-of-fit and can be regarded as a reliability linear model for explaining variance of the set outcome (diagnosed by trio-WES) for g-NDDs patients.

According to the multivariate logistic regression β value and the intercept term, a novel regression model for predicting the diagnostic efficacy of using trio-WES in pediatric patients with g-NDDs was constructed, and the corresponding formula for predicting the probability (P) of a g-NDDs subject being diagnosed by trio-WES was as follows: Logit (P) = 1.582 (β₁) × GDD/ID severity (severe-profound: 1; mild–moderate: 0) + 1.317 (β₂) × NDC complexity (complicated: 1; simple: 0) + 1.181 (β₃) × ASD (yes: 1; no: 0) + 1.025 (β₄) × HCA (yes: 1; no: 0) – 1.914 (intercept term). The most prevalent scoring methodology for included variables is utilization of the coefficients (β) derived from regression model. The calculated score assigned to each variable is proportional to its β value, often mapped to a 0 to 100-point scale via linear transformation (30). Specifically, we set 100-point to a variable with the max β value (β_max), and based on that, we can calculate other included variables’ calculated score via formula “calculated score_x = 100 × β_x÷β_max.” Detailed coefficients of this regression model and the corresponding calculated score for each included variable were demonstrated in Table 4.

Table 4

Table 4. Coefficients of binary logistic regression for predicting diagnostic efficacy via trio-WES in children with g-NDDs in training set.

Model visualization via alignment diagram and the assessment of the related visualized scoring system

As shown in Figure 3A, by using R packages “rms” and “rmda,” we generated an alignment diagram to visualize the constructed logistic regression model and its related scoring system (Table 4). Specifically, the alignment diagram provided an estimate of the probability of being diagnosed by trio-WES via the scored contributions of the four included binary phenotypic indicators and could be used for an g-NDDs child at the time of initial admission. According to the scoring system visualized by the alignment diagram and distinct phenotypic manifestations of all enrolled 362 g-NDDs subjects in training and validation cohorts, we could assign corresponding scores to each individual, thereby deriving an aggregate score for each of them. Based on their aggregate scores, we could further classify those subjects into different subgroups at each aggregate score. As revealed in Figures 3B–D, bar charts showing the percentages of cases with positive trio-WES diagnoses in different subgroups at each aggregate score across the training, internal and external validation cohorts, and we found that a higher aggregate score was consistently and strongly associated with a significantly increased rate of positive trio-WES diagnoses across all these three cohorts, further demonstrating the robust predictive power of this scoring system for identifying g-NDDs cases likely to yield a positive genetic diagnosis via trio-WES.

Figure 3

Chart A displays scores and diagnostic indicators for GDD/ID severity, NDC complexity, ASD presence, and HCA presence. Charts B, C, and D present bar graphs of trio-WES diagnoses, split into negative and positive, across various calculated total scores. Blue represents negative diagnoses, and red represents positive diagnoses.

Figure 3. Predictive model visualization by the alignment diagram and the assessment of this visualized scoring system across the training and validation cohorts. (A) The alignment diagram includes two parts: the top part (starts with “Calculated Score” and goes down to the last phenotypic factor, “HCA”) is used to calculate the scores of each included phenotypic biomarker, and the bottom part (starts with “Calculated Total Scores” and goes down to “Diagnosis by trio-WES”) is designed to determine the probability of having a genetic diagnosis via trio-WES. Bar charts revealing the percentage distributions of cases with positive trio-WES diagnoses at each score across the training (B), internal (C) and external (D) validation cohorts. GDD/ID, global developmental delay/intellectual disability; NDC, neurodevelopmental comorbidity; ASD, autism spectrum disorder; HCA, head circumference abnormality; trio-WES, trio-based whole-exome sequencing; g-NDDs, genetic neurodevelopmental disorders.

Evaluation of the predictive model in the training cohort

In the training set, the calibration curves with the H-L test were first used to assess the fitness of the predictive model. As revealed in Figure 4A, the calibration plots of the model showed a good fit between the actual and model-predicted diagnostic rates, and in the H-L test, χ² = 5.571 with a p-value = 0.473, further indicating satisfactory consistency between the model predictions and the actual observed values.

Figure 4

Chart A is a calibration plot showing predicted versus actual probability with apparent, bias-corrected, and ideal lines; Hosmer-Lemeshow P-value is 0.473. Chart B is a ROC curve showing sensitivity versus 1-specificity, with AUC 0.821 and a confidence interval of 0.756 to 0.886. Chart C is a net benefit graph comparing a predictive model and alternatives against threshold probability. Chart D is a decision curve plot showing the number at high risk and the number at high risk with events versus high risk threshold.

Figure 4. Discriminatory performance of the model in the training set. (A) Calibration plots, (B) ROC curves for assessing the predictive accuracy of the model in the training cohort, and (C) DCA and (D) CIC for examining the predicted clinical utility or impact of the model in the training set. AUC, area under the ROC curves; 95% CI, 95% confidence interval; ROC, receiver operating characteristic; DCA, decision curve analysis; CIC, clinical impact curve.

ROC curves were subsequently used to assess the discriminative performance of the predictive model. As shown in Figure 4B, the results demonstrated that the AUC = 0.821 (95% CI: 0.756–0.886), suggesting that the model had good predictive value in the training set. According to the ROC curves of the training cohort, the maximal Youden index was 0.547 and was selected to set the optimal cut-off score (148), which generated a confusion matrix with values of sensitivity, specificity, false negative rate (FNR), false positive rate (FPR), accuracy, precision, and F1 score of 73.20, 81.50, 26.80, 18.50, 77.30, 80.00%, and 0.76, respectively, in the training set (Figure 5A and Table 5).

Figure 5

Flow diagram showing predictive model outcomes for three groups: A (Training), B (inVal), and C (exVal). Each group's total cases are split into High and Low categories based on cut-off scores. Outcomes are measured as True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) against the Trio-WES diagnosis. Group A has 75 High and 88 Low cases, B has 41 High and 61 Low, and C has 39 High and 58 Low. Positive and negative progression through model categories culminates in Yes or No diagnoses.

Figure 5. Sankey plots revealing the discriminatory performance of the predictive model across training and validation cohorts. (A) Training cohort, (B) internal validation cohort and (C) external validation cohort. Training, training cohort; inVal, internal validation cohort; exVal, external validation cohort. FP, false positive; TP, true positive; TN, true negative; FN, false negative. Note: the maximal Youden index (0.547) based on the training cohort was chosen to set the optimal cut-off score (148), a critical value that clustered those three cohorts (Training, inVal, and exVal cohorts) into groups with high probability and low probability of having a genetic diagnosis via trio-WES.

Table 5

Table 5. Predictive performance of the constructed model in training and validation sets.

Moreover, we also applied DCA and the CIC to evaluate the clinical usefulness of the predictive model in the training set. As demonstrated in Figures 4C,D, individuals with g-NDDs could obtain greater net benefits from our model than from the hypothetical treat-none or treat-all scenarios, suggesting that the use of this model to predict the diagnostic efficacy of trio-WES for g-NDDs children may bring clinicians more benefits.

Internal (temporal) and external (geographical) validations of the model performance

As shown in Figure 6A, the calibration curves with the H-L test revealed excellent agreement between the model predictions and the actual observed results (χ² = 5.221 with p-value = 0.516) in the internal set. Moreover, the ROC plots also revealed that the model had a powerful discriminative ability in the internal set (Figure 6B, AUC: 0.905, 95% CI: 0.842–0.968). As reflected by the results of the DCA and CIC in the internal set, applying our model to predict the diagnostic efficacy of trio-WES for g-NDDs patients could result in greater net benefits (Figures 6C,D). As reflected by the confusion matrix results of the Sankey plot in the internal set, the sensitivity, specificity, FNR, FPR, accuracy, precision and F1 score of the model in the internal set were 76.20, 85.00, 23.80, 15.00, 81.40, 78.00%, and 0.77, respectively (Figure 5B and Table 5).

Figure 6

Four graphs labeled A to D analyze predictive model performance. A: Calibration plot comparing predicted and actual probabilities, showing a Hosmer-Lemeshow P-value of 0.516. B: ROC curve with AUC of 0.905 and confidence interval of 0.842 to 0.968. C: Decision curve analysis indicating net benefit against threshold probability. D: Graph showing the number of high-risk predictions versus events, plotted against high-risk thresholds.

Figure 6. Internal verification of the discriminatory performance of the model in the internal validation set. (A) Calibration plots and (B) ROC curves verifying the estimation accuracy of genetic diagnosis via trio-WES on the basis of the predictive model. (C) DCA and (D) CIC validating the clinical value of the model in the internal validation set. AUC, area under the ROC curves; 95% CI, 95% confidence interval; ROC, receiver operating characteristic; DCA, decision curve analysis; CIC, clinical impact curve.

Similarly, the calibration curves with the H-L test (χ² = 2.494 with p value = 0.777), ROC curves (AUC: 0.919, 95% CI: 0.858–0.979) and DCA/CIC were all applied in the external set (Figures 7A–D). A confusion matrix results of the Sankey plot in the external set showing the sensitivity, specificity, FNR, FPR, accuracy, precision and F1 score of the model in the external set were 76.20, 87.30, 23.80, 12.70, 82.50, 82.10%, and 0.79, respectively (Figure 5C and Table 5).

Figure 7

Four-panel image showing:A. Calibration plot comparing predicted and actual probabilities with lines for apparent, bias-corrected, and ideal models. Hosmer-Lemeshow test P-value is 0.777.B. Receiver operating characteristic (ROC) curve indicating sensitivity versus one minus specificity. The area under the curve (AUC) is 0.919, with a 95% confidence interval of 0.858 to 0.979.C. Decision curve analysis of predictive model net benefit against threshold probability. Lines represent predictive model, treating all, and treating none.D. Graph depicting the number considered high risk and those with events against the high-risk threshold, showing cost-benefit ratio.

Figure 7. External verification of the discriminatory performance of the model in the external validation set. (A) Calibration plots and (B) ROC curves verifying the estimation accuracy of genetic diagnosis via trio-WES on the basis of the predictive model. (C) DCA and (D) CIC validating the clinical value of the model in the internal validation set. AUC, area under the ROC curves; 95% CI, 95% confidence interval; ROC, receiver operating characteristic; DCA, decision curve analysis; CIC, clinical impact curve.

Discussion

With the rapid development of next-generation sequencing technology, genetic causes are being detected more frequently than before in many individuals with unexplained syndrome involving multiple organ malformations (5). The application of next-generation sequencing has completely changed the landscape of clinical genetics; compared with conventional tests (such as family segregation analysis and Sanger sequencing), trio-WES provides an effective way to identify exon-level variants and determine the diagnosis of many rare monogenetic disorders, avoiding previous “diagnostic odysseys” experienced by many patients and their families (31–34). Nonetheless, we should note that technical limitations (exon-level sequencing only), along with the complexity of the genome (such as deep intronic or non-coding variants), may inevitably hinder the effectiveness of trio-WES. Whole-genome sequencing can offer advantages over trio-WES by identifying both exon-level and out-exon-level variants (such as structural, intronic and non-coding variants) but compared with trio-WES charges (almost $ 1,000 ~ 1,400 in China), the costs of whole-genome sequencing per subject in China is almost double than that of trio-WES. Due to the higher cost and more intense analysis, the application of whole-genome sequencing in clinical settings as a first-tier genetic diagnostic technique is still restricted (35–37). To date, trio-WES still provides an efficient and appropriate tool for first-tier genetic diagnosis worldwide and is invaluable for subsequent genetic counseling and relevant medical management. Therefore, it is meaningful to explore effective strategies to predict the diagnostic yield of trio-WES in clinical settings.

The diagnostic rate of trio-WES in clinical settings is likely influenced by a large variety of factors, such as the disorder type, phenotypic spectrum and age of onset (17, 36). For example, for individuals with congenital dermatological syndromes, the diagnostic rate can even reach 92% (17); perhaps the clear phenotypic presentation of those syndromes contributes to its high diagnostic rate. Thus, correct and precise assessment of subjects’ clinical data or phenotypic spectrum at the pre-WES stage is very important and presents a challenge, as it involves stringent collection and comprehensive analysis of the variable phenotypic features of every case. In the present study, we carefully and comprehensively collected detailed phenotypic information that can reflect neurodevelopment conditions objectively for every single subject to increase the diagnostic yield, and the overall diagnostic rates in the training, internal validation, and external validation cohorts were 50.3, 41.2 and 43.3%, respectively, which were higher than those reported in previous WES studies of patients with GDD/ID alone (27 to 39%) (8–10). Our findings concerning the diagnostic yield of using trio-WES for g-NDDs diagnosis further reinforce the conclusions presented by a research that the presence of GDD/ID along with multiple phenotypic features can enhance the diagnostic yield of trio-WES (2), emphasizing the importance of phenotypic feature enrichment in improving the exon-level variants detection rate of using trio-WES (18). According to the theory of the phenotype-to-genotype process and the phenotype-driven strategy, we speculate on the possibility that using key phenotypic factors that related to a high probability of having genetic variants detected by WES to construct a model for predicting trio-WES diagnostic rate, which may provide valuable information for personalized diagnostic regimens for g-NDDs children.

Among those four phenotypic factors (GDD/ID severity, NDC complexity, ASD, and HCA) identified in our study, the strongly top 3 associated features with positive trio-WES results were severe-profound GDD/ID (OR: 4.865, 95% CI: 2.213–10.694), followed by complicated NDC complexity (OR: 3.731, 95% CI: 1.399–9.950) and ASD (OR: 3.256, 95% CI: 1.479–7.168). The current findings suggest that severe-profound GDD/ID, ASD and a broad spectrum of NDCs may share genetic backgrounds in the context of rare monogenic NDDs, leading to a higher diagnostic rate of trio-WES. In the Simons Foundation Autism Research Initiative (SFARI, http://gene.sfari.org/) database, there are over one thousand genes involved in gene expression regulation and neuronal communication functions that are implicated in susceptibility to ASD and other coexisting NDCs, such as ADHD (38). Alterations in the functions of neuronal communication and gene expression regulation are also known to be closely associated with GDD/ID-related genes (39), which may explain the shared genetic backgrounds among severe-profound GDD/ID, ASD and a broad spectrum of NDCs. Moreover, we also revealed that HCA (OR: 2.788, 95% CI: 1.148–6.774) were moderately associated with positive trio-WES results under g-NDDs conditions. We speculate that in the process of craniofacial development, neuronal crosstalk and reciprocal signaling between the craniofacial ectoderm and neural crest cells play crucial roles in the regulation of craniofacial patterning and morphogenesis (40, 41). The process of neural crest development is regulated by epigenetic modifications, including chromatin remodeling, histone modification and DNA methylation (42). Alterations in gene expression regulation signaling and associated neuronal communication can result in disruptions in neural crest development, leading to a set of syndromes affecting a broad spectrum of congenital craniofacial malformations, among which HCA is prominent. Therefore, variants in genes involved in gene expression regulation and neuronal communication may impair multiple neurodevelopmental processes, causing severe-profound GDD/ID, ASD, and a broad spectrum of NDCs in addition to HCA. Our findings demonstrate that the presence of these four phenotypic features might indicate abnormalities in genes related to gene expression regulation or neuronal communication and may be strongly linked to positive trio-WES results, providing novel insights into the genotype–phenotype associations of g-NDDs. On the other hand, previous studies had already revealed that the four phenotypic features (severe-profound GDD/ID, having ASD, complicated NDC complexity, and HCA) are strongly established as signifiers of rare monogenic NDDs (2, 5, 43), and variants at exon-level had also been identified as the main cause of rare monogenic NDDs (44); given these close associations among them, it is reasonable to presume that a g-NDDs subject exhibiting more phenotypic signifiers related to rare monogenic NDDs may have more probabilities of harboring relevant exon-level variant(s), and thus can be diagnosed more easily via trio-WES. However, due to the lack of enough g-NDDs cases with confirmed variants outside exons (such as in introns), we cannot determine whether there have significant differences of those phenotypic signifiers between cases carrying variants at exon-level and at out-exon-level. More genotype–phenotype analyses are still needed to corroborate our speculation, and will be the focus of our future work.

The present study revealed that the established logistic regression model based on those four easily-obtained phenotypic factors exhibited good calibration and discrimination with high accuracy and precision in both the training and validation cohorts. However, the current model has several limitations. First, although the FNR of this model across training and validation sets (around 23% ~ 26%) was acceptable for a primary study, as many studies demonstrated that the predictive model’s FNR of being around 25% was tolerable especially for a preliminary study (45–47); it still showed relatively higher compared with the low FPR (around 12% ~ 18%) of the model, suggesting that the predictive model needs further refinement; perhaps it was too simplistic that all included indicators were binary variables, which can inevitably affect the model performance. Further improvements, such as introducing more polytomous variables or complicated variables into model, are required. Second, this was a pilot study, and the number of subjects enrolled in the current study was relatively small. Moreover, it is essential to establish a rigorous evaluation framework that ensures reproducibility across hospital and is independent of the individual physician’s judgment, which indicated that there have insufficient evidences to support this model could apply in the clinical practice directly; more multicenter studies (center number > 2) with a large sample size (case size > 500) and rigorous evaluation framework are needed to further validate our model. Moreover, our predictive model was based on a retrospective analysis; how it performs in prospective studies remains to be further evaluated.

Conclusion

In conclusion, we found the potential linear relationship between trio-WES-diagnostic rates and the phenotypic enrichments in g-NDDs children for the first time, indicating the possibility of applying a logistic regression model based on phenotypic features to predict the personalized diagnostic rates of using trio-WES in children with g-NDDs. However, due to the false negatives existed in the established model of this pilot study, this model could not apply directly in the clinical practice as its current form; further improvements are required to reduce false negatives toward 0%.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the Ethical Committee of Sun Yat-sen Memorial Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.

Author contributions

RW: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Visualization, Writing – original draft. XL: Resources, Writing – review & editing. ZM: Resources, Writing – review & editing. WT: Software, Supervision, Writing – review & editing. LL: Funding acquisition, Validation, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We thank all the children and parents for their participation in this study. We also feel grateful to Li Lin and Jianqiu Kong for their professional biostatistical supports; Yi Sun and Xiaohui Duan for their professional consultations of neurodevelopmental assessment and MRI-diagnoses on brain structure malformations.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fneur.2025.1574021/full#supplementary-material

Footnotes

1. ^http://www.ncbi.nlm.nih.gov/clinvar

References

1. First, MB. Diagnostic and statistical manual of mental disorders, 5th edition, and clinical utility. J Nerv Ment Dis. (2013) 201:727–9. doi: 10.1097/NMD.0b013e3182a2168a

PubMed Abstract | Crossref Full Text | Google Scholar

2. Hiraide, T, Yamoto, K, Masunaga, Y, Asahina, M, Endoh, Y, Ohkubo, Y, et al. Genetic and phenotypic analysis of 101 patients with developmental delay or intellectual disability using whole-exome sequencing. Clin Genet. (2021) 100:40–50. doi: 10.1111/cge.13951

PubMed Abstract | Crossref Full Text | Google Scholar

3. Bertelli, MO, Munir, K, Harris, J, and Salvador-Carulla, L. Intellectual developmental disorders: reflections on the international consensus document for redefining mental retardation-intellectual disability in ICD-11. Adv Ment Health Intellect Disabil. (2016) 10:36–58. doi: 10.1108/amhid-10-2015-0050

PubMed Abstract | Crossref Full Text | Google Scholar

4. Hanly, C, Shah, H, Au, PYB, and Murias, K. Description of neurodevelopmental phenotypes associated with 10 genetic neurodevelopmental disorders: a scoping review. Clin Genet. (2021) 99:335–46. doi: 10.1111/cge.13882

PubMed Abstract | Crossref Full Text | Google Scholar

5. Wu, R, Li, X, Meng, Z, Li, P, He, Z, and Liang, L. Phenotypic and genetic analysis of children with unexplained neurodevelopmental delay and neurodevelopmental comorbidities in a Chinese cohort using trio-based whole-exome sequencing. Orphanet J Rare Dis. (2024) 19:205. doi: 10.1186/s13023-024-03214-w

PubMed Abstract | Crossref Full Text | Google Scholar

6. Srivastava, S, Love-Nichols, JA, Dies, KA, Ledbetter, DH, Martin, CL, Chung, WK, et al. Meta-analysis and multidisciplinary consensus statement: exome sequencing is a first-tier clinical diagnostic test for individuals with neurodevelopmental disorders. Genet Med. (2019) 21:2413–21. doi: 10.1038/s41436-019-0554-6

PubMed Abstract | Crossref Full Text | Google Scholar

7. Chen, Y, Dawes, R, Kim, HC, Ljungdahl, A, Stenton, SL, Walker, S, et al. De novo variants in the RNU4-2 snRNA cause a frequent neurodevelopmental syndrome. Nature. (2024) 632:832–40. doi: 10.1038/s41586-024-07773-7

PubMed Abstract | Crossref Full Text | Google Scholar

8. Anazi, S, Maddirevula, S, Faqeih, E, Alsedairy, H, Alzahrani, F, Shamseldin, HE, et al. Clinical genomics expands the morbid genome of intellectual disability and offers a high diagnostic yield. Mol Psychiatry. (2017) 22:615–24. doi: 10.1038/mp.2016.113

PubMed Abstract | Crossref Full Text | Google Scholar

9. Bowling, KM, Thompson, ML, Amaral, MD, Finnila, CR, Hiatt, SM, Engel, KL, et al. Genomic diagnosis for children with intellectual disability and/or developmental delay. Genome Med. (2017) 9:43. doi: 10.1186/s13073-017-0433-1

PubMed Abstract | Crossref Full Text | Google Scholar

10. Monroe, GR, Frederix, GW, Savelberg, SM, de Vries, TI, Duran, KJ, van der Smagt, JJ, et al. Effectiveness of whole-exome sequencing and costs of the traditional diagnostic trajectory in children with intellectual disability. Genet Med. (2016) 18:949–56. doi: 10.1038/gim.2015.200

Crossref Full Text | Google Scholar

11. Liu, W, Liu, C, Zhuang, W, Chen, J, He, Q, Xue, X, et al. Construction of an alignment diagram model for predicting calculous obstructive pyonephrosis before PNL. Heliyon. (2024) 10:e28448. doi: 10.1016/j.heliyon.2024.e28448

PubMed Abstract | Crossref Full Text | Google Scholar

12. Wu, R, Li, X, Chen, Z, Shao, Q, Zhang, X, Tang, W, et al. Development and validation of a nomogram based on common biochemical indicators for survival prediction of children with high-risk neuroblastoma: a valuable tool for resource-limited hospitals. BMC Pediatr. (2023) 23:426. doi: 10.1186/s12887-023-04228-2

PubMed Abstract | Crossref Full Text | Google Scholar

13. Gao, T, Yang, L, Zhou, J, Zhang, Y, Wang, L, Wang, Y, et al. Development and validation of a nomogram prediction model for ADHD in children based on individual, family, and social factors. J Affect Disord. (2024) 356:483–91. doi: 10.1016/j.jad.2024.04.069

PubMed Abstract | Crossref Full Text | Google Scholar

14. Zhang, Y, Maimaiti, R, Lou, S, Abula, R, Abulaiti, A, and Kelimu, A. Risk prediction of autism spectrum disorder behaviors among children based on blood elements by nomogram: a cross-sectional study in Xinjiang from 2018 to 2019. J Affect Disord. (2022) 318:1–6. doi: 10.1016/j.jad.2022.08.130

PubMed Abstract | Crossref Full Text | Google Scholar

15. Costantine, MM, Tita, ATN, Mele, L, Casey, BM, Peaceman, AM, Varner, MW, et al. The association between infant birth weight, head circumference, and neurodevelopmental outcomes. Am J Perinatol. (2024) 41:e1313-e1323. doi: 10.1055/s-0043-1761920

PubMed Abstract | Crossref Full Text | Google Scholar

16. Parkhurst, JT, Vesco, AT, Ballard, RR, and Lavigne, JV. Improving diagnostic accuracy: comparison of nomograms and classification tree analyses for predicting the diagnosis of oppositional defiant disorder. Psychol Serv. (2023) 20:184–95. doi: 10.1037/ser0000670

PubMed Abstract | Crossref Full Text | Google Scholar

17. Marinakis, NM, Svingou, M, Veltra, D, Kekou, K, Sofocleous, C, Tilemis, FN, et al. Phenotype-driven variant filtration strategy in exome sequencing toward a high diagnostic yield and identification of 85 novel variants in 400 patients with rare Mendelian disorders. Am J Med Genet A. (2021) 185:2561–71. doi: 10.1002/ajmg.a.62338

PubMed Abstract | Crossref Full Text | Google Scholar

18. Wang, Q, Tang, X, Yang, K, Huo, X, Zhang, H, Ding, K, et al. Deep phenotyping and whole-exome sequencing improved the diagnostic yield for nuclear pedigrees with neurodevelopmental disorders. Mol Genet Genomic Med. (2022) 10:e1918. doi: 10.1002/mgg3.1918

PubMed Abstract | Crossref Full Text | Google Scholar

19. Harrison, JE, Weber, S, Jakob, R, and Chute, CG. ICD-11: an international classification of diseases for the twenty-first century. BMC Med Inform Decis Mak. (2021) 21:206. doi: 10.1186/s12911-021-01534-6

PubMed Abstract | Crossref Full Text | Google Scholar

20. Scheffer, IE, Berkovic, S, Capovilla, G, Connolly, MB, French, J, Guilhoto, L, et al. ILAE classification of the epilepsies: position paper of the ILAE Commission for Classification and Terminology. Epilepsia. (2017) 58:512–21. doi: 10.1111/epi.13709

PubMed Abstract | Crossref Full Text | Google Scholar

21. Dai, Y, Liang, S, Dong, X, Zhao, Y, Ren, H, Guan, Y, et al. Whole exome sequencing identified a novel DAG1 mutation in a patient with rare, mild and late age of onset muscular dystrophy-dystroglycanopathy. J Cell Mol Med. (2019) 23:811–8. doi: 10.1111/jcmm.13979

PubMed Abstract | Crossref Full Text | Google Scholar

22. Han, P, Wei, G, Cai, K, Xiang, X, Deng, WP, Li, YB, et al. Identification and functional characterization of mutations in LPL gene causing severe hypertriglyceridaemia and acute pancreatitis. J Cell Mol Med. (2020) 24:1286–99. doi: 10.1111/jcmm.14768

PubMed Abstract | Crossref Full Text | Google Scholar

23. Zhang, R, Chen, S, Han, P, Chen, F, Kuang, S, Meng, Z, et al. Whole exome sequencing identified a homozygous novel variant in CEP290 gene causes Meckel syndrome. J Cell Mol Med. (2020) 24:1906–16. doi: 10.1111/jcmm.14887

PubMed Abstract | Crossref Full Text | Google Scholar

24. Richards, S, Aziz, N, Bale, S, Bick, D, Das, S, Gastier-Foster, J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. (2015) 17:405–24. doi: 10.1038/gim.2015.30

PubMed Abstract | Crossref Full Text | Google Scholar

25. Riggs, ER, Andersen, EF, Cherry, AM, Kantarci, S, Kearney, H, Patel, A, et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the clinical genome resource (ClinGen). Genet Med. (2020) 22:245–57. doi: 10.1038/s41436-019-0686-8

PubMed Abstract | Crossref Full Text | Google Scholar

26. Cohen, J. A power primer. Psychol Bull. (1992) 112:155–9. doi: 10.1037//0033-2909.112.1.155

PubMed Abstract | Crossref Full Text | Google Scholar

27. Xu, B, Gao, Y, Zhang, Q, Li, X, Liu, X, Du, J, et al. Establishment and validation of a multivariate predictive model for the efficacy of oral rehydration salts in children with postural tachycardia syndrome. EBioMedicine. (2024) 100:104951. doi: 10.1016/j.ebiom.2023.104951

PubMed Abstract | Crossref Full Text | Google Scholar

28. Jian, X, Du, S, Zhou, X, Xu, Z, Wang, K, Dong, X, et al. Development and validation of nomograms for predicting the risk probability of carbapenem resistance and 28-day all-cause mortality in gram-negative bacteremia among patients with hematological diseases. Front Cell Infect Microbiol. (2022) 12:969117. doi: 10.3389/fcimb.2022.969117

PubMed Abstract | Crossref Full Text | Google Scholar

29. Tang, W, Shao, Q, He, Z, Zhang, X, Li, X, and Wu, R. Clinical significance of nonerythrocytic spectrin Beta 1 (SPTBN1) in human kidney renal clear cell carcinoma and uveal melanoma: a study based on Pan-Cancer analysis. BMC Cancer. (2023) 23:303. doi: 10.1186/s12885-023-10789-3

PubMed Abstract | Crossref Full Text | Google Scholar

30. Tatsuo, O, Gondo, MR, Hamada, TG, and Hamada, R. Nomogram as predictive model in clinical practice. Gan To Kagaku Ryoho. (2009) 36:901–6.

Google Scholar

31. Bertier, G, Hétu, M, and Joly, Y. Unsolved challenges of clinical whole-exome sequencing: a systematic literature review of end-users' views. BMC Med Genet. (2016) 9:52. doi: 10.1186/s12920-016-0213-6

PubMed Abstract | Crossref Full Text | Google Scholar

32. Iglesias, A, Anyane-Yeboa, K, Wynn, J, Wilson, A, Truitt Cho, M, Guzman, E, et al. The usefulness of whole-exome sequencing in routine clinical practice. Genet Med. (2014) 16:922–31. doi: 10.1038/gim.2014.58

PubMed Abstract | Crossref Full Text | Google Scholar

33. Niguidula, N, Alamillo, C, Shahmirzadi Mowlavi, L, Powis, Z, Cohen, JS, and Farwell Hagman, KD. Clinical whole-exome sequencing results impact medical management. Mol Genet Genomic Med. (2018) 6:1068–78. doi: 10.1002/mgg3.484

PubMed Abstract | Crossref Full Text | Google Scholar

34. Yang, Y, Muzny, DM, Reid, JG, Bainbridge, MN, Willis, A, Ward, PA, et al. Clinical whole-exome sequencing for the diagnosis of mendelian disorders. N Engl J Med. (2013) 369:1502–11. doi: 10.1056/NEJMoa1306555

PubMed Abstract | Crossref Full Text | Google Scholar

35. Nomakuchi, TT, Teferedegn, EY, Li, D, Muirhead, KJ, Dubbs, H, Leonard, J, et al. Utility of genome sequencing in exome-negative pediatric patients with neurodevelopmental phenotypes. Am J Med Genet A. (2024) 194:e63817. doi: 10.1002/ajmg.a.63817

PubMed Abstract | Crossref Full Text | Google Scholar

36. Taylor, J, Craft, J, Blair, E, Wordsworth, S, Beeson, D, Chandratre, S, et al. Implementation of a genomic medicine multi-disciplinary team approach for rare disease in the clinical setting: a prospective exome sequencing case series. Genome Med. (2019) 11:46. doi: 10.1186/s13073-019-0651-9

PubMed Abstract | Crossref Full Text | Google Scholar

37. White, SJ, Laros, JFJ, Bakker, E, Cambon-Thomsen, A, Eden, M, Leonard, S, et al. Critical points for an accurate human genome analysis. Hum Mutat. (2017) 38:912–21. doi: 10.1002/humu.23238

Crossref Full Text | Google Scholar

38. Banerjee-Basu, S, and Packer, A. SFARI gene: an evolving database for the autism research community. Dis Model Mech. (2010) 3:133–5. doi: 10.1242/dmm.005439

PubMed Abstract | Crossref Full Text | Google Scholar

39. Iwase, S, Bérubé, NG, Zhou, Z, Kasri, NN, Battaglioli, E, Scandaglia, M, et al. Epigenetic etiology of intellectual disability. J Neurosci. (2017) 37:10773–82. doi: 10.1523/jneurosci.1840-17.2017

PubMed Abstract | Crossref Full Text | Google Scholar

40. Luquetti, DV, Heike, CL, Hing, AV, Cunningham, ML, and Cox, TC. Microtia: epidemiology and genetics. Am J Med Genet A. (2012) 158A:124–39. doi: 10.1002/ajmg.a.34352

PubMed Abstract | Crossref Full Text | Google Scholar

41. Wu, R, Tang, W, Li, P, Meng, Z, Li, X, and Liang, L. Identification of a novel phenotype of external ear deformity related to coffin-Siris syndrome-9 and literature review. Am J Med Genet A. (2024) 194:e63626. doi: 10.1002/ajmg.a.63626

PubMed Abstract | Crossref Full Text | Google Scholar

42. Hu, N, Strobl-Mazzulla, PH, and Bronner, ME. Epigenetic regulation in neural crest development. Dev Biol. (2014) 396:159–68. doi: 10.1016/j.ydbio.2014.09.034

PubMed Abstract | Crossref Full Text | Google Scholar

43. Wang, C, Zhou, W, Zhang, L, Fu, L, Shi, W, Qing, Y, et al. Diagnostic yield and novel candidate genes for neurodevelopmental disorders by exome sequencing in an unselected cohort with microcephaly. BMC Genomics. (2023) 24:422. doi: 10.1186/s12864-023-09505-z

PubMed Abstract | Crossref Full Text | Google Scholar

44. Rauch, A, Wieczorek, D, Graf, E, Wieland, T, Endele, S, Schwarzmayr, T, et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet. (2012) 380:1674–82. doi: 10.1016/s0140-6736(12)61480-9

PubMed Abstract | Crossref Full Text | Google Scholar

45. Aberle, DR, Adams, AM, Berg, CD, Black, WC, Clapp, JD, Fagerstrom, RM, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. (2011) 365:395–409. doi: 10.1056/NEJMoa1102873

PubMed Abstract | Crossref Full Text | Google Scholar

46. Moons, KG, Altman, DG, Reitsma, JB, Ioannidis, JP, Macaskill, P, Steyerberg, EW, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. (2015) 162:W1–W73. doi: 10.7326/m14-0698

Crossref Full Text | Google Scholar

47. Moons, KGM, Wolff, RF, Riley, RD, Whiting, PF, Westwood, M, Collins, GS, et al. PROBAST: a tool to assess risk of Bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. (2019) 170:W1–w33. doi: 10.7326/m18-1377

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: genetic neurodevelopmental disorders, neurodevelopmental comorbidity, trio-based whole-exome sequencing, exon-level variants, diagnostic efficacy, phenotype-driven, alignment diagram

Citation: Wu R, Luo X, Meng Z, Tang W and Liang L (2025) Establishment and validation of a phenotype-driven predictive model for the diagnostic efficacy of trio-based whole exome sequencing (trio-WES) in children with genetic neurodevelopmental disorders (g-NDDs). Front. Neurol. 16:1574021. doi: 10.3389/fneur.2025.1574021

Received: 11 February 2025; Accepted: 02 October 2025;
Published: 15 October 2025.

Edited by:

Joseph Alaimo, Children’s Mercy Hospital, United States

Reviewed by:

Chiara Reale, IRCCS Carlo Besta Neurological Institute Foundation, Italy
Maud de Dieuleveult, Institut National de la Santé et de la Recherche Médicale (INSERM), France
Santasree Banerjee, Jilin University, China

Copyright © 2025 Wu, Luo, Meng, Tang and Liang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Liyang Liang, bGlhbmdsaXlAbWFpbC5zeXN1LmVkdS5jbg==; Wenting Tang, dGFuZ3d0QHN5c3VjYy5vcmcuY24=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.