METHODS article

Front. Bioinform., 08 March 2024
Sec. Drug Discovery in Bioinformatics
Volume 4 - 2024 | https://doi.org/10.3389/fbinf.2024.1295600

A network-based method for associating genes with autism spectrum disorder

Neta Zadok1*, Gil Ast2 and Roded Sharan1
  • 1Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
  • 2Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

Autism spectrum disorder (ASD) is a highly heritable complex disease that affects 1% of the population, yet its underlying molecular mechanisms are largely unknown. Here we study the problem of predicting causal genes for ASD by combining genome-scale data with a network propagation approach. We construct a predictor that integrates multiple omic data sets that assess genomic, transcriptomic, proteomic, and phosphoproteomic associations with ASD. In cross-validation, our predictor yields a mean area under the ROC curve of 0.87 and an area under the precision-recall curve of 0.89. We further show that it outperforms previous gene-level predictors of autism association. Finally, we show that the model can be used to predict genes associated with schizophrenia, which is known to share genetic components with ASD.

Introduction

Autism spectrum disorder (ASD) is a complex neurological and developmental disorder that affects a person’s behavior, communication, and learning abilities. About 1 in 44 children is identified with the disorder according to estimates from the Centers for Disease Control and Prevention (CDC) Autism and Developmental Disabilities Monitoring (ADDM) Network (Maenner, 2021). It is thought to be caused by a combination of genetic and environmental factors that impact the structure and function of the brain and nervous system (Flickr, 2023). Identifying the genetic basis of ASD is critical to understanding its underlying biological mechanisms. Such knowledge will inform the development of new interventions and treatments for individuals affected by this disorder.

Extensive molecular studies have charted the landscape of ASD with respect to different information layers, including genome-wide association studies (GWAS), differential gene expression (Parikshak et al., 2016; Gandal et al., 2018), differential transcript expression (Gandal et al., 2018), alternative splicing changes (Parikshak et al., 2016; Gandal et al., 2018), differential methylation (Wong et al., 2019), copy number variation (Sanders et al., 2015) and more (Satterstrom et al., 2020). Each of these studies has produced a list of candidate ASD-associated genes, calling for computational methods to consolidate these gene lists.

Machine learning based methods offer a new perspective on the problem: they learn from known ASD-related genes and build models that prioritize previously unknown genes by their predicted risk scores (Liu et al., 2014; Krishnan et al., 2016; Duda et al., 2018; Brueggeman et al., 2020; Lin et al., 2020). These methods differ in their training features and prioritization method. Duda et al. (2018) and Krishnan et al. (2016) leveraged brain-specific functional interaction networks to produce genome-wide rankings of ASD-associated genes. Liu et al. (2014) clustered evidence for ASD association within a co-expression network in specific brain regions. Lin et al. (2020) used features extracted from spatiotemporal gene expression patterns in the human brain. Last, the state-of-the-art forecASD (Brueggeman et al., 2020) integrates network-based information from large gene interaction networks with scores of genetic association and brain gene expression information. In detail, novel features are generated from BrainSpan expression and STRING interaction data. These are combined with literature-derived features from DAWN, DAMAGES, and Krishnan (Liu et al., 2014; Krishnan et al., 2016; Zhang and Shen, 2017) and used to train a random forest classifier for ASD association.

The methods described above have predominantly relied on a single data source, potentially missing relevant information. Conversely, some approaches such as forecASD utilize multiple data sources that are devoid of network context. To address these limitations, our proposed method leverages a network propagation technique to integrate diverse data sources while accounting for their network context. Our model employs network-propagation on ASD associated genes from different data sets to derive predictive gene scores. Features are then combined using a random forest classifier. We evaluate the performance of our model and compare it to previous methods. Finally, we use our model to predict Schizophrenia-associated genes.

Materials and methods

Classification pipeline

The computational pipeline has two stages. First, network-based gene features are generated using a network propagation technique. Second, a random forest model is applied to these features to yield the prediction score. The classifier is summarized in Figure 1. Its stages are described in the following sections.

FIGURE 1. ASD Classifier Flowchart. Putative ASD-associated genes serve as seeds to a network propagation process. After propagation, the resulting scores yield classification features for a random forest classifier that is used to compute association scores.

Feature generation

Our starting point is gene lists obtained from the literature as detailed in Table 1. Overall, we use ten gene sets that were suggested to be associated with ASD based on various layers of information.

TABLE 1. Lists of ASD associated genes according to different sources.

Each of these ASD-related gene lists is used as a seed set for a network propagation process that pinpoints other genes in close proximity to the seed set in a protein-protein interaction (PPI) network. The initial value of each seed protein from a list of size s is set to 1/s. We use the human PPI network from Signorini et al. (2021), which has 20,933 proteins and 251,078 interactions in its main connected component. We run network propagation with the default damping parameter α = 0.8. To avoid biases caused by the degrees of the proteins, we normalize the results using the eigenvector centrality method (Erten et al., 2011). The resulting ten propagation scores for each gene comprise its feature set.
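
The propagation step can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in this study: the update rule p ← αWp + (1 − α)p0 with a symmetrically normalized adjacency matrix W is one common formulation of network propagation, and the degree correction shown here (dividing by the score each gene receives when all nodes are seeds) is only a stand-in for the eigenvector centrality normalization, whose exact form is not spelled out in the text. The function names and the toy network are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def propagate(adj, seeds, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate p <- alpha * W p + (1 - alpha) * p0 until the scores stabilize.

    Assumption of this sketch: W[u][v] = 1 / sqrt(deg(u) * deg(v)).
    p0 assigns 1/s to each of the s seeds found in the network, 0 elsewhere.
    """
    present = [g for g in seeds if g in adj]
    p0 = {n: 0.0 for n in adj}
    for g in present:
        p0[g] = 1.0 / len(present)
    deg = {n: len(nbrs) for n, nbrs in adj.items()}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for u in adj:
            flow = sum(p[v] / (deg[u] * deg[v]) ** 0.5 for v in adj[u])
            nxt[u] = alpha * flow + (1 - alpha) * p0[u]
        delta = max(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if delta < tol:
            break
    return p


def normalized_scores(adj, seeds, alpha=0.8):
    """Divide seed-specific scores by an all-nodes-as-seeds background (assumed correction)."""
    raw = propagate(adj, seeds, alpha)
    background = propagate(adj, list(adj), alpha)
    return {n: raw[n] / background[n] for n in adj}


# Toy path network A-B-C-D-E with a single seed gene "A".
toy_adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
toy_raw = propagate(toy_adj, ["A"])
toy_scores = normalized_scores(toy_adj, ["A"])
```

Running this procedure once per seed list would give one score per gene per list, which is how a ten-dimensional feature vector per gene would arise in this setting.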

Random forest model

The features of a gene are integrated using a random forest model. To train the model, we use SFARI’s Gene Scoring Module (Abrahams et al., 2013) which offers critical evaluation of the strength of the evidence for each gene’s association with ASD. The genes are assigned to four categories: “Syndromic” (S), “Category 1” (High Confidence), “Category 2” (Strong Candidate) and “Category 3” (Suggestive Evidence). We label “Category 1” genes as positives (206 in total) and randomly pick 206 negative genes that do not appear in the SFARI database.
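
The construction of a balanced training set can be sketched as follows. This is an illustrative sketch only: the function name, its arguments and the fixed random seed are assumptions, and the gene identifiers in any real run would come from the SFARI database and the PPI network rather than from the placeholders used here.

```python
import random


def build_training_labels(category1_genes, sfari_genes, all_genes, seed=0):
    """Return {gene: label}: Category 1 genes get 1, sampled non-SFARI genes get 0.

    The number of sampled negatives equals the number of positives.
    random.Random.sample raises ValueError if too few candidate negatives exist.
    """
    universe = set(all_genes)
    positives = sorted(set(category1_genes) & universe)
    # Negatives are drawn only from genes absent from the whole SFARI database.
    candidates = sorted(universe - set(sfari_genes) - set(category1_genes))
    rng = random.Random(seed)
    negatives = rng.sample(candidates, k=len(positives))
    labels = {g: 1 for g in positives}
    labels.update({g: 0 for g in negatives})
    return labels


# Placeholder identifiers, purely for illustration.
toy_genes = [f"G{i}" for i in range(50)]
toy_sfari = toy_genes[:20]
toy_category1 = toy_genes[:5]
toy_labels = build_training_labels(toy_category1, toy_sfari, toy_genes)
```

Sampling negatives from outside the database, rather than from lower-confidence SFARI categories, keeps plausible but unconfirmed ASD genes out of the negative class.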

The random forest model is trained with the “sklearn” Python package using its default parameters: 100 trees, no maximum tree depth, and a minimum of 2 samples required to split an internal node.

Results and discussion

Model performance

We tested our classifier using 5-fold cross-validation. Figure 2 depicts the ROC and precision-recall curves of each fold, with the final result taken as the mean over the five folds. The AUROC is 0.87 and the AUPRC is 0.89, indicating the high accuracy of our method.
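
A sketch of the training and evaluation setup, assuming scikit-learn and NumPy are installed, is given here. Several details are assumptions rather than facts from the text: the folds are stratified and shuffled, average precision is used as an estimate of the AUPRC, and the feature matrix is synthetic random data standing in for the ten propagation features, so the printed values say nothing about the real model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold


def cross_validate_forest(X, y, n_splits=5, seed=0):
    """Return mean AUROC and mean average precision over stratified folds."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aurocs, auprcs = [], []
    for train_idx, test_idx in folds.split(X, y):
        # These three arguments are the scikit-learn defaults, written out explicitly.
        model = RandomForestClassifier(
            n_estimators=100, max_depth=None, min_samples_split=2, random_state=seed
        )
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        aurocs.append(roc_auc_score(y[test_idx], scores))
        auprcs.append(average_precision_score(y[test_idx], scores))
    return float(np.mean(aurocs)), float(np.mean(auprcs))


# Synthetic stand-in: 206 positives and 206 negatives with ten features each.
rng = np.random.default_rng(0)
y = np.array([1] * 206 + [0] * 206)
X = rng.normal(size=(y.size, 10)) + y[:, None]
mean_auroc, mean_auprc = cross_validate_forest(X, y)
print(f"mean AUROC={mean_auroc:.2f}, mean AUPRC={mean_auprc:.2f}")
```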

FIGURE 2. Performance evaluation. Performance of our classifier in 5-fold cross validation. The result of each fold and its mean are shown. Left: ROC curves. Right: Precision-recall curves.

To facilitate the model’s application by potential users, we calculated an optimal classification cutoff of 0.86, which maximizes the product of specificity and sensitivity (Liu, 2012). The full model scores can be found in Supplementary Table S2.
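
The cutoff criterion can be made concrete with a small sketch. It scans every observed score as a candidate cutoff and keeps the one with the largest product of sensitivity and specificity; the function name, the "score >= cutoff means positive" convention and the toy scores are assumptions made for illustration.

```python
def best_cutoff(scores, labels):
    """Return (cutoff, product) maximizing sensitivity * specificity.

    A gene is called positive when its score is >= cutoff.
    Assumes labels are 0/1 and that both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        product = (tp / n_pos) * (tn / n_neg)
        if product > best[1]:
            best = (cutoff, product)
    return best


toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
cutoff, product = best_cutoff(toy_scores, toy_labels)
print(cutoff, product)  # 0.7 0.75
```

On the toy data, a cutoff of 0.7 recovers three of the four positives (sensitivity 0.75) while rejecting all four negatives (specificity 1.0).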

To further support our results, we tested our classifier’s predictions on SFARI genes of scores 2 and 3 (which were not used during training), compared to randomly chosen negative genes (Figure 3). The score distributions of the two groups were compared using the Wilcoxon signed-rank test. Reassuringly, SFARI genes with scores 2 and 3 received significantly higher scores than the random negative genes (p-value < 3.62e-34).

FIGURE 3. Classifier validation. Histograms (left) and box-plots (right) for SFARI (scores 2 and 3, orange) vs. random genes (blue).

After establishing the accuracy of our proposed method, we conducted a comparative analysis with the state-of-the-art forecASD classifier, which was shown to outperform earlier predictors. For this purpose, we employed the same random forest classifier on both our dataset and the features suggested by forecASD, namely BrainSpan and STRING. We additionally assessed the propagation procedure on a random degree-preserving network to act as a negative control. The results are provided in Figure 4 showing the superiority of our method compared to forecASD and the negative control (AUROC of 0.91 vs. 0.87 and 0.82). Notably, the relatively high AUROC of the negative control testifies to the quality of the gene sets that serve as seeds for the network propagation.
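
The text does not specify how the random degree-preserving network was generated. One common approach, sketched here as an assumption, is repeated double-edge swapping: two edges (a, b) and (c, d) are replaced by (a, d) and (c, b) whenever this creates neither a self-loop nor a duplicate edge, so every node keeps its degree. The function names and the toy network are hypothetical.

```python
import random
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Rewire an undirected simple graph with double-edge swaps, keeping every degree."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = 0
    attempts = 0
    # Cap the attempts so the loop ends even if few valid swaps exist.
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # the two edges share a node; skip to avoid self-loops
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # the swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Toy network: a ring of ten nodes plus two chords.
toy_edges = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
shuffled = degree_preserving_shuffle(toy_edges, n_swaps=20)
```

Because such a network keeps each protein's degree but scrambles who interacts with whom, any predictive signal that survives reflects the seed sets and degree structure rather than the specific interactions.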

FIGURE 4. Performance comparison against forecASD and a negative control. Shown here are the mean AUROC results of each classifier.

Functional annotation analysis

Next, we wished to analyze the functional roles of the top predicted genes. To this end, we set a prediction threshold of 0.947, which maximizes the sum of precision and recall, and focused on the 84 genes passing this threshold. We conducted functional enrichment analysis using g:Profiler (version e109_eg56_p17_1d3191d) with Bonferroni corrected p-values and a significance threshold of 0.001 (Raudvere et al., 2019). The analysis is based on several data sources (GO:MF, GO:BP, Human Phenotype Ontology) (Figure 5).

FIGURE 5. Functional annotation results for genes predicted by our classifier. (A) Manhattan plot created with g:Profiler illustrating the enrichment analysis results. The x-axis represents functional terms, grouped and color-coded by data source. The y-axis shows the adjusted enrichment p-values on a negative log10 scale. Highlighted points are the terms with the highest scores, together with the GO driver terms highlighted by the g:Profiler algorithm. This algorithm filters GO enrichment results, providing a more efficient and reliable approach than traditional clustering methods. (B) Detailed information about the highlighted circles in the Manhattan plot, including the data source, ID and name of each term together with its p-value.

In the Human Phenotype Ontology, Autistic Behavior was the most highly enriched phenotype. The GO:BP (Biological Process) and GO:MF (Molecular Function) data sources yielded several highly enriched terms known to play important roles in autism etiology, including chromatin organization and binding (Haddad Derafshi et al., 2022), histone modification (Sun et al., 2016), neuron cell-cell adhesion (Eve et al., 2022) and zinc ion binding (Walsh et al., 2001; Walsh, 2002; Wang and Zhou, 2010; Yasuda et al., 2011). The full list of functional annotation results can be found in Supplementary Table S1.

Exploiting schizophrenia genes

Given the known phenotypic similarity between ASD and schizophrenia (SCZ) (Hommer and Swedo, 2015) we wished to test whether our classification model could be further improved by adding information on SCZ associated genes. Thus, we collected lists of genes that were associated with SCZ (Table 2) and reapplied our computational pipeline.

TABLE 2. Lists of SCZ associated genes.

Next, we checked whether our ASD classifier can also predict schizophrenia-associated genes. We ran it on SCZ-associated genes from SZDB, a database for schizophrenia genetic research (Wu et al., 2017). Specifically, we focused on 1,622 SCZ-associated genes from SZDB with scores higher than 3, and compared their classifier scores to those of the same number of random genes (Figure 6). A Wilcoxon signed-rank test showed that the classifier gave the SCZ-associated genes significantly higher scores than the random genes (p-value of 2.275e-11).

FIGURE 6. Classifier Predictions on SCZ Genes. The distribution of SCZ gene scores (Orange) versus that of random genes (Blue). The distribution is shown both in histograms (A) and in box plots (B).

Conclusion

We have presented a classification model for disease association. The classifier uses network propagation which enables us to combine and to amplify signals from individual genes, and uses machine learning to allow us to learn from known genes in order to classify new ones. In application to ASD, our classifier attained high accuracy, outperforming the state of the art. We have further shown its applicability to SCZ, which benefits from the similarity between these diseases but also shows the generality of our approach.

Functional enrichment analysis of our proposed candidate ASD genes has pointed to several pathways and processes that have been previously linked to ASD. One example is neuron cell-cell adhesion, which may contribute to neuroinflammation in ASD (Eve et al., 2022) and participates in neurodevelopmental pathways associated with the disorder (Gandawijaya et al., 2020). Moreover, Bonsi et al. (2022) demonstrated that genes associated with ASD are frequently involved in the structural organization and functional activity of synapses, consistent with the enriched pathways in our results, such as synapse assembly and presynaptic and postsynaptic membrane assembly. This is interesting, as ASD is sometimes regarded as a disorder of connectivity (Mohammad-Rezazadeh et al., 2016), since its genes have both direct and indirect effects on a range of presynaptic and postsynaptic proteins (Bonsi et al., 2022; Yeo et al., 2022).

The fact that our classifier succeeded in predicting schizophrenia-associated genes suggests that the previously implied association between ASD and SCZ may stem from connectivity-related issues (Hommer and Swedo, 2015).

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Ethics statement

Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements.

Author contributions

NZ: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing–original draft. GA: Funding acquisition, Writing–review and editing. RS: Funding acquisition, Writing–review and editing, Conceptualization, Project administration, Supervision.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was supported by a research grant from the Israel Science Foundation (IPMP grant no. 2417/20).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbinf.2024.1295600/full#supplementary-material

References

Abrahams, B. S., Arking, D. E., Campbell, D. B., Mefford, H. C., Morrow, E. M., Weiss, L. A., et al. (2013). SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol. autism 4 (1), 36–43. doi:10.1186/2040-2392-4-36

Bonsi, P., De Jaco, A., Fasano, L., and Gubellini, P. (2022). Postsynaptic autism spectrum disorder genes and synaptic dysfunction. Neurobiol. Dis. 162, 105564. doi:10.1016/j.nbd.2021.105564

Brueggeman, L., Koomar, T., and Michaelson, J. J. (2020). Forecasting risk gene discovery in autism with machine learning and genome-scale data. Sci. Rep. 10 (1), 4569. doi:10.1038/s41598-020-61288-5

Duda, M., Zhang, H., Li, H. D., Wall, D. P., Burmeister, M., and Guan, Y. (2018). Brain-specific functional relationship networks inform autism spectrum disorder gene prediction. Transl. psychiatry 8 (1), 56. doi:10.1038/s41398-018-0098-6

Erten, S., Bebek, G., Ewing, R. M., and Koyutürk, M. (2011). DADA: degree-aware algorithms for network-based disease gene prioritization. BioData Min. 4, 19. doi:10.1186/1756-0381-4-19

Eve, M., Gandawijaya, J., Yang, L., and Oguro-Ando, A. (2022). Neuronal cell adhesion molecules may mediate neuroinflammation in autism spectrum disorder. Front. psychiatry/Front. Res. Found. 13, 842755. doi:10.3389/fpsyt.2022.842755

Flickr, F. (2023). Us on (no date) about Autism. Available at: https://www.nichd.nih.gov/health/topics/autism/conditioninfo (Accessed February 28, 2023).

Gandal, M. J., Zhang, P., Hadjimichael, E., Walker, R. L., Chen, C., Liu, S., et al. (2018). Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362 (6420), eaat8127. doi:10.1126/science.aat8127

Gandawijaya, J., Bamford, R. A., Burbach, J. P. H., and Oguro-Ando, A. (2020). Cell adhesion molecules involved in neurodevelopmental pathways implicated in 3p-deletion syndrome and autism spectrum disorder. Front. Cell. Neurosci. 14, 611379. doi:10.3389/fncel.2020.611379

Haddad Derafshi, B., Danko, T., Chanda, S., Batista, P. J., Litzenburger, U., Lee, Q. Y., et al. (2022). The autism risk factor CHD8 is a chromatin activator in human neurons and functionally dependent on the ERK-MAPK pathway effector ELK1. Sci. Rep. 12 (1), 22425. doi:10.1038/s41598-022-23614-x

Hommer, R. E., and Swedo, S. E. (2015). Schizophrenia and autism-related disorders. Schizophr. Bull. 41 (2), 313–314. doi:10.1093/schbul/sbu188

Huang, G., Osorio, D., Guan, J., Ji, G., and Cai, J. J. (2020). Overdispersed gene expression in schizophrenia. NPJ Schizophr. 6 (1), 9. doi:10.1038/s41537-020-0097-5

Krishnan, A., Zhang, R., Yao, V., Theesfeld, C. L., Wong, A. K., Tadych, A., et al. (2016). Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19 (11), 1454–1462. doi:10.1038/nn.4353

Lin, Y., Afshar, S., Rajadhyaksha, A. M., Potash, J. B., and Han, S. (2020). A machine learning approach to predicting autism risk genes: validation of known genes and discovery of new candidates. Front. Genet. 11, 500064. doi:10.3389/fgene.2020.500064

Liu, L., Lei, J., Sanders, S. J., Willsey, A. J., Kou, Y., Cicek, A. E., et al. (2014). DAWN: a framework to identify autism genes and subnetworks using gene expression and genetics. Mol. autism 5 (1), 22–18. doi:10.1186/2040-2392-5-22

Liu, X. (2012). Classification accuracy and cut point selection. Statistics Med. 31 (23), 2676–2686. doi:10.1002/sim.4509

Maenner, M. J. (2021). Prevalence and characteristics of autism spectrum disorder among children aged 8 years - Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2018. Morb. Mortal. Wkly. Rep. Surveill. Summ. 70. doi:10.15585/mmwr.ss7011a1

Mohammad-Rezazadeh, I., Frohlich, J., Loo, S. K., and Jeste, S. S. (2016). Brain connectivity in autism spectrum disorder. Curr. Opin. neurology 29 (2), 137–147. doi:10.1097/wco.0000000000000301

Parikshak, N. N., Swarup, V., Belgard, T. G., Irimia, M., Ramaswami, G., Gandal, M. J., et al. (2016). Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540 (7633), 423–427. doi:10.1038/nature20612

Raudvere, U., Kolberg, L., Kuzmin, I., Arak, T., Adler, P., Peterson, H., et al. (2019). g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic acids Res. 47 (W1), W191–W198. doi:10.1093/nar/gkz369

Sanders, S. J., He, X., Willsey, A., Ercan-Sencicek, A., Samocha, K., Cicek, A., et al. (2015). Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87 (6), 1215–1233. doi:10.1016/j.neuron.2015.09.016

Satterstrom, F. K., Kosmicki, J. A., Wang, J., Breen, M. S., De Rubeis, S., An, J. Y., et al. (2020). Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell. 180 (3), 568–584.e23. doi:10.1016/j.cell.2019.12.036

Signorini, L. F., Almozlino, T., and Sharan, R. (2021). ANAT 3.0: a framework for elucidating functional protein subnetworks using graph-theoretic and machine learning approaches. BMC Bioinforma. 22 (1), 526–6. doi:10.1186/s12859-021-04449-1

Sun, W., Poschmann, J., Cruz-Herrera del Rosario, R., Parikshak, N. N., Hajan, H. S., Kumar, V., et al. (2016). Histone acetylome-wide association study of autism spectrum disorder. Cell. 167 (5), 1385–1397.e11. doi:10.1016/j.cell.2016.10.031

Walsh, W. J., et al. (2002). Metallothionein and autism. 2nd edn. Naperville, IL: Pfeiffer.

Walsh, W. J., Usman, A., and Tarpey, J. (2001). Disordered metal metabolism in a large autism population. walshinstitute.Org. Available at: http://www.walshinstitute.org/uploads/1/7/9/9/17997321/disordered_metal_metabolism_in_a_large_autism_population.pdf (Accessed August 28, 2023).

Wang, X., and Zhou, B. (2010). Dietary zinc absorption: a play of Zips and ZnTs in the gut. IUBMB life 62 (3), 176–182. doi:10.1002/iub.291

Wong, C. C. Y., Smith, R. G., Hannon, E., Ramaswami, G., Parikshak, N. N., Assary, E., et al. (2019). Genome-wide DNA methylation profiling identifies convergent molecular signatures associated with idiopathic and syndromic autism in post-mortem human brain tissue. Hum. Mol. Genet. 28 (13), 2201–2211. doi:10.1093/hmg/ddz052

Wu, Y., Yao, Y.-G., and Luo, X.-J. (2017). SZDB: a database for schizophrenia genetic research. Schizophr. Bull. 43 (2), 459–471. doi:10.1093/schbul/sbw102

Yasuda, H., Yoshida, K., Yasuda, Y., and Tsutsui, T. (2011). Infantile zinc deficiency: association with autism spectrum disorders. Sci. Rep. 1, 129. doi:10.1038/srep00129

Yeo, X. Y., Lim, Y. T., Chae, W. R., Park, C., Park, H., and Jung, S. (2022). Alterations of presynaptic proteins in autism spectrum disorder. Front. Mol. Neurosci. 15, 1062878. doi:10.3389/fnmol.2022.1062878

Zhang, C., and Shen, Y. (2017). A cell type-specific expression signature predicts haploinsufficient autism-susceptibility genes. Hum. Mutat. 38 (2), 204–215. doi:10.1002/humu.23147

Keywords: autism spectrum disorder (ASD), network propagation, machine learning, ASD genes, random forest

Citation: Zadok N, Ast G and Sharan R (2024) A network-based method for associating genes with autism spectrum disorder. Front. Bioinform. 4:1295600. doi: 10.3389/fbinf.2024.1295600

Received: 16 September 2023; Accepted: 26 February 2024;
Published: 08 March 2024.

Edited by:

Yovani Marrero-Ponce, University of Valencia, Spain

Reviewed by:

Hanbing Song, University of California, San Francisco, United States
Guillermin Agüero-Chapin, University of Porto, Portugal

Copyright © 2024 Zadok, Ast and Sharan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Neta Zadok, netaza@gmail.com
