Network Diffusion-Based Prioritization of Autism Risk Genes Identifies Significantly Connected Gene Modules

Autism spectrum disorder (ASD) is marked by a strong genetic heterogeneity, which is underlined by the low overlap between ASD risk gene lists proposed in different studies. In this context, molecular networks can be used to analyze the results of several genome-wide studies in order to underline those network regions harboring genetic variations associated with ASD, the so-called “disease modules.” In this work, we used a recent network diffusion-based approach to jointly analyze multiple ASD risk gene lists. We defined genome-scale prioritizations of human genes in relation to ASD genes from multiple studies, found significantly connected gene modules associated with ASD and predicted genes functionally related to ASD risk genes. Most of them play a role in synapsis and neuronal development and function; many are related to syndromes that can be in comorbidity with ASD and the remaining are involved in epigenetics, cell cycle, cell adhesion and cancer.


INTRODUCTION
"Autism spectrum disorder" (ASD) includes clinically and etiologically wide range of neurodevelopmental disorders such as the less severe disorders Asperger's syndrome and pervasive developmental disorder, not otherwise specified, as well as the most severe childhood disintegrative disorder. ASD symptoms are recognized mainly by the complex behavioral phenotype that manifests within the first 3 years of life: difficult in communication and social interaction, limited interests and repetitive behaviors (National Institute of Mental Health, 2013).
The approaches currently used to disentangle the genetic complexity of ASDs include large genome-wide association studies (GWAS), CNV testing and genome sequencing. Interestingly, the application of these different approaches yielded many non-overlapping genes, which may suggest different molecular mechanisms within connected pathways (Pinto et al., 2014). The analysis of molecular interactions and pathways is therefore crucial for the interpretation of the results emerging from genome-scale studies on a pathology marked by a significant genetic heterogeneity. Indeed biological pathways associated with a specific pathology are likely to be more conserved than individual genetic variations, because multiple combinations of variations might perturb each pathway (Barabási et al., 2011). Network-based and pathway-based analyses can therefore provide a functional explanation to non-overlapping genes and narrow the targets for therapeutic intervention (Devlin and Scherer, 2012).
One of the problems that network-based analyses can solve is indeed the identification of the so-called disease modules, i.e., network regions associated with a disease (Barabási et al., 2011). Recently, molecular interaction networks have been used in the analysis of ASD genetic data to define gene networks associated with ASD. The identification of a subnetwork with desired properties from a large biological network (like the one formed by all protein-protein interactions) poses many challenges and therefore several approaches have been proposed (Mitra et al., 2013).
Regarding the integration of networks and ASD genetic data, Cristino et al. (2014) studied the interacting partners of genes known to be associated with ASDs and other related disorders; Noh et al. (2013) identified a significantly interconnected network of genes affected by CNVs; Li et al. (2014) studied the association between ASDs and genes forming topological communities (clusters of genes with a high density of connection between genes of the community and less connections with genes outside the community); Gilman et al. (2011) found functionally connected clusters of genes affected by CNVs.
Recently, network smoothing index (NSI) was proposed as a network-based quantity that allows to define a network region enriched with a priori information (Bersanelli et al., 2016). The NSI is based on network diffusion, a method that simulates the flow of a fluid throughout a network. NSI quantifies the network relevance of each gene in relation to a set of input genes (e.g., ASD genes), considering the whole network and mitigating the importance of hubs.
In this work, we use network diffusion and the NSI to propose a possible disease module for ASD, encompassing the network regions most frequently hit by molecular variations reported in several studies and collected in curated public databases. Moreover, our study introduces a network-based genome-wide prioritization of genes in relation to their known and predicted relevance for ASDs.

Molecular Interactions
STRING interactions were collected from STRING (version 10), a database of direct and indirect PPIs (Szklarczyk et al., 2015). Native identifiers were mapped to Entrez Gene (Brown et al., 2015) identifiers. In case multiple proteins mapped to the same gene identifier, only the pair of gene identifiers with the highest STRING confidence score was considered. A total of 11,535 genes and 207,157 links with confidence score ≥700 was retained.

ASD Risk Genes
ASD risk genes were collected from The Simons Foundation Autism Research Initiative SFARI Gene database (Abrahams et al., 2013, version available in July 2015 and from Li et al. (2014).
In addition to SFARIh genes, we considered 5 sources (namely dnCNVn, dnCNV3s, rCNV, dMUT, mMUT) of genes harboring CNVs and mutations associated with ASDs, proposed by large recent studies . dnCNVn contains genes from an ASD-associated network composed of genes with de novo CNVs identified in 181 individuals and genes previously implicated in ASDs (Noh et al., 2013). dnCNV3s contains genes with de novo CNVs found in all three independent studies on more than 1,000 families Sanders et al., 2011;Pinto et al., 2014). rCNV contains genes with rare CNVs found in a study involving approximately 1000 ASD individuals of European ancestry and matched controls (Pinto et al., 2010). dMUT and mMUT include genes with, respectively, disruptive and missense mutations (Neale et al., 2012;O'Roak et al., 2012;Sanders et al., 2012;Li et al., 2014).
Only genes occurring in STRING network were considered in network-based analyses (Table 1). Therefore, whenever appropriate, we will specifically refer to the original gene lists and the corresponding derived lists with only genes occurring in STRING network using suffixes "i" and "n0" respectively (e.g., SFARIh i , SFARIh n0 ).

Network-Based Analysis
Given an input gene list L and a gene network encoded as the nby-n symmetrically normalized adjacency matrix W (Bersanelli et al., 2016), the n-sized vector X 0 was defined in order to have positive quantities only in its elements representing the genes in L, and null values for all the other genes. Network diffusion finds the vector X * , in which the quantities initially available in X 0 are subject to smoothing according to the pattern of interactions W. The vector X * was calculated using an iterative procedure (Zhou et al., 2004), as described in Bersanelli et al. (2016): Frontiers in Genetics | www.frontiersin.org where α [here set to 0.7 as in previous works (Bersanelli et al., 2016)] is a parameter that weights to which extent the initial information is retained or spread throughout the network. In the independent smoothing of each of the six ASD gene lists described above, genes belonging to the list were set to 1. In the joint analysis of all gene lists, genes belonging to SFARIh n0 were set to 1, while genes belonging to other lists were set to 0.5: this setting was chosen so that genes strongly associated with ASD had a higher priority. For each gene g, the network smoothing index S quantifies the network proximity of g to genes marked by a positive value in X 0 , i.e., associated with ASD, as ratio between gene values after and before network diffusion: where ε is a small positive quantity that weights the importance of the initial values X 0 . In order to mitigate the tendency of hub genes to gather excessive amounts of information only because of their central position, the permutation-adjusted network smoothing index S p was introduced as, where p S (g) is an empirical p-value, computed using K permutations of X 0 , each one denoted as X k 0 , and the corresponding S k (g) (calculated using X k 0 ): In the analysis of the six ASD risk gene lists, ε values were defined in order to predict genes in network proximity to the input genes. Given a gene set of size N, ε was set in order to obtain, among the first N top ranking genes by S p , a ratio of 1:1 between (i) the number of input genes and (ii) the number of genes in network proximity to input genes. The resulting values were 0.21 for dnCNVn n0 and 0.19 for dnCNV3s n0 , RcnV n0 , dMUT n0 , mMUT n0 and SFARIh n0 . In the joint analysis of all gene lists ε was set equal to 1, because the analysis was mainly aimed at defining a network-based prioritization of the 956 input genes, rather than at predicting other genes in network proximity. In all these analyses we used K = 999.
Network resampling (NR) shows to which extent a network score, resulting from the combination of gene scores, is expected if links among genes are shuffled. Also in this case permutations are used to define the null model. Given a number m of genes at the top of a ranked gene list, NR consists of two steps. First, a non-decreasing quadratic objective function (m) is defined: where S p (m) is the vector referring to the first m-scoring genes and A m is the adjacency matrix between such genes. In the second step, q permutations of A m are defined keeping the same degree distribution. Lastly, an empirical p-value (p N ) is calculated to quantify the fraction of times the objective function calculated on a permuted network, k (m), is greater than or equal to (m). The procedure is repeated for different m, providing an overview on whether gene links and gene scores determine significant network scores when moving down in the ranked gene list (see figures below). Network resampling (NR) was applied to genes ranked by S p in descending order and using a total of 200 permutations, which was enough to underline the presence of significantly connected components (gene modules).

Pathway Analysis
Pathway analysis was carried out using over-representation analysis (ORA). ORA estimates the significance of a pathway in relation to an input gene list, calculating the hypergeometric probability of finding the observed number of input genes that are also members of the considered pathway, in the context of a background set of genes. As a background we considered all the genes occurring in original lists and all genes occurring in the gene network. Gene-pathway associations were downloaded from NCBI Biosystems (version: February 2017) (Geer et al., 2010); in particular, only pathways (gene sets) with a number of genes between 10 and 200 were considered. Hypergeometric probabilities were calculated using "phyper" and "dhyper" R functions, and were corrected for multiple hypotheses testing using the Benjamini-Hochberg method, implemented in "p.adjust" R routine. The similarity between two gene sets (A, B) was calculated using the overlap coefficient:

Network Location of Genes Associated with ASDs in SFARI Database
With the aim of characterizing the functional relations among SFARI genes and predict relevant risk genes for ASDs, we considered direct and indirect protein-protein interactions (PPI) and quantified, via the permutation-adjusted NSI (Sp) (Bersanelli et al., 2016), the network proximity of each human gene in relation to the network location of 154 genes reported as strongly associated with ASD (SFARIh n0 list, Table 1).
We found several genes in significant network proximity to SFARIh n0 genes, with high S and low p S (Figure 1A and Supplementary Table 2). Interestingly, among these genes, we found a significant number of genes having minimal/hypothetical evidences of association with autism in SFARI (SFARIl n0 ) ( Table 2).
In order to assess whether genes ranked by S p formed a significantly connected gene module, we applied the NR approach (Bersanelli et al., 2016). We observed a significantly connected gene module (M SFARI ) resulting from the top 244 genes (Figures 1B-D). This module includes 142 (out of 154) SFARIh n0 genes, 9 SFARIl n0 genes and 93 genes not in SFARI.
These 93 genes include regulators of synaptic development and plasticity, are involved in syndromic conditions in comorbidity with ASD, regulate epigenetic mechanisms and a few are associated to cancer (Supplementary Table 2). For example, among the 93 genes that have a relevant position within the module ( Figure 1C) we found cancer genes that control cell proliferation, [e.g., Tumor Protein P53 (TP53), AKT Serine/Threonine Kinase 1 (AKT1), Mechanistic Target Of Rapamycin (MTOR), C-Terminal Binding Protein 1 (CTBP1)], a process that was recently proposed as a common denominator of cancer and ASDs (Crawley et al., 2016), and genes with relevant role for brain function e.g., histone deacetylase-1 (HDAC1), histone deacetylase-3 HDAC3 (Volmar and Wahlestedt, 2015) and contactin-2 (CNTN2) (Anderson et al., 2012). These 93 Frontiers in Genetics | www.frontiersin.org predicted genes act as a bridge between SFARIh genes that were not directly linked in STRING ( Figure 1D).
Interestingly, while this manuscript was under review, 3 out of the 93 predicted genes were added to the SFARI database independently from our study. Namely, HADAC3 was added among "hypothesized" genes, while TANC2 and PPP2R1B among "minimal evidence" genes.

Genes in Network Proximity to Genes Harboring Variations Associated with ASD
In addition to the SFARIh gene list, we considered five other lists of genes found altered in ASD subjects by previous studies. These lists vary in size from 67 to 530 and from 45 to 263 genes after integration with STRING gene network. The percent overlap between lists is low ( Table 1) and most of the genes occur only in one list (Figure 2). In this situation, as introduced earlier, the information on how gene products interact to regulate biological functions can be used to explain the heterogeneity of ASD risk gene lists. Firstly, we analyzed each list separately to underline the specificities and commonalities of each list. Subsequently, we used network information to define a prioritization among all ASD risk genes proposed in the considered studies (union of all gene lists).
We calculated the Sp of all genes in STRING network considering as input each of the six ASD gene lists (SFARIh n0 , dnCNVn n0 , dnCNV3s n0 , rCNV n0 , dMUT n0 , and mMUT n0 ) and selected the top 2n genes ranked by decreasing values of Sp, where n is list size (Supplementary Table 3). Note that almost all these genes are in significant network proximity (p S < 0.05) to the corresponding input genes (Figure 3 and Supplementary  Table 3), which are also ranked among the top 2n genes. For convenience we will refer to these network-based gene lists-which contains input and predicted genes-as SFARIh n * , dnCNVn n * , dnCNV3s n * , rCNV n * , dMUT n * , and mMUT n * .
Only 14 genes occur in three or more input gene lists (Figure 2). Among those genes, at least DLGAP2 (Discs Large Homolog Associated Protein 2) and SYNGAP1 (Synaptic Ras GTPase Activating Protein 1) are worth mentioning. In fact, DLGAP2, a post-synaptic density protein with probable implication in ASD pathogenesis (Chien et al., 2013) scored as "minimal evidence" in SFARI (SFARIl), is part of all three CNV lists and was predicted as functionally related to SFARIh genes and dMUT genes. Furthermore, DLGAP1 (Discs Large Homolog Associated Protein-1) was predicted as functionally related to genes of 3 input lists, including SFARIh ( Figure 2B). Similarly, SYNGAP1, which codes an autism related brainspecific synaptic Ras GTP-activating protein (Berryer et al., 2013), occur in three input lists (dnCNVhc, dnCNV3s and SFARIh) and was predicted as functionally related to genes harboring rCNVs.
Globally, a total of 913 genes were predicted as functionally related to at least one ASD risk gene list (Table 3 and  Supplementary Table 4). Interestingly, 106 of these genes were already proposed as ASD risk genes in 1 or more   A total of 913 genes were predicted to be in network proximity to ASD risk genes of one or more studies (rows) where could appear as ASD risk genes (columns). For example, 15 genes were predicted in network proximity to ASD risk genes of 2 studies and were proposed as ASD risk genes in 1 study (n i = 1, np = 2); #: row or column sum.
studies: for example, CTNNB1 (Catenin-Beta1) encoding a protein part of the adherens junctions complex, NRXN1 (Neurexin1), NLGN4X (Neuroligin4X), encoding a pre-synaptic and post-synaptic protein, respectively, and the tumor suppressor PTEN (Phosphatase And Tensin Homolog), were predicted as functionally related to 2 gene lists and included as risk genes in other 2 lists. We have also found genes that were not included in any input gene list, but were predicted to be in relevant network proximity to multiple gene lists. For example, 27 were predicted as functionally related to three gene lists (Figure 4); among these, ADGRL2 (adhesion G protein-coupled receptor L2), LRTOM (leucine rich transmembrane and O-methyltransferase domain containing) and SRC (Proto-Oncogene Non-Receptor Tyrosine Kinase SRC) were predicted in functional relation to 5 lists.
Many of the 29 genes predicted as functionally related to three or more lists-27 genes not included in any input list and 2 included in one study-take part in many PPIs, implicating they are central in the PPI network (e.g., TP53, AKT1). The significance of their Sp suggests that these genes were not only selected in relation to their centrality, but also because their network distance to ASD risk genes is lower than expected by chance (Figure 3 and Supplementary Table 3). A further observation that supports this hypothesis is that these genes establish a number of interactions with ASD risk genes that is higher than expected (p < 0.05, hypergeometric test) ( Table 4). From a network point of view, these 29 genes are "surrounded" by 369 ASD risk genes (first order neighbors).
It is also worth mentioning that several genes resulting with the highest network proximity score to each list tend to be list specific (Figure 5).
A total of 956 unique ASD risk genes occur in the 6 input lists. We calculated the Sp of all genes relative to these 956 genes ( Figure 6A) and, by means of NR, found a significantly connected component of 561 genes (Figure 6B and Supplementary Table 5). This gene module (M ASD ) includes all SFARIh genes, 70% of genes occurring in dnCNV_n list, approximately 50% of the other gene lists (Figure 6C), 26 genes in SFARIl and 8 genes that do not belong to the input list. Among these 8 genes, we find the already mentioned AKT1, TP53, and SRC, which occupy a central role in the PPI network and were also predicted during the independent analysis of each input list (Figures 4, 6D). FIGURE 4 | Genes in network proximity to ASD risk genes from 3 or more studies. Permutation-adjusted network smoothing index (Sp) of the 29 genes predicted as functionally related to 3 or more ASD risk gene lists (the number is reported between parenthesis); the asterisk (*) indicates genes of the corresponding original list; faded colors indicate input ASD risk genes or genes with no significant Sp values.

Pathway Analysis of Gene Lists Associated with ASDs
We carried out an over-representation analysis to characterize original gene lists and network-based gene lists in terms of pathways. We observed significant pathways (at adjusted p < 0.01) in only two of the six original gene lists (SFARIh i and dnCNVn i ) and in five network-based lists (SFARIh n * , mMUT n * , rCNV n * , dnCNV3s n * , and dnCNVn n * ) (Supplementary Table 6). Overall, we obtained a much higher number of pathways in network-based lists than in original ones, despite the number of tested genes was similar between the former and the latter ones. Therefore, the observed enrichment in pathways can be mainly brought back to the network-based analysis, since, by definition, it prioritizes genes functionally related to those considered in input. Further, apart from a single exception in SFARIh, pathways found only in original lists were similar, in terms of gene content, to pathways found also in network-based lists ( Figures 7A-F). In other words, network-based analysis resulted in an enrichment at pathway level with a very limited loss of information. Interestingly, pathways found in original gene lists, and lost due to genes for which network-based analysis was not applicable, were recovered by network-based analysis (compare rows labeled with suffix "i", "n0", and "n * " in Figure 7F).
We summarized the significant pathways found in each lists in a unique pathway network, the so-called enrichment map . Specifically, we took into account up to 20 of the most significant pathways as representatives of each pathway cluster found in each list. This selection resulted in a total of 366 pathways, clustered in 11 groups: transmission across synapses (clusters 1 and 4), signal transduction (2), Rho GTPase and apoptosis (3), inositol phosphate metabolism (5), ER-associated degradation process (6), ion transport (7), chromatin remodeling (8), oxygen transport (9), proteoglycan biosynthesis (10) and Wnt signaling (11). While the majority Gene and function from GeneCards (Safran et al., 2003) and OMIM (Amberger et al., 2015); Band: cytogenetic band associated with genetic variations in SFARI (Abrahams et al., 2013); # Studies: number of studies supporting the genetic variations observed in the cytogenetic bands; I: interactors; A: ASD risk genes, |A| = 956; np: number of ASD risk gene sets the gene is network proximity to; I: interactors; p: probability that |A ∩ I| is greater than or equal to the observed value in a hypergeometric experiment; the total number of genes composing the network is 11,535.
FIGURE 5 | Genes in network proximity to ASD risk genes from each study. Top 10 genes with the highest permutation-adjusted network smoothing index (Sp) calculated in the analysis of each ASD risk gene list; the number of lists to which the gene was predicted as network proximal is reported between parenthesis; the asterisk (*) indicates genes of the corresponding original list; faded colors indicate input ASD risk genes or genes with no significant Sp values.
of pathways were found in more than 1 list ( Figure 7G, yellow circles), some pathway clusters were composed of pathways uniquely associated with one list. For instance, proteoglycan biosynthesis was specifically associated with rCNV, ion transport with mMUT, oxygen transport with dnCNV3s and dnCNVn.

DISCUSSION
Recently, the knowledge of molecular interactions has been used for the interpretation of genetic data on ASDs. In comparison to previous works, we analyzed multiple ASD risk gene lists proposed in large studies, for a total of approximately 1,000 genes. We observed a low overlap between ASD risk gene lists. Whether this heterogeneity reflects the biology of ASD or is the result of confounding factors, the analysis of network proximity between genes underlines the ASD risk genes that are also in functional relation and lead to the identification of modules of functionally related genes hit by genetic variations.
The main limitation of a network-based analysis such as ours is the availability of a priori annotations required for the definition of the genome-scale network. In this work we considered both direct (physical) and indirect (functional) high confidence PPI from STRING, which allowed us to analyze 11,535 human genes. Note that the use of direct and indirect STRING interactions showed good performances in prioritizing candidate disease genes (Köhler et al., 2008). Moreover, unlike other networkbased works on ASD genetic data, we used network diffusion to quantify network proximity between ASD risk genes and other genes. Network diffusion (a global approach) considers the whole network topology in its full complexity and, therefore, has better performances than local approaches (e.g., direct neighborhood or shortest path length; Wang et al., 2011). Lastly, we underlined ASD risk gene modules without constraining the search to topological communities. In fact, there is no guarantee that topological communities are able to capture disease modules (Ghiassian et al., 2015). Hence, we quantified the significance of the observed network proximity scores in comparison to random networks of the same degree distribution (Bersanelli et al., 2016).
Our work provides a network-based prioritization of human genes associated with ASD by previous studies. We extracted a module of 244 genes in network proximity to genes reported in SFARI as strongly associated with ASD. Interestingly, the module contains a significant number of genes proposed as possibly involved in ASD (categorized as "minimal evidence" and "hypothesized" in SFARI) and another 93 genes not scored in SFARI (Supplementary Table 2). While this manuscript was under review, 3 of these 93 genes were included in SFARI independently from our study.
From the 93 genes, 16 genes are involved in synaptogenesis and synaptic plasticity or transmission, and alterations in structure and function of neuronal synapses are well known causes of ASD. Among these, APLP2 also regulates proper progression of neuronal differentiation program during cortical development (Shariati et al., 2013), is involved in Alzheimer disease and interacts with CNTN in neurodevelopment and diseases (Osterfield et al., 2008). Then again, the 3 genes, CACNA2D1, CACNB1, CACNG1 induce the repression of the downstream regulatory element antagonist modulator (DREAM) and the expression of the neuropeptide dynorphin (DYN). DREAM plays a role in synaptic plasticity and behavioral memory (Wu et al., 2010), while DYN is involved in behavioral symptoms characteristic of human depressive disorders (Knoll and Carlezon, 2010). Also CACNG3, a calcium channel protein, regulates the function of AMPA-selective glutamate FIGURE 7 | Significant pathways enriched in genes associated with ASD. (A-E) Pathways found in the analysis of each original list (i) and corresponding predicted genes (n*). (F) Heatmap of all the pathways found in the analysis of the lists reported on the rows; also pathways found analyzing the original lists without genes not occurring in the STRING network (n 0 ) are represented; the color bar indicates the −log 10 (p) of the hypergeomtric p-value, adjusted for multiple hypothesis testing; the number 7 indicates p ≤ 10 −7 . (G) Enrichment map; vertex size is proportional to pathway significance (adjusted p-value); links are reported only for overlap coefficients >0.5; for each pathway, only the links with the top 5 most similar pathways are drawn. (A-G) See Supplementary Table 6. receptors and mediates synaptic transmission in CNS while FLRT3 takes part in a trans-synaptic complex (Lu et al., 2015). Eight genes are involved in neuronal differentiation, neurodevelopment and neuronal function. More specifically, AKT1 is a downstream mediator of the PI3K pathway that regulates synaptic formation and plasticity and which imbalance leads to autism and schizophrenia (Enriquez-Barreto and Morales, 2016); genetic variations in contactins (CNTN) have been described in association with neurodevelopmental disorders, including autism. Specifically, CNTN1 and CNTN2 are members of the presynaptic NRXN superfamily and 13 rare non-synonymous variants of CNTN2 have been found in ASDs patients while mice with Cntn5 mutations show an abnormal audiogenic response due to defects in the formation of synapse in auditory neurons (Cottrell et al., 2011;Chen et al., 2014).
Many of the 93 genes are involved in syndromic comorbidities, including auditory and visual senses deficit, epilepsy, mental retardation and psychiatric conditions that affect nearly three-quarters of children with ASD. For instance, CAMK2G and PDZD7 are involved in Usher Syndrome the most common condition leading to deafness and blindness, as well as DNMT1 has a role in DNMT1-Related Dementia, Deafness, and Sensory Neuropathy (Vernon and Rhodes, 2009) and LRTOMT in deafness. Again, MYL7 mutations are associated with Fechtner Syndrome which features include hearing loss and eye abnormalities. SP4 is involved in bipolar disorder and schizophrenia while ANK3, ACSL4, DLG3 are associated with mental retardation and, interestingly, recalling the high male prevalence of ASD, the latter two map on X-cromosome; NHS also maps on X-cromosome and mutations in this gene cause Nance-Horan Syndrome characterized by congenital cataract leading to vision loss; in males mild or moderate mental retardation may also occur and ASD have also been described in few patients (Toutain et al., 1997). Mutations in NAGLU and HGSNAT cause the Sanfilippo Syndrome (also called mucopolysaccharidosis Type III) often misdiagnosed with idiopathic developmental delay, attention deficit/hyperactivity disorder and/or ASD (Wijburg et al., 2013). QDPR mutations provoke hyperphenylalaninemia (Trujillano et al., 2014), (also called atypical phenylchetonuria (PKU), a genetic metabolic disease provoking postnatal cognitive deficit due to the neurotoxic effect of hyperphenylalaninemia; interestingly, PKU could be a comorbid condition of ASD, although with low prevalence (Baieli et al., 2003). MKRN3 is associated with Prader Willy Syndrome, NPAP1 both with Prader Willy Syndrome and Angelman Syndrome while DSCAML1 with Down Syndrome. These syndromes are characterized by mental retardation and can have co-occurring ASDs (Peters et al., 2004;Capone et al., 2005;Dykens et al., 2011). KMT2D and WDR5 defects are involved in Kabuki Syndrome characterized by multiple congenital abnormalities, from mild to severe developmental delay and intellectual disability. People suffering from this syndrome may also manifest seizures, hypotonia, strabism, hear infections, hearing loss and autism (Parisi et al., 2015). The very rare mutations in MANBA results in β-mannosidosis with a severe neurological disorder that can include mental retardation, cerebellar ataxia along with visual and hearing deficits (Sabourdy et al., 2009). CACNG3 is involved in Childhood Absence Epilepsy and is also associated with some cases of ASD (Danielsson et al., 2005) while mutations of KATNB1 cause complex cerebral malformations (Mishra-Gorur et al., 2014).
The remaining genes (among the 93) are mostly involved in epigenetics, cell cycle and cell adhesion and some of them are also implicated in tumor development as already reported by Crespi (2011) and Crawley et al. (2016).
The network-based analysis of genes from SFARI and other 5 previous studies resulted in the definition of a gene module that involves 561 ASD risk genes in significant functional relation. The module contains all the considered SFARI genes (strongly associated with ASD) and from 40% to 70% of genes from each of the other lists of ASD risk genes. Therefore, this module can be seen as a further screening of the genes proposed by such studies, which underlined those in significant functional relation from a network perspective.
More generally, the network-based scores that we calculated for every gene in the considered STRING network can be used to quantify the functional relation between any gene and ASD risk genes found in one or more previous studies.
Biological pathways enriched in genes in network proximity to ASD risk genes encompass several functions already proposed to be associated with ASD (see, for example, Pinto et al., 2010). Network-based analysis, through the prioritization of functionally related genes, enriched the number of significant pathways found by ORA in comparison to the analysis of original gene lists. Despite not all genes occurring in original lists underwent network-based analysis, the latter was not affected by a loss of information at pathway level.
The predicted genes in network proximity to ASD risk genes that have a central role in the PPI networks, but SRC, mapped in ASD risk loci. SFARI Gene database lists all the studies reporting CNV at the chromosome bands where predicted genes are localized (Table 4). In many reports, the CNV of interest was subsequently confirmed or validated by an independent method following its discovery. Additionally, from a functional point of view, most of the predicted genes are involved in epigenetics, cell cycle, growth-, proliferation-and differentiationsignaling and are often implicated in cancer development. This finding indicates pleotropic effects of some autism-associated genes on cancer risk and is supported by previous discussions that highlight a wide overlap in risk genes and pathways for cancer and autism (Crespi, 2011;Crawley et al., 2016). Advances in pharmacological therapies to ameliorate autism symptoms could be resulted from cancer drugs that target the same growthsignaling pathways (Crespi, 2011).

AUTHOR CONTRIBUTIONS
EM collected the ASD data from the literature, setup and run the analyses; MB curated the physical mathematical modeling of network diffusion; MG and MM managed the high performance computing infrastructure; GC supported the physical mathematical modeling; LM coordinated the research; AM analyzed the ASD literature, interpreted the biological results, coordinated the research. All authors discussed the results, contributed to manuscript writing and revision.