Moving from capstones toward cornerstones: successes and challenges in applying systems biology to identify mechanisms of autism spectrum disorders

The substantial progress in the last few years toward uncovering genetic causes and risk factors for autism spectrum disorders (ASDs) has opened new experimental avenues for identifying the underlying neurobiological mechanism of the condition. The bounty of genetic findings has led to a variety of data-driven exploratory analyses aimed at deriving new insights about the shared features of these genes. These approaches leverage data from a variety of sources such as co-expression in transcriptomic studies, protein–protein interaction networks, Gene Ontology (GO) annotations, or multi-level combinations of all of these. Here, we review the recurrent themes emerging from these analyses and highlight some of the challenges going forward. Themes include findings that ASD-associated genes discovered by a variety of methods contain disproportionate numbers of neurite outgrowth/cytoskeletal, synaptic, and more recently Wnt-related and chromatin-modifying genes. Expression studies have highlighted a disproportionate expression of ASD gene sets during mid-fetal cortical development, particularly for rare variants, with multiple analyses also highlighting the striatum as well as cortical projection neurons and interneurons. While these explorations have highlighted potentially interesting relationships among these ASD-related genes, there are challenges in how to best transition these insights into empirically testable hypotheses. Nonetheless, defining shared molecular or cellular pathology downstream of the diverse genes associated with ASDs could provide the cornerstones needed to build toward broadly applicable therapeutic approaches.

Introduction
Autism spectrum disorder (ASD) is a pervasive developmental disorder, affecting around one of every 100 children. ASD is characterized by profound deficits in communication and social interaction as well as restricted interests and resistance to change. ASD clearly has a strong genetic component, with a 60-90% concordance between monozygotic twins. However, the disorder shows remarkable heterogeneity in the genetic risk factors. Common variant analyses have identified few reproducible associations across studies, and meta-analyses suggest that what common variants do exist likely have small individual effects (odds ratios less than 1.2) and act in a highly polygenic manner (Anney et al., 2012; Klei et al., 2012; Gaugler et al., 2014). Thus, the recent focus has been on rare variants, including copy number variations (CNVs), and exome sequence analyses (Pinto et al., 2010; Sanders et al., 2011, 2012; Chahrour et al., 2012; Malhotra and Sebat, 2012; Neale et al., 2012; O'Roak et al., 2012b; Yu et al., 2013; De Rubeis et al., 2014; Iossifov et al., 2014). These studies collectively have identified a clear role for rare and private deleterious coding mutations, both de novo and inherited. However, although these variants have larger effect sizes, the rarity of the individual events limits statistical power. For example, while de novo loss-of-function mutations may collectively account for around 10% of ASD cases, any given gene might be seen to be mutated in only 2 or 3 cases out of the thousands now sequenced (Sanders et al., 2011; De Rubeis et al., 2014). Nonetheless, since 2012 a number of de novo, apparent loss-of-function mutations have been described that are found primarily in individuals with ASD, and a growing number of the same genes have been mutated frequently enough to indicate clear association. Ongoing efforts are poised to discover many more. Current estimates indicate there will be several hundred genes implicated by this approach when sufficient sample size is obtained, in addition to the >100 genetic syndromes which already show some shared genetics or comorbidity with ASD (Betancur, 2011; Yu et al., 2013). With the number of new ASD variants being discovered, the research bottleneck is now the identification of the neurobiological mechanisms by which they act. Since the genetic heterogeneity is so substantial, it is hoped that the identification of common neurobiological mechanism(s) across these diverse genetic causes may suggest some common routes to treatments.

Box 1
1. A metabolic pathway might be a series of enzymes that progressively alter a metabolite (for example, the Krebs cycle).
2. A signaling pathway is a series of molecules, usually proteins, that transmit biological information, primarily using chemical modifications to activate or inhibit signaling activity of downstream targets.
3. A genetic pathway is a set of genes that contribute to a common final phenotype in a related manner, as determined by epistatic analyses. Note that additive effects on phenotype are not sufficient to place two genes into the same pathway. To be firmly placed in the same genetic pathway, gene products must be shown to be complementary, dominant, or suppressors of one another.
For the purposes of this review, metabolic and signaling pathways are referred to and treated simply as gene sets. The term 'pathway' will be used to refer exclusively to genetic pathways. Note that discovery of genetic pathways historically has led to the elucidation of a corresponding specific type of pathway (e.g., a signaling pathway such as Wnt signaling), though initial definition of a genetic pathway requires no knowledge of molecular function, only measurement of an effect on a phenotype.
Circuit: Generally, a course along which chemical and electrical signals travel. While a cellular circuit has some analogy to a 'molecular circuit' or signaling pathway, here we are distinguishing between these two levels of analysis. For the purposes of this review, circuits only refer to series of interconnected neural cells that mediate a particular behavior.
The relatively recent advent of computational science has produced tools that open opportunities to unveil truths that are not reachable using theoretical or experimental approaches alone (Reed et al., 2005). Consequently, many recent scientific advancements have materialized thanks to two alternating and complementary modes of reasoning (Kell and Oliver, 2004). Discovery-driven approaches focus on inductive reasoning; they examine wide sources of data and attempt to define hypotheses from the emergent patterns that describe cause and effect relationships. In contrast, hypothesis-driven approaches leverage deductive reasoning to identify the logical consequences of a specific theory or hypothesis, consequences that can then be tested in an experimentally rigorous manner. The dawn of the genomic era, with the ability to measure the expression of thousands of genes, protein-protein interactions, epigenetic marks, etc., has produced fertile ground for discovery-driven analyses, and many groups are leveraging these data resources in joint analyses with human genetics data for ASD to provide novel insights into any shared characteristics of the genes and potential mechanisms of this disorder. Here, we review these studies with a particular focus on what bioinformatic approaches may have indicated about the molecular or cellular mechanisms of ASD. We then highlight some of the successes and the challenges facing these approaches, along with a limited number of recommendations toward possible solutions. The overall aim of this review is to spur robust, critical, and creative thinking to advance the field.

Evolution of Discovery-Driven Applications for ASD-Related Genes
Studies of ASD genetics have evolved substantially over the last 15 years. As it was realized that common variants of large effects would be truly rare, it became evident that large sample sizes would be necessary to power both common and rare variant analyses. To amass these samples, large gene discovery projects required the coordinated efforts of hundreds of researchers with specialized expertise (clinicians, biologists, statisticians, programmers, etc.). The end results of these studies were essentially tables: tables of SNPs showing tentative association, linkage, or transmission disequilibrium (Wang et al., 2009; Weiss et al., 2009), or tables of CNVs (Sebat et al., 2007; Marshall et al., 2008; Bucan et al., 2009; Glessner et al., 2009; Pinto et al., 2010; Levy et al., 2011; Sanders et al., 2012), or de novo and recessive single nucleotide variants (SNVs; Gilman et al., 2011; Chahrour et al., 2012; O'Roak et al., 2012b; Sanders et al., 2012; Yu et al., 2013; De Rubeis et al., 2014; Iossifov et al., 2014) occurring, with some statistical confidence, in individuals with ASD and other forms of developmental delay. These tables, collectively, have provided the foundational resource to begin understanding the human biology of ASD.
The results in these tables are arguably significant enough that a study is complete when they are generated. But they are difficult to reduce to a single statement for a title, or to summarize in an abstract, and are perhaps aesthetically unpleasing as a final figure. Thus, the emergence of a 'capstone analysis.' Early on, if only a single candidate region or two arose from a study, such an analysis might be as simple as assessing association between a SNP and gene expression (e.g., CDH9) or between cases and controls for gene expression (e.g., SEMA5A), which were the capstone figures of two early common variant GWAS studies (Weiss et al., 2009). But as the tables became longer, the capstone analysis was often focused on summarizing the likely candidate genes on the table as a whole, i.e., to provide a systematic gestalt of these genes. Examples included leveraging the GO resource to identify disproportionately represented categorical terms [e.g., cytoskeletal elements or Rho GTPases (Pinto et al., 2010), known to regulate neurite outgrowth (Hall and Lalli, 2010)], or an attempt to organize all the resulting genes into some kind of network (Box 1) using other data resources. In more recent years, these capstones have expanded in scope and in effort (Gai et al., 2012; De Rubeis et al., 2014; Pinto et al., 2014), sometimes sufficiently to become companion and post hoc analytical manuscripts focused on finding common themes to the discovered genes, and presumably the disorder (Gilman et al., 2011; Ben-David and Shifman, 2012; Parikshak et al., 2013; Willsey et al., 2013; Krumm et al., 2014; Xu et al., 2014; Chang et al., 2015; Hormozdiari et al., 2015). In a review by Willsey, the earlier works have been characterized as initially using 'static' data resources to contextualize the findings, but eventually turning to more 'dynamic' resources such as gene expression across brain regions or cell types in the CNS (Willsey and State, 2015).
As gene expression inherently includes an aspect of brain region and developmental time, they could be equally described as moving from trying to find a shared molecular pathology for these genes, to trying to find a shared regional or cellular pathology. Below, we review capstones from both types of analyses and highlight recurrent themes that may be emerging across groups.
A Shared Molecular Pathology for ASD-Related Genes?
Given a set of genes, a variety of mature tools exist for identifying disproportionately shared molecular functions for these genes, mostly based on researcher-curated collections of gene functions (e.g., GO), or empirically determined sets of protein-protein interactions, derived from literature mining or high-throughput screens in simplified model systems (e.g., yeast 2-hybrid; Ashburner et al., 2000; Lage et al., 2007; Rossin et al., 2011; Szklarczyk et al., 2015). These approaches have highlighted a variety of enriched molecular functions amongst ASD-related gene sets (Table 1). However, the utility of the results from these approaches has two limitations: they are dependent on manually curated annotations, and they do not lead directly to falsifiable hypotheses.
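The overlap statistics behind such tools can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test, which is one common choice and not necessarily what any particular tool cited above implements. The function name and all counts below are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(overlap, candidate_size, annotated_size, background_size):
    """One-sided hypergeometric p-value: P(X >= overlap).

    Models drawing `candidate_size` genes without replacement from a
    background of `background_size` genes, of which `annotated_size`
    carry the annotation of interest (e.g., one GO term).
    """
    max_overlap = min(candidate_size, annotated_size)
    total = comb(background_size, candidate_size)
    tail = sum(
        comb(annotated_size, k)
        * comb(background_size - annotated_size, candidate_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 100 candidate genes, 5 of which fall in an
# annotation set of 159 genes, tested against two assumed backgrounds.
p_whole_genome = hypergeom_enrichment_p(5, 100, 159, 20000)
p_small_background = hypergeom_enrichment_p(5, 100, 159, 5000)
print(p_whole_genome, p_small_background)
```

Because the p-value depends directly on `background_size`, the same overlap can look far more or less surprising depending on which genes are counted as the background.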
First, while these GO-based tools are vastly preferable to the alternative (attempting to manually curate the literature for dozens or hundreds of genes simultaneously), it is clear that they also suffer from a derivative of one of the classic barriers to unadulterated inductive reasoning described by Francis Bacon: a sort of collective version of his 'idols of the cave.' The term classically refers to how an individual's interpretations of data are colored by their prior knowledge and experiences (Bacon, 1620). Likewise, GO terms are assigned based on the collective experiences of researchers, as reflected in the literature, and thus they can only be readily leveraged for well-annotated genes. In addition, even known genes may have unidentified pleiotropic molecular functions. For example, FMRP, the RNA binding protein disrupted in Fragile X syndrome, has recently been shown to also physically regulate presynaptic voltage-gated potassium channels through protein-protein interactions (Deng et al., 2013), independent of any RNA binding activity. It may likewise be found that genes currently annotated as chromatin modifiers (e.g., CHD8) or histone deacetylases (e.g., HDAC5) have as yet unknown roles in directly modifying cytoskeletal elements regulating neurite morphogenesis. Simply put: analyses based on curated knowledge cannot account for currently unknown functions.
Second, it is not always clear how the insights from these molecular gene set analyses might be actionable for identifying mechanistic hypotheses for ASD or developing new therapeutics. Results such as an enrichment of genes in the GO gene set 0045216 (intercellular junction assembly and maintenance), which contains 159 genes, provide limited insight on which direction to pursue. In addition, the specificity of the 159 genes in the entire GO gene set to ASD or a particular question (e.g., drug targets, causative genes, and temporal expression of ASD genes) is unknown. It has long been shown in model systems that genes that perform functions in the same genetic pathway or encode proteins in the same protein complexes lead to similar phenotypes when disrupted, but it is not clear how closely linked a particular genetic pathway (Box 1) is with a given GO gene set. Thus, it would be ambitious to assume that disrupting any of the 159 genes associated with this GO gene set will lead to ASD. This is also because the 159 genes could be expressed in markedly different locations in the brain, and the behavioral manifestations of such molecular disruptions will be highly dependent on the specific neural circuits that utilize each of these proteins.
In contrast, there are clear successes, ones that have led to purposeful experiments and meaningful treatments, arising from identifying the relevant neural circuit for a particular disorder. Consider the rich variety of treatments arising from the knowledge that Parkinsonism is due to loss of dopaminergic cells of the substantia nigra. Long before any genes were identified that contributed to the development of this disorder, knowledge of the afflicted circuit (Box 1) led to the identification of viable treatment strategies. If the dysfunction of particular circuits in the brain manifests as explicit behavioral abnormalities (e.g., specific symptoms), then it is reasonable to assume that the shared symptomatology across distinct genetic causes of ASD implies some convergent neural circuit disruption downstream of these distinct genetic pathways. Encouragingly, if the diverse set of rare causative genetic mutations in ASD does share a common cellular or circuit mechanism, then we do not need to devise treatments for each specific rare mutation. Rather, treatments focused on correcting the common cellular dysfunction could be applied to individuals who have a variety of underlying causes, analogous to the common treatments used regardless of which genetic factor or environmental exposure was the underlying cause for a case of Parkinson's disease. Thus, identifying common cellular circuits mediating the behavioral disruptions seen across a variety of distinct ASD genetic etiologies is essential for designing practical treatments for this disorder.

A Shared Cellular Pathology for ASD-Related Genes?
To address the two limitations outlined above and to attempt to identify some shared neurobiological circuit disrupted across distinct genetic causes, we and others have focused on complementary analyses leveraging gene expression data resources. As gene expression is readily measured even for unannotated genes, it is unbiased and does not suffer from the 'idols of the cave.' And, as gene expression varies substantially across cell types or circuits, it may be possible to implicate particular circuits by expression alone. At an extreme, a disease gene selectively expressed in a single cell type in the brain (e.g., the narcolepsy-related peptide Hypocretin found only in a population of cells in the hypothalamus; Peyron et al., 2000), clearly implicates that cell type as a vulnerable population in the disorder and any related circuits as targets for treatment. While such all-or-none expression of genes in a single cell type is rare, the logic of this 'selective expression' hypothesis may be somewhat extensible to a more moderate statistical enrichment of expression as well: disproportionately enriched expression of a large number of disease genes in a particular cell type or tissue could indicate a relevant anatomical intermediary of a disorder. Indeed, we have now shown that retinopathy-causing genes are disproportionately expressed in rods and cones (Xu et al., 2014). Likewise, SNPs associated with autoimmune diseases by GWAS tend to be eQTLs for genes expressed in the blood where immune cells are prevalent (Ardlie et al., 2015). And knowledge of anatomical intermediaries leads to testable hypotheses: individual cell types can be disrupted in model organisms quite readily using Cre/Lox, optogenetics, and related approaches, and behavioral consequences examined.
Before summarizing the results of the analyses leveraging expression data, it is worth noting that while gene expression data have the advantage that they are relatively unbiased for specific genes, several caveats remain. First, determination of expression levels can be affected by variations in sample collection and preparation, technician experience, equipment calibration, and choices of pre-processing algorithms, statistical tests, thresholds, microarray/RNAseq platforms, and other aspects of study design (Gudjonsson et al., 2010;Suárez-Fariñas et al., 2010). However, stringent consistency throughout the study and prudent design choices can help to ensure reasonable accuracy with regard to relative differences between expression levels, and these relative differences are adequate for most subsequent analyses. Second, covariates such as differences in gender, age, cause of death, time to preservation of the sample, and batch effects are sources of potential bias that are typically corrected using standard methods, such as ANCOVA (Huitema, 2005). While commonly overlooked, in order to ensure spurious relationships do not slip past these corrections, it is important that covariate information is double-checked following subsequent analyses. For example, if a co-expression module of a couple dozen genes is identified, the individuals bearing most, or all, of the expression pattern should be extracted and the degree of correlations with covariates should be determined. Third, variations in ancestry or overlooked sample relatedness can present unexpected sources of bias. An effective, albeit not always practical, way to identify either of these potential pitfalls is to collect genotype data and analyze them using packages such as Structure (Pritchard et al., 2000) and PLINK (Purcell et al., 2007). If this additional data collection is impractical, thorough screening of study participants can help alleviate these possible sources of bias. 
Finally, inadequate sample size can lead to serious issues as described later in this review.
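The covariate double-check described above can be sketched as follows. This is a minimal illustration, assuming the module's pattern is summarized per sample as the mean z-scored expression of its genes and then compared with each covariate by Pearson correlation. The gene names, expression values, and covariates are hypothetical toy data, and the summary statistic is an assumption, not a method taken from the studies reviewed here.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def module_scores(expression):
    """Per-sample module score: mean z-score across the module's genes.

    `expression` maps gene name -> list of per-sample expression values.
    """
    zscored = []
    for values in expression.values():
        mu, sd = mean(values), pstdev(values)
        zscored.append([(v - mu) / sd for v in values])
    return [mean(column) for column in zip(*zscored)]


def covariate_correlations(expression, covariates):
    """Correlate the module score with every covariate."""
    scores = module_scores(expression)
    return {name: pearson(scores, values) for name, values in covariates.items()}


# Hypothetical module of three genes measured in six samples.
expression = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],
    "geneB": [2.0, 2.1, 1.9, 4.0, 4.2, 3.9],
    "geneC": [0.5, 0.4, 0.6, 1.5, 1.6, 1.4],
}
# Hypothetical covariates: processing batch and age at sampling.
covariates = {
    "batch": [0, 0, 0, 1, 1, 1],
    "age": [30, 45, 60, 45, 60, 30],
}
corr = covariate_correlations(expression, covariates)
print(corr)
```

In this toy data the module score tracks the batch label almost perfectly and is nearly independent of age, which is the kind of pattern that would suggest the module reflects a batch effect instead of biology.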
If these issues are addressed, then two approaches can be taken to leverage gene expression data. The approach we took in our particular analyses was 'top down.' We defined sets of genes with enriched expression in different tissues based upon available body-wide RNAseq data resources (GTEx: Ardlie et al., 2015), in different cell types based on cell-specific profiling technologies from mouse data (bacTRAP: Doyle et al., 2008), and profiles of human brain regions across development (BrainSpan: Kang et al., 2011). We then examined the overlap of these lists with candidate disease genes, in a manner analogous to the tools overlapping candidate gene sets with GO. However, there are weaknesses to this approach. Our use of mouse data assumes conservation of gene expression in particular cell types across mammals, a reasonable but clearly not perfect assumption (Zeng et al., 2012). Our approach also does not explicitly leverage the correlation structure of gene expression across tissues. Likewise, human brain-region and tissue-wide data sets lose the cellular-level resolution that may be most useful for identifying targets for treatments. Both data resources are of course limited to the samples that were collected, and other cell types, tissues, or perhaps key developmental windows might be absent from a particular analysis. Human data in particular have focused heavily on cortex, potentially underrepresenting other regions that may be of importance (e.g., hypothalamus or brainstem). Thus, these analyses are moving toward being potentially usable as cornerstones for developing hypotheses of the cellular mechanisms underlying ASD, and will hopefully provide additional insights as more data become available.
A complementary set of 'bottom up' data-driven studies addresses some of these concerns. Several groups used a variety of clustering analyses to first organize the ASD-related genes into networks, often leveraging their correlated expression across human brain development to group them into co-expression modules using WGCNA (Ben-David and Shifman, 2012; Parikshak et al., 2013), or philosophically similar approaches using additional data resources (Willsey et al., 2013; Chang et al., 2015). Resulting modules can be used for GO analyses or examined for enriched expression in particular developmental windows, brain regions, or cell types. It is worth noting here that it has long been recognized that one of the primary drivers of correlated gene expression across different brain regions is the consistent change in proportions of different cell types (e.g., neurons and glia) across regions (Geschwind, 2000). Thus it is likely that many co-expression modules might correspond to genes enriched in a particular cell type. Our cell-type specific expression analysis (CSEA) approach (Xu et al., 2014) or other datasets (Zhang et al., 2014) can be used to rapidly identify this. Regardless, in the above analyses, either co-expression or somewhat more inclusive human genetics criteria have been used to expand these ASD-related gene sets into larger modules. This allows for more genes to be included in these analyses, facilitating better network insights, though it is currently unclear if there is a particular cost in terms of a potentially inflated false positive rate associated with this expansion of gene sets.
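To make the idea of a co-expression module concrete, here is a deliberately simplified sketch. It is not WGCNA, which uses soft thresholding and topological overlap. This stand-in simply links genes whose Pearson correlation across samples meets a hard threshold and reports the connected groups. The gene names, expression values, and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_modules(expression, threshold=0.8):
    """Group genes into modules via a hard correlation threshold.

    Genes are linked when their signed correlation is >= threshold;
    modules are the connected components, found with union-find.
    """
    genes = list(expression)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the root, compressing the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if pearson(expression[a], expression[b]) >= threshold:
            parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in modules.values())


# Hypothetical expression across six developmental time points:
# two genes peak early ("fetal"), two peak late ("postnatal").
expression = {
    "fetalA": [9, 8, 6, 3, 2, 1],
    "fetalB": [8.5, 8, 5.5, 3.5, 2, 1.5],
    "postnatalA": [1, 2, 3, 6, 8, 9],
    "postnatalB": [1.5, 2, 3.5, 5.5, 8, 8.5],
}
modules = coexpression_modules(expression)
print(modules)
```

With this toy input the early-peaking and late-peaking genes separate into two modules, since correlations within each pair are high and correlations between the pairs are strongly negative.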
However, in spite of the moderate differences in the precise ASD-related gene sets, differences in leveraged data resources, and differences in the use of 'top down' or 'bottom up' methods and statistical approaches, some themes seem to be emerging regarding where ASD-related genes show enriched expression (Table 2). First, amongst the rare mutations that were highlighted in the recent exome studies, several groups have reported disproportionate expression in the mid-fetal developing cortex and/or striatum (Parikshak et al., 2013; Willsey et al., 2013; Xu et al., 2014). Though there is some disagreement on the exact lamina that might be implicated (frankly, relatively few gene expression differences define distinct cortical laminae (Dougherty et al., 2010; Xu et al., 2014) relative to the robust expression differences between cell types in other brain regions such as the cerebellum), many of these genes show relatively high expression in forebrain development. This is consistent with the long-known roles in telencephalic development for at least two of the recently implicated genes (TBR1 and RELN; Caviness and Sidman, 1973; Hevner et al., 2001), and suggests that mutations profoundly affecting forebrain development may have ASD as one (of perhaps many) deleterious consequences. This is consistent with the replicated finding that individuals with de novo loss-of-function mutations have lower IQ than other individuals with ASD. Second, genes downregulated in human ASD postmortem transcriptomic studies (Gupta et al., 2014), and ASD candidate genes compiled prior to exome studies, seem to map most strongly to cortical interneurons, as well as a striatal cell type: medium spiny neurons (Xu et al., 2014). These findings suggest that perhaps there might be some shared abnormalities in cortical and striatal circuits across distinct genetic causes of ASD.
In contrast, for example, none of the analyses have implicated cell types of the cerebellum, suggesting these are perhaps less commonly involved in ASD.
While overall both 'top down' and 'bottom up' discovery-driven approaches have highlighted potential circuits of interest in ASD, it is also clear that this disorder is not like retinopathies, which have a dominant signal in one or two cell types (minimum p-values < 10e-20). The significant, yet relatively modest statistical signal in ASD studies (minimum p-values around 10e-3 for medium spiny neurons or cortical interneurons) indicates there may be substantial heterogeneity in cellular mechanisms for the disorder, just as there is extensive heterogeneity in genetic mechanisms. Further, as many of these methods start from largely similar ASD-related gene lists, and leverage a small number of overlapping data resources, they do not truly represent independent replications. Thus, in the final section, we outline some of the challenges facing application of these approaches to ASD and present some examples of solutions and recommendations. The recommendations are not exhaustive and it is likely other elegant solutions exist as well. The challenges can be organized into three groups. First, how do we best identify and rule out alternative explanations that may also account for the relationships between these genes? How do we define the null hypothesis? And what are likely sources of false negatives? Second, how do we assess the reproducibility of a discovery-driven network analysis result? What constitutes a replication of one of these findings? Finally, how do we convert discovery-driven network-based insights into empirically testable hypotheses, and from there into informed treatments?
Challenges Posed by Systems Biology Approaches using ASD-Related Genes

Challenge 1: Selecting the Correct Interpretation of a Network Analysis Result
Networks are graphical descriptions of the relationships between the embedded entities. They provide the ability to display more numerous relationships than could be efficiently conveyed with words. However, a mind presented with such a large amount of data will rapidly organize it by drawing on examples from our own experience as researchers (idols of the cave yet again). Cortical development researchers might tend to migrate toward the cytoskeletal and Rho-GTPase genes, while physiologists may be most stimulated by the channel genes. One who has worked for many years with the transcriptional profiles of different cell classes in the brain, when looking at a network (or gene set), might have a bias to interpret it in terms of the cell types these genes are expressed in (i.e., one could view an 'immune' module in transcriptomic data as reflecting changes in the proportion of microglia in the tissue, rather than immune genes being upregulated in neurons). Thus, all investigators must be careful to recognize their individual biases for what they are and shield analytical approaches as best possible from them. In addition, there can be biases in the discovery methods and resources themselves that might create statistically significant results for scientifically insignificant reasons (Figures 1 and 2). Therefore, we also need to define as carefully as possible our null hypotheses and be attentive to circularity and alternative explanations.
[Figure and table residue: panel labels 'Peak expression during fetal brain development' and 'Peak expression during postnatal brain development'; 'Module1' and 'Module2'. Table note: red columns mark the primary computational approach used.]

Recommendation 1: define the null and rule out alternative explanations
Not all genes are equally likely to be implicated in genetic studies. A simple example is that longer genes will tend, by nature of their size, to overlap with more markers present on SNP microarrays, to provide more bases that could carry a de novo SNV, and to be more likely to be disrupted by a random CNV mutation. And of course, mutations are not randomly distributed: different regions of the genome, or even particular nucleotide contexts, have different rates of mutation (Krawczak et al., 1998; Michaelson et al., 2012; Samocha et al., 2014). Furthermore, transcript length, and potentially gene body size, bear some relationship to biological function. Notably, genes expressed in the nervous system tend to be longer in both regards (Figure 1A). For example, randomly sampling SNPs from the genome and mapping them to overlapping genes will result in an enrichment for brain-related GO categories (Figure 1B). Likewise in transcriptomic studies, the appropriate background 'genome' needs to be carefully defined (Figure 2). Transcripts that cluster in modules or are differentially regulated in a particular tissue must, by necessity, first be expressed in that tissue. Thus, the effective genome and genome size for statistical analysis of overlap should be restricted to those genes whose transcripts could plausibly have been identified in the analysis. As an example, both length and expression can come into play when considering overlaps with the known Fmrp-interacting RNAs, as for potentially methodological reasons these tend to include long transcripts that are highly expressed in the brain. Therefore, the overlap of these RNAs with gene sets derived from either human genetic studies or potentially transcriptomic studies may reflect these primary features of the transcripts rather than a central role for Fmrp in the particular experiment (Ouwenga and Dougherty, 2015).
Overall, correcting for gene body length, transcript length, and brain expression level is challenging. For example, simply down-weighting GWAS results for those genes that are tagged by more SNPs, under the assumption that every gene in the genome is equally likely to contribute to disease, would be too conservative: long genes could legitimately be more vulnerable to mutation/polymorphism because of their length. An evolutionary argument could also be made that genes requiring more careful regulation have evolved to be longer, permitting the presence of more potential regulatory sites (e.g., enhancers in the genome, or protein binding motifs in the RNA) to finely tune final protein levels. Indeed, genes that do not appear to tolerate heterozygous mutations in humans tend to be longer than the average gene. Thus, there is a risk that fully removing the influence of gene or transcript length in some analyses might be too conservative. Nonetheless, these are issues that should be explicitly addressed in analyses, and chosen parameters should either be well-justified or systematically varied to demonstrate robustness.

FIGURE 1 | Random GWAS results show enrichment in a network of brain-related GO terms: using a uniform distribution between 0 and 1, random p-values were assigned to SNPs in the genome, and SNPs were mapped to genes using ANNOVAR. The SNP with the lowest p-value in a gene was used to determine the 500 most significant genes, which were then used for a GO analysis and displayed as a network using BINGO. Dozens of categories related to CNS function were significant (examples shown in table at bottom).

FIGURE 2 | Power to detect differential expression in RNAseq is influenced by total count number for a particular gene, and thus longer or more highly expressed transcripts in a tissue are more likely to be found as significantly different (Bullard et al., 2010; Young et al., 2010). To determine if this could create spurious GO results from a brain transcriptomic experiment, we randomly sampled genes from the 10% most robustly detected genes in a brain RNAseq experiment (Ouwenga and Dougherty, 2015) to mimic the results of 100 hypothetical differential expression experiments. If it was assumed these genes were randomly drawn without bias from the whole genome (n = 46030 genes), this consistently resulted in a statistically significant, but scientifically meaningless, enrichment in the GO category 0007268 (Synaptic Transmission). More conservative estimates which only include genes at least lowly expressed in the brain (CPM > 0.3 or CPM > 1) still frequently yield spurious overlap (blue, purple). Because of this, using an effective genome or 'background gene set' based on the transcripts which are well-powered for differential expression is recommended.
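The length bias in the random-GWAS simulation described for Figure 1 can be reproduced in a toy setting. The following is a minimal sketch, not the analysis used for the figure: the function name `top_genes_by_min_p`, the toy genome, and the SNP counts per gene are all hypothetical, and no real annotation (ANNOVAR, GO) is involved. It only illustrates that keeping the lowest of many random p-values favors genes tagged by more SNPs.

```python
import random


def top_genes_by_min_p(snps_per_gene, n_top, seed=0):
    """Assign a uniform random p-value to every SNP, keep each gene's lowest
    p-value, and return the n_top genes with the smallest values."""
    rng = random.Random(seed)
    best_p = {
        gene: min(rng.random() for _ in range(n_snps))
        for gene, n_snps in snps_per_gene.items()
        if n_snps > 0
    }
    return sorted(best_p, key=best_p.get)[:n_top]


# Hypothetical toy genome: 500 short genes (5 SNPs each) and
# 500 long genes (50 SNPs each). No gene carries any true signal.
toy_genome = {f"short_{i}": 5 for i in range(500)}
toy_genome.update({f"long_{i}": 50 for i in range(500)})

top = top_genes_by_min_p(toy_genome, n_top=100)
long_fraction = sum(gene.startswith("long_") for gene in top) / len(top)
print(f"fraction of long genes among the top 100: {long_fraction:.2f}")
```

Although long genes make up only half of this toy genome, they should dominate the "significant" list, so any functional category that is rich in long genes would appear enriched by chance alone.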
Therefore, an appropriate null for discovery-driven analyses of ASD-related genes should take these primary sequence features into account. A common approach is to conduct comparison analyses using sampled control sets of genes that share length, connectivity, or mutability with the ASD-related genes (Willsey et al., 2013; Krumm et al., 2014; Chang et al., 2015). An additional, commonly used control is the set of genes actually detected as mutated (SNVs or CNVs) in control populations such as unaffected siblings, population databases of variation, or an unrelated disease (Pinto et al., 2010, 2014; Gilman et al., 2011; Chahrour et al., 2012; Parikshak et al., 2013; Krumm et al., 2014; Samocha et al., 2014; Chang et al., 2015; Hormozdiari et al., 2015). This controls for both the known biases highlighted above and any currently unrecognized biases in the ASD gene discovery methods.
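One way to build such sampled control sets is sketched below, matching on gene length only. This is an illustrative assumption rather than the procedure of any of the cited studies: the function name `sample_length_matched`, the fractional `tolerance` window, and the nearest-neighbor fallback are hypothetical choices, and matching on connectivity or mutability would need additional inputs.

```python
import random
from bisect import bisect_left, bisect_right


def sample_length_matched(case_genes, gene_lengths, n_sets=100,
                          tolerance=0.1, seed=0):
    """For each case gene, draw one control gene whose length lies within
    +/- tolerance (as a fraction) of the case gene's length.
    Returns n_sets control lists, each aligned with case_genes."""
    rng = random.Random(seed)
    case = set(case_genes)
    # Candidate controls sorted by length; case genes are excluded.
    pool = sorted((length, gene) for gene, length in gene_lengths.items()
                  if gene not in case)
    lengths = [length for length, _ in pool]

    control_sets = []
    for _ in range(n_sets):
        chosen = []
        for gene in case_genes:
            target = gene_lengths[gene]
            lo = bisect_left(lengths, target * (1 - tolerance))
            hi = bisect_right(lengths, target * (1 + tolerance))
            candidates = pool[lo:hi]
            if not candidates:
                # Assumed fallback: use the nearest-length gene in the pool.
                neighbours = pool[max(lo - 1, 0):lo + 1]
                candidates = [min(neighbours, key=lambda p: abs(p[0] - target))]
            # The same control may be drawn for two case genes in one set.
            chosen.append(rng.choice(candidates)[1])
        control_sets.append(chosen)
    return control_sets
```

Any statistic computed on the ASD-related set (for example, overlap with a GO category) can then be recomputed on each control set to obtain an empirical null distribution that already carries the length bias.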

Challenge 2: Independent Replication of a Network Analysis Result
One of the tenets of the scientific method is reproducibility. Experiments should be able to be reproduced by other labs and result in substantially identical findings. Furthermore, following the deductive tradition, tests of the same hypothesis using different methods should produce convergent results if the model is correct and the methods are robust. While discovery-driven analyses are typically insight- or hypothesis-generating rather than hypothesis-testing endeavors, reproducibility and replication are criteria that are still applicable.

Recommendation 2: parameter choices and code sharing
Just as in biological studies, where minor changes in the composition of a buffer can sometimes substantially alter biochemical findings, minor alterations in parameters in bioinformatic analyses can result in substantial differences in the results. For example, the choice of the effective genome size can dramatically influence p-values in analyses overlapping two gene sets with a Fisher's Exact Test (Figure 2), and parameter choices in aligners have at times created misleading results such as an overestimation of RNA editing (Schrider et al., 2011). Frequently there may not be a strong a priori reason for choosing a particular parameter setting, which might lead to a 'parameter placebo': an accidental or subconscious tuning of the parameter to produce the most striking results. To avoid this, key parameters can be varied systematically, with the results presented in a way that allows readers to judge the robustness for themselves. For example, we were interested in overlapping sets of genes with 'enriched' expression in a particular cell type with ASD-related gene sets. We could rank genes from most enriched to least, but justifying a precise threshold was challenging. Should the set include only genes uniquely expressed in these cells, or also genes expressed in them and in a few other populations as well (i.e., moderately enriched)? As there was no clear answer, we designed the analysis to systematically vary the parameter and present the results at multiple thresholds, with, intuitively, the most confidence given to overlaps that occurred significantly across some or all thresholds (Xu et al., 2014).
Thus the researcher's choice for how parameters are set (or the range of values tested) needs to be well-justified. Code for conducting the entire analysis should be made available on request, or perhaps even hosted in its entirety on a public forum. However, to enable this, the discovery of a bug in code that has been made available should be treated as an opportunity to raise the quality of the scientific analyses as a whole, rather than as an opportunity to cast stones at a competing lab.
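The sensitivity of an overlap p-value to the effective genome size can be made concrete with a small calculation. The sketch below uses the one-sided hypergeometric tail, which is equivalent to a one-sided Fisher's Exact Test for the overlap of two gene sets. The set sizes, the overlap, and the two smaller background sizes are hypothetical; only the whole-genome count of 46030 is taken from the Figure 2 caption.

```python
from math import comb


def overlap_pvalue(n_background, n_a, n_b, n_overlap):
    """One-sided hypergeometric tail P(X >= n_overlap): the probability of
    at least n_overlap shared genes when sets of size n_a and n_b are drawn
    from a background of n_background genes."""
    total = comb(n_background, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_background - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / total


# Same two gene sets and same overlap; only the assumed background changes.
# 15000 and 5000 are hypothetical "expressed in tissue" backgrounds.
for n_background in (46030, 15000, 5000):
    p = overlap_pvalue(n_background, n_a=500, n_b=400, n_overlap=20)
    print(f"background = {n_background:>6}: p = {p:.3g}")
```

Reporting the result across a range of defensible background sizes, rather than at a single value, is one way to present the parameter sweep recommended here.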

Recommendation 3: replication, replication, replication...
It is a common experience at the bench: the first replicate of an experimental series matches predictions perfectly or hints at exciting new biology. Then the second replicate does not match the first, nor do the third, fourth, and fifth, until it becomes apparent that the first experiment was the outlier, whether due to some technical mishap or simple winner's curse. At the bench one has the (dubious) luxury of being able to repeat an experiment as many times as cost and time constraints allow to convince oneself of the reproducibility of an outcome. However, in systems biology, there is often only one starting candidate disease gene set with which to seed the network, and largely only one GO or Brainspan resource to compare it with. One can rerun the analysis to make sure the same result occurs (analogous to a 'technical replicate' at the bench), but this is not as reassuring as a true independent biological replicate would be. In general, replications of transcriptome analyses in independent samples have been difficult historically, and these discrepancies have been attributed to variations in study design, processing of samples, and/or computational methods (Gudjonsson et al., 2010; Suárez-Fariñas et al., 2010). Thus, assessing the reproducibility of a bioinformatic analysis is inherently challenging.
A fundamental question that must be faced is whether the inability to reproduce is due to systemic variations, such as those previously suggested, or due to failure to capture true biological signatures. A major obstacle for these studies is the difficulty in amassing large sample sizes and, unfortunately, this issue is seldom addressed. Network construction is typically achieved by conducting some type of similarity or correlation test across pairs of genes/proteins. Inadequate sample size can produce seemingly promising networks with strong community structure due to the clustering of false-positive correlations. Unfortunately, significant correlation thresholds are not one-size-fits-all and vary between datasets due to sample size, heterogeneity of samples, and other factors. For this reason, we strongly recommend the use of a rigorous method for determining an appropriate threshold for edge placement during the construction of networks. For example, permutation trials provide a simple and robust method for determining appropriate correlation thresholds. For each trial, the data values for each gene are permuted across individuals, thereby retaining all of the properties of each gene except for inherent correlations with other genes. After running an adequate number of trials, e.g., 1000 trials, the highest correlation values computed across the uncorrelated permuted data can be used to determine a threshold with a desired p-value.
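The permutation procedure described above can be sketched as follows. This is one possible reading, stated as an assumption: each trial records the single highest absolute Pearson correlation among all gene pairs in the permuted data, and the threshold is the (1 - alpha) quantile of those maxima. The function names, the toy data, and the use of Pearson correlation are illustrative choices, not a prescribed implementation.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def permutation_threshold(expression, n_trials=1000, alpha=0.05, seed=0):
    """expression maps gene -> list of values, one per individual.
    Each trial shuffles every gene independently across individuals, which
    keeps each gene's distribution but removes gene-gene correlation, and
    records the largest |r| over all gene pairs. The returned threshold is
    the (1 - alpha) quantile of those per-trial maxima."""
    rng = random.Random(seed)
    genes = list(expression)
    null_max = []
    for _ in range(n_trials):
        shuffled = {g: rng.sample(expression[g], len(expression[g]))
                    for g in genes}
        null_max.append(max(abs(pearson(shuffled[a], shuffled[b]))
                            for a, b in combinations(genes, 2)))
    null_max.sort()
    index = min(int((1 - alpha) * n_trials), n_trials - 1)
    return null_max[index]


# Hypothetical toy data: 8 genes measured in 12 individuals.
toy_rng = random.Random(1)
toy = {f"gene_{i}": [toy_rng.gauss(0, 1) for _ in range(12)] for i in range(8)}
print(f"edge threshold on |r|: {permutation_threshold(toy, n_trials=200):.2f}")
```

Because the threshold is derived from the dataset itself, it rises automatically when the number of individuals is small, which is the situation in which false-positive edges are most likely.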
Assuming bona fide network construction, there are at least three, albeit imperfect, options for replication. First, borrowing from machine learning or human genetics studies, the starting ASD-related gene sets could be broken into artificial 'discovery' and 'replication' subsets, or even K subsets, so some form of K-fold cross-validation of the results could be conducted (assuming adequate sample size). Then at least the robustness of the results with regard to sample selection could be assessed (Refaeilzadeh et al., 2009). Second, comparisons of results with those identified by independent groups using distinct analytical approaches may yield strong evidence of biological validity. To an extent, it is very reassuring that multiple groups have drawn fairly similar conclusions when applying these approaches to ASD (Tables 1 and 2), though of course these are not true replications because they are not independent: they draw on similar comparison data resources (i.e., regardless of whether GO is accessed through DAVID, Panther, BinGO, GOrilla, or another portal, the gene sets are largely identical). Further, the Brainspan, bacTRAP, and Cahoy datasets have seen similar widespread use (Doyle et al., 2008; Kang et al., 2011). Thus, it will be even more reassuring if similar results about these ASD-related gene sets hold when additional comparison data resources come online, such as single cell expression studies (e.g., Zeisel et al., 2015). The third option is replication through the increasing size of the ASD-related gene sets through time. In this regard it is reassuring that many of the patterns seen in the early Capstone analyses (e.g., neurite morphogenesis/cytoskeletal elements) have been reproduced in later discovered gene sets (Table 1).
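The first option, splitting the starting gene set into K subsets, could be organized along the following lines. This is a minimal sketch under stated assumptions: `k_fold_gene_subsets`, `cross_validate`, and the `statistic` callback are hypothetical names, and the example statistic (fraction of genes carrying a made-up annotation) merely stands in for whatever network or enrichment measure is being evaluated.

```python
import random


def k_fold_gene_subsets(genes, k, seed=0):
    """Shuffle the gene list and deal it into k near-equal, disjoint folds."""
    rng = random.Random(seed)
    shuffled = list(genes)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cross_validate(genes, statistic, k=5, seed=0):
    """For each fold, treat it as the 'replication' subset and the remaining
    genes as the 'discovery' subset; return (discovery, replication) values
    of the supplied statistic for every fold."""
    folds = k_fold_gene_subsets(genes, k, seed)
    results = []
    for i, replication in enumerate(folds):
        discovery = [g for j, fold in enumerate(folds) if j != i for g in fold]
        results.append((statistic(discovery), statistic(replication)))
    return results


# Hypothetical example: 50 placeholder genes, half of which carry a
# made-up 'synaptic' annotation.
asd_genes = [f"gene_{i}" for i in range(50)]
synaptic = {f"gene_{i}" for i in range(0, 50, 2)}


def fraction_annotated(subset):
    return sum(g in synaptic for g in subset) / len(subset)


for discovery_value, replication_value in cross_validate(asd_genes, fraction_annotated):
    print(f"discovery = {discovery_value:.2f}, replication = {replication_value:.2f}")
```

A result that holds in the discovery subsets but varies widely across the replication subsets would suggest that it depends on a few individual genes rather than on the set as a whole.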

Challenge 3: Converting Discovery-Driven Insights into Empirically Testable Hypotheses
How does one test a network result functionally? The recent application of a variety of these inductive discovery-driven approaches to ASD-related genes has highlighted potential molecular gene sets or cell types common to different genetic causes of ASD (Tables 1 and 2). As gene discovery and post hoc analyses are expected to continue apace, a key challenge is the conversion of these insights into clearly stated hypotheses from which we can deductively define a set of empirically testable predictions. This is not a straightforward endeavor, as a key facet of any good hypothesis is that it be falsifiable, and it is not clear that is the case with the insights emerging from capstone analyses. Assuming there are no artifacts in the analysis, how does one falsify the hypothesis that chromatin modifiers are important in ASD? If, following the discovery of many more ASD risk genes in the next rounds of sequencing, there is no longer a significant enrichment of this class of genes, does that mean the chromatin modifiers are now unimportant? One could argue no, because for that small subset of ASD cases who carry a mutation in a chromatin modifier like Chd8, chromatin modifiers still play an important role. Rather, it would argue that chromatin modifying genes play a role, but only in rare cases. Thus, the implications of the insights garnered from a properly conducted discovery-driven analysis can change in scale, but never really go away. Only if the assumptions of the analysis itself change (e.g., Chd8 turns out not to be a chromatin modifier) can the insight be falsified.
Yet, sometimes discovery-driven analyses can lead to the generation of falsifiable hypotheses. In a simple example, one could mutate the chromatin modifying function of Chd8 or other candidates and measure whether that phenocopies complete loss of function in a model system. Other predictions can also be made about increased ASD risk or shared biological functions for genes that share many edges in a network. Below we highlight some experiments and suggest others that might meet this challenge.

Recommendation 4: testing network predictions with human genetics
The networks described in several of the recent analyses cited above by design included both genes confidently associated with ASD and genes that were either less confidently associated from human genetics or were implicated by 'guilt-by-association' (Quackenbush, 2003) in post hoc analyses: e.g., genes that were perhaps co-expressed, co-annotated (GO), co-published (text mining), or co-immunoprecipitated (PPI) with the more confidently associated genes. One prediction of these networks might be that mutations or polymorphisms in these guilt-by-association genes will also cause or contribute to ASD risk. This concept can be tested informatically (e.g., looking for an increased common variant risk near such genes, though controlling appropriately for gene length, etc.; Ben-David and Shifman, 2012; Parikshak et al., 2013). More directly, O'Roak et al. (2012a) tested their network's prediction by targeted resequencing of such genes in a large cohort of ASD patients, demonstrating novel statistical association for several of them and showing the utility of the initial network analyses. Thus, these discovery-driven analyses can successfully serve to direct new studies in human genetics, and additional studies may assist in more rapid identification of additional causative genes. This approach will continue to be useful for a few years, though eventually it is likely to be supplanted as sequencing costs decrease and targeted analyses are replaced by routine exome or whole genome sequencing. Likewise, inclusion in a particular gene set has been used to reweight the probability that a variant of unknown significance should be considered pathogenic (e.g., with the TADA algorithm; De Rubeis et al., 2014), though the most compelling evidence continues to be the presence of recurrent mutations in cases of ASD.

Recommendation 5: phenotypic clustering in man and models
Another apparent feature of these analyses is that genes in the networks are somewhat clustered by function. Thus, these networks may be making testable predictions regarding the shared function of closely connected genes. If the genes do indeed share some function at the molecular or cellular level, then the prediction is that genes that are closer in the network will be closer in their consequences in cell models, animal models, or potentially even patient symptoms.
A point of caution is that failing to identify a significantly similar phenotype, or any phenotype at all, when investigating a set of genes highlighted by a network analysis does not necessarily reject the hypothesis that the genes share some function. There are myriad possible phenotypes to evaluate, and they cannot all be exhaustively tested. However, considering the information used to create the network and the resulting gene set can limit the scope of phenotypes to test and help prioritize primary outcomes of interest. And, at least the hypothesis that these genes have a shared impact on those particular phenotypes does become testable and falsifiable.
A second point of caution is that the novelty of these predictions depends on functional data (e.g., PPI) or functional annotations (GO) not having been used to build the networks in the first place, a caveat that cannot be taken lightly. Otherwise the network is not really making new predictions, just redisplaying known relationships in a different form. Likewise, there are no generally enforced standards for how these networks are displayed, and in some cases the authors may have selected the presentation of the nodes that maximally illustrates the functional clustering they would like to discuss. Overall, care must be taken to assure that circular logic does not creep into the conclusions drawn from these analyses.
Nonetheless, a variety of methods exist which could test any novel predictions regarding shared impact on phenotype. Deeply phenotyped sets of patients (such as the Simons collection; Fischbach and Lord, 2010) could be studied to determine whether individuals with mutations in genes that are closely spaced in the network share more clinical features. Specific mutations could be isolated or introduced in iPSC-derived neural cells and their consequences studied with data-rich methods such as high-content imaging or RNAseq, with the prediction that there will be more similar phenotypic consequences for genes that are closer in the network (Figure 3). However, cultured cells have limitations in terms of the cell types that can be generated. They also have a very limited behavioral repertoire. Thus, it is our opinion that there is also a strong need to study the commonalities in behavioral disruptions across a variety of mice modeling these mutations. Though mouse behaviors are not meant to be perfect proxies for human symptoms, behaviors are highly sensitive readouts of the functions of particular CNS circuits in an intact organism. Shared behavioral disruptions across these models can indicate shared circuit-level disruptions, and cell-type predictions in particular (Parikshak et al., 2013; Willsey et al., 2013; Xu et al., 2014; Chang et al., 2015) might be best tested in the context of a complex nervous system. Cre-Lox and optogenetic technologies in particular provide the opportunity to explicitly test shared contributions of particular circuits downstream of a genetic lesion. There is also a clear need for a more systematic approach to behavioral phenotyping, as the current one-lab-evaluates-one-model approach, often in different genetic backgrounds, makes careful and systematic post hoc comparisons across models nearly impossible.

FIGURE 3 | Hypothetical example of phenotypical clustering and epistasis analysis in a culture model. (A) A hypothetical network constructed with five ASD genes (black) has resulted in two modules of 5-6 genes that are connected by expression and PPI data that include both ASD genes and tightly connected genes (white) not yet implicated in ASD. (B) The network result leads to a hypothesis that genes that are in the same module regulate the same phenotype. This is tested using single gene knockdown in iPSC-derived neurons followed by high-content imaging of neuronal morphology and behavior. Knockdown of the members of the two modules results in distinct cellular phenotypes, consistent with them potentially representing two distinct mechanisms of developing ASD (potentially two subtypes requiring different treatments). In this hypothetical example the tightly connected genes show the same phenotype. (C) Epistasis analysis for neurite length is used to test the hypothesis that all genes in module 1 are in the same pathway regulating neurite growth and are distinct from genes in module 2. Single gene knockdowns of all genes are transiently transfected with constructs expressing each individual gene. Green squares are normal neurite length, red squares are shortened neurites. These hypothetical results suggest several conclusions: (1) negative control genes from module 2 cannot rescue short neurites, again indicating they are not in this genetic pathway. (2) As a control, expression of each gene in module 1 can rescue (complement) its own phenotype. (3) The pattern of complementation can be used to infer the functional relationship between the genes (D). For example, gene 4 can be rescued by any other gene in the module, suggesting it must be before the others in the pathway, while gene 6 can rescue all others, but cannot be rescued by any of them, indicating it must be last. (D) The resulting pathway from the genetic analysis of neurite length. This result indicates that the original module did indeed represent a set of genes that regulate the same neurobiological phenomena. If indeed the shortened neurites lead to ASD, this also suggests that treatments targeting gene 6 (even though it was not itself an ASD gene) may be effective at treating individuals with ASD who have mutations in genes 3, 4, or 5.

Recommendation 6: epistasis analysis in models and mice
Genetic formalism has a lot to offer in the context of testing these networks. Genes that are closer in the networks (or perhaps coexpressed in the same cell type) may be in the same functional pathway. Studying compound mutations in humans might be informative (Pinto et al., 2014; Krumm et al., 2015). Indeed, examination of >100 million medical records has been used to test for epistatic effects of combinations of rare Mendelian diagnoses on risk of developing comorbid complex disease traits (Blair et al., 2013). But with currently available sample sizes for exome-sequenced cases, multiple ASD-related rare variants are unlikely to occur in the same individual frequently enough for formal testing of specific pairs of ASD-related genes. However, both in animals and in cell lines it is straightforward to make compound mutations and introduce rescue constructs with modern genome editing technologies. Thus, not only can we test whether network-associated genes have similar phenotypes (Figures 3A,B), we can also test whether any gene is in the same pathway as others and whether it is dominant to, complementary to, or suppressive of them (Figures 3C,D). Again, this could leverage both cell lines and mouse models for their relative strengths of either throughput or complexity.
Finally, Recommendations 4-6 might all also provide the opportunity to subtype ASD cases into functionally distinct categories based on their molecular causes or cellular consequences: separate categories which may indeed be most amenable to different treatment strategies or that warrant stratification during clinical trials. Already, results from exome studies are being used to define new subtypes of ASD starting from knowledge of the implicated gene (Bernier et al., 2014). Understanding commonalities in different subtypes of patients might be key to identifying routes to treatments for each.

Conclusion: Building New Cornerstones from Old Capstones
Over the last few years, discovery-driven bioinformatics analyses of ASD-related genes have moved from final-figure capstone analyses to stand-alone manuscripts. In architecture, the capstone is the coping, the final layer of finer, flat stone on the top of a wall that is somewhat functional (e.g., to end the structure, to protect from weather) but also somewhat decorative. Meanwhile, the cornerstones, classically, are the first stones placed in a new building: the seeds from which new buildings arise. Thus, the time has come to push these systems biology analyses away from capstones and toward cornerstones: studies from which we can derive empirically testable theories regarding commonalities of mechanism(s) for the diverse genetic risk factors contributing to ASD. The overall challenge now is to define criteria with which to systematically evaluate these discovery-driven insights, and to generate falsifiable hypotheses from these ideas. The hypotheses that survive rigorous empirical testing have the potential to become the foundations of new edifices rising toward ASD treatments.