Target Genes of Autism Risk Loci in Brain Frontal Cortex

Autism spectrum disorder (ASD) is a complex neuropsychiatric disorder. A number of genetic risk loci have been identified for ASD from genome-wide association studies (GWAS); however, their target genes in relevant tissues and cell types remain to be investigated. The frontal cortex is a key region in the human brain for communication and cognitive function. To identify risk genes contributing to potential dysfunction in the frontal cortex of ASD patients, we took an in silico approach integrating multi-omics data. We first found genes with expression in frontal cortex tissue that correlates with ASD risk loci by leveraging expression quantitative trait loci (eQTLs) information. Among these genes, we then identified 76 genes showing significant differential expression in the frontal cortex between ASD cases and controls in microarray datasets and further replicated four genes with RNA-seq data. Among the ASD GWAS single nucleotide polymorphisms (SNPs) correlating with the 76 genes, 20 overlap with histone marks and 40 are associated with gene methylation level. Thus, through multi-omics data analyses, we identified genes that may work as target genes of ASD risk loci in the brain frontal cortex.


INTRODUCTION
Autism spectrum disorder (ASD) is a type of complex neurodevelopmental disorder mainly characterized by stereotyped behavior and deficiency in social communication ability. Among children, the prevalence rate of ASD has been estimated to be 1 in 68 in the USA and 1 in 100 worldwide, and there is four times higher prevalence among boys than girls (Ginsberg et al., 2012;Developmental Disabilities Monitoring Network Surveillance Year 2010 Principal Investigators;Centers for Disease Control and Prevention, 2014;Geschwind and State, 2015). ASD severely affects the life quality of patients and their families and increases public health burden (Ginsberg et al., 2013). ASD patients exhibit highly heterogeneous clinical presentations, and ASD patients are mainly treated by rehabilitation intervention with no specific therapeutic drug (Bowers et al., 2015). Therefore, it is necessary to understand the genetic mechanism underlying ASD development in important brain regions.
Genetic studies of ASD have revealed a number of risk loci that may contribute to ASD pathogenesis. It has been shown that single nucleotide polymorphisms (SNPs) located at loci 3p21 and 10q24, as well as in CACNA1C and CACNB2, are significantly associated with multiple psychiatric disorders including ASD (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). Xia et al. discovered TRIM33 and NRAS-CSDE1 as ASD candidate genes by GWAS analysis of Chinese autistic patients and datasets of three European populations (Xia et al., 2014). A recent study by the Autism Spectrum Disorders Working Group of the Psychiatric Genomics Consortium (PGC) identified multiple loci, composed of common variants, associated with ASD and found a significant genetic correlation between ASD and schizophrenia via meta-analysis of more than 16,000 autistic patients (The Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium, 2017). Furthermore, Cantor et al. found that rs289883 located in the intron of gene PHB was associated with the degree of behavioral abnormality in ASD patients (Cantor et al., 2018). Different brain regions control different functions, which may be impaired in ASD patients. The frontal lobe of the brain plays an important role in social, emotional, and cognitive functions and has shown severe dysfunction in ASD patients (Courchesne and Pierce, 2005). The frontal lobes in ASD patients undergo an abnormal overgrowth while other regions are not significantly enlarged (Buxhoeveden et al., 2006). Additionally, a decrease in astrocyte precursor cells and an increase in synaptic connectivity are observed in the frontal cortex of ASD patients (Broek et al., 2014). Previous studies demonstrated pronounced ASD-associated gene expression changes in the cerebral cortex, including attenuated distinction between the frontal and temporal cortices in ASD brains (Voineagu et al., 2011).
Because of the importance of the frontal cortex in normal brains and its dysfunction in ASD brains, we aim to identify targeted genes of ASD risk loci in the frontal cortex. We obtained ASD associated SNPs from the GWAS catalog and found genes with genotype-correlated expression in the frontal cortex tissue from eQTL databases. By analyzing microarray gene expression datasets, we then identified 76 ASD-loci correlated genes showing significant expression difference between ASD brain frontal cortices and controls, and we further replicated four genes in an RNA-seq dataset. Among the ASD GWAS SNPs correlating with the 76 genes, 20 overlap with histone marks and 40 were associated with the gene methylation level, suggesting that they may regulate the transcription of their target genes through epigenetic mechanisms. Our results help to understand how ASD GWAS loci confer disease risk and prioritize genes for further functional validation.

Extraction of ASD GWAS Loci
Significant ASD associations were downloaded from the GWAS catalog (https://www.ebi.ac.uk/gwas/) (Macarthur et al., 2017) using the keyword "autism spectrum disorder, " and SNP information was extracted from downloaded data. We did not apply any significance threshold when extracting ASD SNPs from the GWAS catalog.

Analysis of Microarray Data
Series matrix files of two microarray datasets GSE28475 (Chow et al., 2012) and GSE28521 (Voineagu et al., 2011) that compare transcriptome data in human brain frontal cortex between ASD cases and controls were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) (Barrett et al., 2013). There are 16 ASD cases and 16 controls in dataset GSE28521, and there are 52 cases and 61 controls in dataset GSE28475. Microarray data underwent quality control with log2 transformation and quantile normalization. Differential expression analysis was performed using the linear regression approach (Limma model) and further meta-analyzed using Fisher's method in the NetworkAnalyst portal (http://www.networkanalyst.ca/) (Xia et al., 2015) with adjustment for batch effects.

Analysis of RNA-Seq Data
The SRA file of an RNA-seq dataset (GSE102741) (Wright et al., 2017), which similarly compares transcriptome data in human brain frontal cortex between ASD cases and controls, was downloaded from the GEO database. Dataset GSE102741 contains 13 ASD cases and 39 controls. The quality of RNA-seq raw reads was examined using FastQC (Andrews, 2010), and reads were aligned to the human reference genome (GRCh37) using software HISAT2 (Kim et al., 2015). Then transcripts were assembled and quantified using Stringtie (Pertea et al., 2015) with the reference gene annotation (GRCh37) as a guide. Differential expression analysis between cases and controls was conducted using edgeR (Robinson et al., 2010).

Protein-Protein Interaction Network Analysis
The gene symbols were input into the NetworkAnalyst web portal, which maps each gene to protein-protein interaction (PPI) databases to construct networks. The PPI network was constructed among the 76 genes without further extension, with InnateDB (Breuer et al., 2013) as the source of protein interactions.

Analysis of Methylation Data
The generation of genome-wide methylation profiles of 843 subjects on the Infinium HumanMethylation450 BeadChip by the Center for Applied Genomics, the Children's Hospital of Philadelphia, was reported in a previous publication (Van Ingen et al., 2016). The methylation level of each methylation probe was represented by the M-values (the log2 ratio between the methylated and unmethylated probe intensities). The association of ASD GWAS SNPs with methylation probes in each of the 76 genes was assessed in a linear regression model including gender, age, and 10 genotype-derived principle components as covariates.

RESULTS
As the frontal cortex is involved in important brain functions, which are severely impaired among ASD patients (communication, language, social behavior, and complex cognitive functions), we are interested in identifying target genes of ASD risk loci in the frontal cortex region. To do this, we first extracted all reported ASD associated risk loci from the GWAS catalog. A total of 466 SNPs from 19 studies were extracted, with the highest reported association P-value of 1 × 10 −5 . The majority (97%) of these SNPs were located in non-coding regions. As these non-coding SNPs could function by regulating downstream target gene expression, we examined their potential regulatory effects in two eQTL databases [brain-frontal cortex tissue in the GTEx database (GTEx Consortium, 2013) and frontal cortex in the Braineac database (Ramasamy et al., 2014)].We found 457 genes from GTEx and 1,848 genes from Braineac with mRNA level correlated with the additive genotype of ASD GWAS SNPs (nominal P < 0.05). As GWAS loci and their targeted genes may not exhibit highly significant correlations in eQTL analysis, we took this less stringent threshold and combined the ASD-loci targeted genes from the two datasets, yielding a list of 2,098 genes. The eQTL associations suggest that the expression of these genes may be directly or indirectly influenced by the genotype of the ASD loci.
We hypothesized that the expression of genes functioning in the frontal cortex may be dis-regulated among ASD patients; thus, we conducted gene expression meta-analysis by comparing the mRNA level of ASD cases and that of healthy controls at the genome-wide scale using datasets GSE28475 and GSE28521 from the GEO database. The analysis yielded 893 differentially expressed genes (adjusted P < 0.05). Among the 2,098 genes likely regulated by ASD-loci, 76 displayed significant differential gene expression (Supplementary Table 1), implicating that these genes may be ASD loci-controlled genes in the frontal cortex.
As replication, we looked into ASD RNA-seq dataset GSE102741 in the GEO database. We similarly conducted transcriptome profiling analysis and found that four of the 76 genes identified in the above steps showed significant differences (P < 0.05) in mRNA level between ASD cases and healthy controls: HIST1H1C, HSPA1B, PRPF3, and SERPINA3 (Table 1). Therefore, differential expression of these four genes in the frontal cortex of ASD brains were further validated by RNAseq. Inability to validate the remaining 72 genes could be due to the small sample size of the RNA-seq dataset. We further checked brain Hi-C data and found additional supporting evidence for the plausible chromatin interactions between ASD SNPs and the target genes (Supplementary Figure 1). Certainly, future Hi-C experiments specifically focusing on the frontal cortex regions should be performed to examine these interactions. It has been reported in the human protein atlas database (https:// www.proteinatlas.org/) (Uhlen et al., 2015) that both the mRNA and proteins of ASD target genes HIST1H1C and PRPF3 were detected in human brain cerebral cortex; HIST1H1C protein level is particularly high. The HSPA1B protein and SERPINA3 mRNA were also detected in the cerebral cortex. By searching the Mouse Genome Informatics (MGI) database (http://www. informatics.jax.org) (Law and Shaw, 2018), we also found that mRNA of mouse HIST1H1C and PRPF3 homologues has been detected in mouse brains in previous publications (Mckee et al., 2005;Diez-Roux et al., 2011). Furthermore, abnormal behavior, neurological phenotype, or defects in nervous system have been documented for mouse strains carrying mutant Prpf3 or Hspa1b genes (Law and Shaw, 2018), suggesting the biological relevance of these genes to ASD.
To understand how the 76 genes are involved in ASD pathogenesis, we constructed a protein-protein interaction (PPI) network (Figure 1) using NetworkAnalyst. The largest module consists of 11 of the 76 genes, including three of the RNA-seq validated genes (HIST1H1C, HSPA1B, and SERPINA3). To fully understand the interactions between these genes, we further examined the pathways in which these genes are enriched. We found significantly enriched pathways: "Antigen processing and presentation" and "Noncanonical activation of NOTCH3" (Supplementary Table 2). Both of these pathways are highly relevant to ASD pathogenesis (Needleman and Mcallister, 2012;Hormozdiari et al., 2015;Bennabi et al., 2018;Jones et al., 2018).
To understand how ASD SNPs may regulate the expression of their target genes, we explored the functional annotations of ASD GWAS SNPs corresponding to the 76 target genes in the ENCODE (Encode Project Consortium, 2012) and ROADMAP (Roadmap Epigenomics Consortium et al., 2015) epigenome databases via the HaploReg web portal (Ward and Kellis, 2016). We found 20 ASD SNPs overlap with histone marks in the brain dorsolateral prefrontal cortex ( Table 3). This suggests that these SNPs may affect chromatin activation through histone methylation and acetylation, which in turn affects their target gene expression.
We also looked into whether there is any correlation between the genotypes of ASD SNPs and methylation at or near their target genes. Forty-five of the 76 genes contained probes with methylation level correlated with additive SNP genotype (Table 4) at a nominal significance level, suggesting that the expression level of these genes may be regulated by ASD SNPs through DNA methylation.

DISCUSSION
To find ASD GWAS loci targeted genes, we conducted an analysis integrating eQTL, transcriptome, epigenome, and methylation FIGURE 1 | Eleven out of 76 proteins encoded by the autism spectrum disorder (ASD) loci target genes showed protein-protein interactions (PPIs). The PPI network was constructed using NetworkAnalyst with InnateDB as the source of protein interaction data. The nodes represent proteins, and the size of the nodes reflects the number of interaction partners with it. The edges between the nodes indicate known interactions between the proteins. data. We began by analyzing the correlation between SNP genotype and mRNA level of genes reflected by eQTL data, and we obtained 2,098 target genes that may be regulated by these ASD loci. Then, we analyzed the differentially expressed genes between cases and controls at the mRNA level using array and RNA-seq data. A total of 76 genes with expression correlating with ASD SNP genotype were differentially expressed between ASD cases and controls in the frontal cortex in array data. Four of those genes were further validated by RNA-seq data. Evidence also suggested that the expression level of these genes could be regulated through histone modification or DNA methylation. Therefore, by in silico analysis, we identified candidate genes likely controlled by ASD loci in the frontal cortex, which are worthy of further experimental validation. There are multiple lines of evidence suggesting the involvement of the four candidate ASD genes in disease etiology. HIST1H1C encodes a protein that belongs to the histone cluster 1 H1 family. In an ASD model system based on haploinsufficiency of SHANK3, Darville et al. found that five histone isoforms including HIST1H1C were down-regulated upon lithium and VPA treatment in neurons differentiated from pluripotent stem cells. Lithium and VPA increased levels of SHANK3 mRNA, and the authors speculated that SHANK3 may be regulated through an epigenetic mechanism involving histone modification (Darville et al., 2016). In addition, HIST1H1C may also be involved in other brain disease development. For example, HIST1H1C displayed consistently significant increased mRNA level in the cortex of brains from 7-and 18-month-old mice in an Alzheimer's disease model (Ham et al., 2018). The mRNA level of HIST1H1C is up-regulated in hypoxia and is correlated with worse disease outcome among neuroblastoma patients (Applebaum et al., 2016). Mutation in other members of histone cluster 1 H1 family, such as HIST1H1E, has been detected in ASD patient and is likely to be the underlying causal mutation (Duffney et al., 2018). Systematic review indicated that nearly 20% of ASD candidate genes play a role in epigenetic regulations, especially histone modifications (Duffney et al., 2018). These data support the potential involvement of HIST1H1C in ASD development, likely through epigenetic regulation of neurodevelopmental genes.
HSPA1B encodes a heat shock protein, which works as chaperone for other proteins. In heat shock experiments on induced pluripotent stem cells modeling brain development under maternal fever, HSPA1B is one of the heat shock genes that drastically increased its mRNA level, together with other genes involved in neurogenesis and neuronal function (Lin et al., 2014). Heat shock proteins target mis-folded proteins and  facilitate proper refolding or targeting of damaged proteins for degradation (Lin et al., 2014). Its mRNA level is increased in the frontal cortex of schizophrenia subjects (Arion et al., 2007). It has been shown that HSPA1B also functions in the pathogenesis of other neurological conditions, such as Parkinson disease (Kalia et al., 2010), progressive supranuclear palsy (Hauser et al., 2005), and Alzheimer's disease (Sherman and Goldberg, 2001;Muchowski and Wacker, 2005), presumably by facilitating protein folding and inhibiting apoptosis (Leak, 2014). Genes enriched in multiple signaling pathways, like pathways of "Heterotrimeric G-protein signaling pathway" and "B cell activation," were altered by Hspa1b deficiency in an MPTP-induced mouse model of Parkinson disease (Ban et al., 2012). PRPF3 is one of several proteins interacting with U4 and U6 small nuclear ribonucleoproteins, which are components of spliceosomes. Mutations in MECP2 (methyl-CpG-binding protein 2) cause the neurodevelopmental disorder Rett syndrome. The MECP2 protein directly interacts with PRPF3 (Long et al., 2011), and several Rett-associated mutations in MECP2 affect interaction of MECP2 with PRPF3, implying that neurodevelopmental disorders, in general, could be related to abnormal mRNA splicing.
SERPINA3 belongs to the serine protease inhibitor family. The protein antagonizes the activity of neutrophil cathepsin G and mast cell chymase and has been implicated in neuroinflammation, neurodegeneration (Baker et al., 2007), and other types of brain conditions such as human prion diseases (Vanni et al., 2017). The mRNA level of SERPINA3 is robustly up-regulated in the prefrontal cortex of schizophrenia patients, suggesting its involvement in the pathogenesis of neuropsychiatric disorders (Arion et al., 2007;Saetre et al., 2007;Fillman et al., 2014).
In summary, we identified genes that may function as ASD genetic loci targeted genes in the brain frontal cortex through multi-omics data analyses. These genes are worth being further characterized for their function in ASD development through experimental approaches.

AUTHOR CONTRIBUTIONS
YS was mainly involved in the data analysis, processing, and summarization. XY was involved in part of data analysis and mainly responsible for drafting manuscript. QX and JL were responsible for conception and design of study and revising the manuscript critically for important intellectual content. Other authors have partially participated in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript.

FUNDING
This study was supported by National Natural Science Foundation of China (81771769); Tianjin Natural Science Foundation (18JCYBJC42700); Startup Funding from Tianjin Medical University; and the Thousand Youth Talents Plan of Tianjin.