Genetics of the Hippocampal Transcriptome in Mouse: A Systematic Survey and Online Neurogenomics Resource

Differences in gene expression in the CNS influence behavior and disease susceptibility. To systematically explore the role of normal variation in expression on hippocampal structure and function, we generated an online microarray database for a diverse panel of strains of mice, including most common inbred strains and numerous recombinant inbred lines (www.genenetwork.org). Using this resource, coexpression networks for families of genes can be generated rapidly to test causal models related to function. The data set is optimized for quantitative trait locus (QTL) mapping and was used to identify over 5500 QTLs that modulate mRNA levels. We describe a wide variety of analyses and novel synthetic approaches that take advantage of this resource, and demonstrate how both the data and associated tools can be applied to the study of gene regulation in the hippocampus and relations to structure and function.


INTRODUCTION
Variation in hippocampal structure and function between different mouse strains is enormous (Wimer et al., 1978; van Abeelen and van den Heuvel, 1982; Crusio et al., 1987; Kempermann and Gage, 2002) and a genetic basis for much of this variation is now well established. The contribution of single genes to complex behavioral phenotypes, however, is usually small, meaning such phenotypes need to be described in terms of networks of interacting genes (Flint and Mott, 2008). Insight into the genetic modulation of these networks can be achieved by measuring transcript expression in well characterized isogenic lines of mice that are reared in tightly controlled environments. Correlations between genetic variation and differences in traits such as locomotion, memory, or adult neurogenesis can be quantified and functional differences mapped to genomic loci, known as quantitative trait loci (QTLs). Traditionally, physiological phenotypes have been used for such analyses, although transcript levels can also be used as quantitative traits to identify genetic loci influencing gene [...] of the hippocampal transcriptome is the first to make use of this extended BXD panel. To complement this information, expression data were also obtained from the two BXD F1 hybrids, 15 CXB RI strains and 13 strains of the mouse diversity panel. These additional strains provide genetic diversity invaluable for the fine mapping of QTLs.
The GeneNetwork is a database of genotype and phenotype data for mapping studies as well as online tools for their analysis. At the time of writing, the GeneNetwork holds data from 14 mouse genetic panels including microarray expression data from a range of tissues as well as many diverse physiological phenotypes. The BXD panel is the best represented of these with 33 studies covering 19 anatomical regions, over half of which are neural tissue. The data presented here have been deposited online as part of this resource.
The present study applied expression genetics to the adult murine hippocampus and at the same time increased the resolution of the database to a level at which first conclusions about the architecture of hippocampal genetic networks become possible. To achieve this end, an international consortium joined efforts to generate hippocampal transcriptome data and has made this information available online. This new database is described here together with a range of analyses that demonstrate how transcriptome data can be used to uncover the coordinated genetic regulation underlying the biology of the hippocampus.

ANIMALS, TISSUE PROCESSING AND ARRAY HYBRIDIZATION
This study used a total of 604 animals from 71 BXD strains (67 BXD lines plus the two parentals, C57BL/6J and DBA/2J, and the two F1 hybrids), 15 CXB strains (13 RI lines plus the two parentals, BALB/cByJ and C57BL/6ByJ), and a selection of 13 strains from the mouse diversity panel.
The hippocampal formation, excluding most of the subiculum, from two to three animals was dissected and pooled for hybridization to a single Affymetrix M430 2.0 array. A total of 201 arrays were used, and were processed at the W. Harry Feinstone Center for Genomic Research. Detailed information about the animals used in this study, including strain expression values, gender and processing information as well as detailed tables of individual array-level information can be found online at the GeneNetwork 1 .
All procedures involving mouse tissue were approved by the Institutional Animal Care and Use Committee at the University of Tennessee Health Science Center.
Raw microarray data were transformed using the PDNN, MAS5 and RMA methods. Transformed values were standardized to 2z + 8, thus yielding a data set with a standard deviation of 2 and an overall mean of 8. This ensures there are no negative values for further processing, and means that a one point difference is approximately equal to a twofold change in RNA levels. The PDNN transform consistently yielded the best results (as discussed in Results) and thus, unless otherwise specified, all analyses presented use the PDNN-transformed data.
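The 2z + 8 rescaling can be sketched in a few lines of Python. This is a minimal illustration, not the GeneNetwork implementation: the function names are hypothetical, the input is assumed to be a single list of log2-scale values produced by a transform such as PDNN, and the population standard deviation is assumed because the text does not state which estimator was used.

```python
import statistics

def standardize_2z_plus_8(values):
    """Rescale values to 2z + 8, giving a mean of 8 and a standard deviation of 2."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    return [2.0 * (v - mean) / sd + 8.0 for v in values]

def approx_fold_change(a, b):
    """Approximate fold difference between two rescaled values,
    treating one unit as a doubling, as stated in the text."""
    return 2.0 ** abs(a - b)

raw = [5.1, 6.4, 7.0, 8.3, 9.9]  # made-up log2-scale intensities
scaled = standardize_2z_plus_8(raw)
```

Under these assumptions, two strains whose rescaled values differ by one unit would be read as differing roughly twofold in transcript level.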

PROBE SET QUALITY CONTROL
The M430 2.0 probe set annotations in GeneNetwork have been manually and automatically curated for 4 years with a special focus on transcripts with high expression in the CNS. All probe sets mentioned in this manuscript were manually checked against the latest mouse genome assembly (mm9) using the UCSC Mouse Genome Browser 2 and the Entrez Gene repository 3 and verified to have unique targets consistent with the currently available genomic data.

QTL MAPPING
For the analyses presented here, whole genome association mapping was carried out using the 69 BXD strains (excluding parentals and F1 hybrids). QTL mapping was done in GeneNetwork and has been described previously.

TRANSCRIPT CORRELATION AND NETWORK GRAPHS
All correlations presented are Pearson's product-moment. When considering threshold values for networks, absolute value correlations have been used; thus a strong correlation may indicate either a positive or a negative relationship between probe sets. The network graph (Figure 1) was generated using an implementation of the Kamada-Kawai layout algorithm (Kamada and Kawai, 1989) provided by the Java package 'KKLayout' from the Java Universal Network/Graph Framework 4 . Gene Ontology analysis was done using the web-based tool 'WebGestalt' (Zhang et al., 2005) 5 .

ONLINE DATA ACCESS AT THE GENE NETWORK
The gene expression data generated in this study, information on sample preparation and detailed methodology, the Published Phenotypes database, and a collection of online tools for data analysis are all publicly available at GeneNetwork 6 , an open, freely accessible web site that combines genetic and phenotypic databases with online tools to analyze the available data.

VARIATION IN HIPPOCAMPAL TRAITS IN THE BXD PANEL
The BXD panel contains a large number of polymorphisms accompanied by up to 94-fold differences in hippocampal transcript levels, with over 700 probe sets exhibiting a 10-fold or greater range of expression and over 4000 with greater than fourfold. This makes the BXD panel an attractive platform for investigating the phenotypic manifestations of these genes without the issues involved in the generation and analysis of knockout animals. While some highly polymorphic genes are inherited in a Mendelian manner, many are true polygenic complex traits with sufficient variability to allow further analysis. Some examples of such transcripts with known relevance to hippocampal function are Marcks (Hussain et al., 2006; 1415972_at; fourfold range in expression across the BXD panel), Dcx (Corbo et al., 2002; 1418141_at; 12-fold), Ncam1 (Cremer et al., 1997; 1426865_a_at; fivefold), Nos1 (Kirchner et al., 2004; 1422949_at; fivefold), Grin1 (Niewoehner et al., 2007; 1450202_at; eightfold) and Grin2b (von Engelhardt et al., 2008; 1422223_at; fivefold).

COVARIATION BETWEEN HIPPOCAMPAL TRAITS IN THE BXD PANEL
Transcripts with similar expression patterns are likely involved in common processes, and such related genes can be easily retrieved from the database to aid in the functional annotation of genes of interest. By computing the Pearson's product-moment correlation for any probe set against every other probe set in the database, a list of the most strongly correlated genes can be obtained. As an example, the probe set 1432108_at (Pcgf6) was investigated. This transcript has been identified as a dentate gyrus marker by in situ hybridisation (Lein et al., 2004; Allen Brain Atlas, Image Series ID: 638729) and is present in our data set with a mean expression of 8.96 and a 6.27-fold range across the BXD panel. Expression of the probe set 1432108_at correlates at |r| > 0.75 with 110 other probe sets, and the functional annotation of these genes using the DAVID tool 7 revealed an enrichment in zinc-finger and RING proteins. Such a correlation analysis is not limited to traits of the same type, so that repeating the above search against the phenotypes database identifies a number of well-correlating entries, including Trait IDs 10378 (hippocampus granule cell number, r = −0.67), 10345 (probe trial water maze time spent in swim path, r = −0.78), 10456 (total hippocampus volume, r = −0.65), 10459 (granule cell layer volume, r = −0.587), 10338 (proliferation of BrdU-labeled cells in subgranular zone, r = −0.65) and 10604 (mean seizure severity, r = 0.59). Interestingly, Pcgf6 is negatively correlated with granule cell number and dentate gyrus volume. Pcgf6, a member of the polycomb family of RING zinc finger proteins, has been identified as a transcriptional repressor (Akasaka et al., 2002), which suggests a role as a negative controller of dentate gyrus granule cell number.
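The correlation search described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeneNetwork code: the function names, the probe identifiers and the expression values are all hypothetical, and each probe set is assumed to be stored as a list of strain means in a common strain order.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def top_correlates(query_id, expression, threshold=0.75):
    """Return (probe_id, r) pairs with |r| above the threshold, strongest first."""
    query = expression[query_id]
    hits = []
    for probe_id, values in expression.items():
        if probe_id == query_id:
            continue
        r = pearson(query, values)
        if abs(r) > threshold:  # sign is ignored, so negative correlates are kept
            hits.append((probe_id, r))
    return sorted(hits, key=lambda h: abs(h[1]), reverse=True)

# Made-up expression values across five strains
expression = {
    "query": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pos":   [2.0, 4.0, 6.0, 8.0, 10.5],
    "neg":   [5.0, 4.0, 3.0, 2.0, 1.0],
    "weak":  [1.0, 3.0, 2.0, 3.0, 1.0],
}
hits = top_correlates("query", expression)
```

Because the threshold is applied to the absolute value, the same call retrieves both positively and negatively correlated probe sets; the same function could be pointed at phenotype values instead of expression values.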

NETWORKS OF CORRELATED TRAITS
Expression correlation can be used to conceptually link genes into networks visualizing parts of the transcriptional interactome. The distance between transcripts is governed by the correlation, with higher correlations drawn closer together, so that groups of similarly expressed genes form visible clusters.

7 www.david.abcc.ncifcrf.gov

FIGURE 1 | A network graph based on transcripts correlating with the hippocampal pyramidal and granule cell layer volume traits (green nodes). Interactions have been filtered for an absolute correlation above 0.57 and are colored following the key shown. The graphing algorithm has attempted to draw stronger interactions as shorter lines, with the result that transcripts with similar expression patterns tend to cluster together. Two main clusters can be discerned; one, very dense, at the bottom of the image, is clearly associated with the granule cell layer volume trait and, to a lesser extent, the pyramidal cell layer volume trait. A second cluster, to the upper left, is hardly connected to pyramidal cell layer volume at all. Such a representation reveals the relationship between clusters of transcripts linked to the different phenotypes, thus potentially uncovering cell type-specific genetic pathways. Representation in tables alone would obscure such relationships.

www.frontiersin.org
Networks are not limited to expression data; any trait may be used, as may mixtures of different trait types. Particularly interesting is the use of phenotypic and gene expression traits in the same analysis. To illustrate this, the published phenotypes 'pyramidal cell layer volume' (HippPyrVol; BXD Published Phenotypes 10458) and 'granule cell layer volume' (HippGCVol; BXD Published Phenotypes 10459) (Peirce et al., 2003) were each correlated to the expression database and transcripts with an absolute correlation of 0.57 or above were used to build a network (Figure 1). It is interesting to note that Pcgf6, introduced above as a dentate gyrus expression marker, is also present in this network, negatively correlated to HippGCVol. Because the phenotypes HippGCVol and HippPyrVol are themselves correlated (r = 0.60), many transcripts are associated with both traits, as evidenced by the larger cluster in the lower part of the graph. A smaller cluster is positively correlated with only HippGCVol and may be involved in granule cell-specific functions, although the genes in this cluster have not yet been studied in this context.
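The construction of a mixed phenotype and transcript network can be sketched as below. This is an illustration only: the trait values and the names `probe_a` and `probe_b` are made up, the function names are hypothetical, and connected components are used as a crude stand-in for the visual clusters that the Kamada-Kawai layout produces in Figure 1.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def build_network(traits, threshold=0.57):
    """Keep an edge for every pair of traits with |r| at or above the threshold."""
    edges = []
    for a, b in combinations(traits, 2):
        r = pearson(traits[a], traits[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges

def connected_groups(traits, edges):
    """Group traits that are linked directly or indirectly by retained edges."""
    neighbours = {name: set() for name in traits}
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in traits:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Made-up values across six strains: two phenotypes and two transcripts
traits = {
    "HippGCVol":  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "HippPyrVol": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    "probe_a":    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "probe_b":    [2.0, 1.0, 1.0, 1.0, 1.0, 2.0],
}
edges = build_network(traits)
groups = connected_groups(traits, edges)
```

Keeping the signed r on each edge, rather than only its absolute value, allows positive and negative relationships to be colored differently when the graph is drawn.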

GENETIC CONTROL OF HIPPOCAMPAL PHENOTYPES
The key advantage of a panel of recombinant inbred strains is that it is a genetically stable resource that can be used by a research community for years. Archived experiments, such as those in GeneNetwork's BXD phenotypes database, can be reanalyzed in the context of new data and the improved genotype maps (Shifman et al., 2006). We used our new expression data to remap the HipV13a QTL on chromosome 13 that controls the volume of the dentate gyrus (Trait ID 10460 in the BXD Published Phenotypes database) (Peirce et al., 2003) and identified a significant QTL [P < 0.05; likelihood ratio statistic (LRS) = 19.3] on chromosome 13 (47-55 Mb) (Figure 2).

Frontiers in Neuroscience | Neurogenomics
SNPs within their coding regions (Ror2: 637, and Sptlc1: 239), Ror2 was only poorly expressed and Sptlc1 showed a rather weak correlation (r = 0.35) with dentate gyrus volume. The Tpmt transcript, in contrast, had a strong QTL (LRS = 33.0) at this same locus.

EXPRESSION GENETICS IN THE HIPPOCAMPUS
Besides its implications for refining mapping intervals in classical QTL studies, expression genetics reveals fundamental insights into the genetic structure of a given tissue. By whole-genome QTL mapping, we have identified a large number of transcripts whose expression is modulated by polymorphisms between the two parental strains.
Cis-regulatory genes or loci are operationally defined as those whose peak association lies within a 10 Mb interval surrounding their own physical location in the genome. Functionally, cis-acting genes are considered to be largely auto-regulatory in that they control their own expression. Transcripts with trans-acting QTLs, in contrast, are controlled by genes at a different physical location. To get an idea of the increase in power obtained from this data set, we counted the number of QTLs from data processed using three alternative normalization methods (PDNN, RMA, and MAS5). The number of strong cis-QTLs in particular is considered a good indicator of the quality and power of the data (Carlborg et al., 2005). The results are summarized in Table 1.
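The operational cis/trans definition can be expressed as a simple rule. The sketch below is illustrative only: the function name and arguments are hypothetical, positions are assumed to be in Mb on the same genome assembly, and the 10 Mb interval is assumed to be centered on the gene (5 Mb on either side), which the text does not state explicitly.

```python
def classify_qtl(gene_chr, gene_mb, peak_chr, peak_mb, window_mb=10.0):
    """Label a transcript's peak QTL as 'cis' or 'trans'.

    Assumption: the window is centered on the gene, so a peak counts as cis
    when it is on the same chromosome and within window_mb / 2 of the gene.
    """
    half_window = window_mb / 2.0
    if gene_chr == peak_chr and abs(peak_mb - gene_mb) <= half_window:
        return "cis"
    return "trans"

# Made-up positions for three transcripts
labels = [
    classify_qtl("5", 135.0, "5", 138.0),    # same chromosome, 3 Mb away
    classify_qtl("5", 135.0, "5", 141.0),    # same chromosome, 6 Mb away
    classify_qtl("5", 135.0, "13", 135.0),   # different chromosome
]
```

Counting the "cis" labels above a chosen LRS threshold for each normalization method would give the kind of comparison the text attributes to Table 1.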
For genes whose expression is strongly modulated by polymorphisms which are also associated with changes in hippocampal function, we can use transcriptional QTLs as a starting point for more detailed analyses of the genetic bases of hippocampal function in health and disease (Figure 3). As an example, we searched the PDNN data set for genes associated with hippocampus-related entries in the Entrez GeneRIF field. A search for the keyword 'Alzheimer' returned 101 probe sets with a significant QTL; among the genes targeted by these probe sets are Apod, Ncam1, Bcl2 and Bcl2l2. The query 'neurodegenerative' yielded 54 probe sets, including Cdk5, Nos1, Park7 and Polg. Among the 60 QTLs associated with the keyword 'cognitive' were Comt, Drd1a, Prnp, Mapt and Ntrk2. Such results will serve as starting points for what we call 'reverse complex trait analysis', in which a gene associated with a strong cis-acting effect can be worked backwards to investigate downstream consequences of known variation in gene expression.

PLEIOTROPIC EFFECTS OF TRANS-ACTING REGULATORY LOCI
Trans-QTLs associated with expression of diverse genes can often be localized to common loci. Genes at these loci appear to control the expression of large numbers of downstream genes, suggesting that they act as 'master modulatory loci'. Comparing transcriptome maps of whole brain and hippocampus, some trans-QTL bands are common between the two tissues, whereas others appear to be tissue-specific (data not shown). In the hippocampus, major bands were identified on chromosomes 1, 2, 5, 12, 15 and 19 (Figure 4). A particularly strong 'trans-band' in the hippocampus, which is not as prominent in whole brain, lies on distal chromosome 5. This was named Trans5a and can be localized to three markers: rs13478539, rs3708411, and rs8265855. The inclusive interval extending to the two next flanking markers is around 6 Mb wide (from 132.834686 to 138.965374 Mb), including 121 known genes. Interestingly, this interval spans the region homologous to the region deleted in humans with Williams-Beuren syndrome (OMIM 194050). Characterized by the 'elfin' features thought to be caused by the (diagnostic) haploinsufficiency of the elastin gene, Williams syndrome is also associated with cardiac malformation, social disinhibition, hyperacuity and usually some degree of cognitive impairment. Functional and metabolic abnormalities in the hippocampal formation affecting cognitive ability have been reported (Meyer-Lindenberg et al., 2005), which might help explain the deficits in memory and spatial navigation in this disease.

To suggest genes that might be candidates for the modulatory locus, we searched for genes with a cis-QTL in the Trans5a interval. The probe sets 1448760_at (Zfp68), 1420095_at (Zipro1), 1425531_at (Znhit1), 1429152_at (Zkscan1) and 1415901_at (Plod3) had above average expression levels and significant cis-QTLs within this interval. As expression of most genes controlled by the Trans5a locus should correlate well with the expression of the responsible gene at the locus itself, we surveyed each of the transcripts exhibiting the trans-QTL for strongly correlating transcripts whose genes are among those in the Trans5a interval. The best candidate using this approach was Zkscan1 (Probeset ID 1429152_at), a zinc finger protein of the SCAN domain family.

EPISTATIC INTERACTIONS BETWEEN QTLs
Most genes do not act in isolation and therefore will not have a Mendelian effect on expression. Such genes will usually not exhibit a single strong QTL but will rather be associated with several smaller-effect loci relating to genes whose products work together to modulate expression of the target gene. Historically, the statistical power required for the identification of these effects has not been available. The size of the current data set, however, is sufficient for the discovery of strong epistatic interactions. As an example, the probe set 1435411_at (Neurod2) was used as a query with the Pair Scan function in GeneNetwork's mapping module. A two-locus interaction plot identifies a suggestive interaction between loci at Chr3@67.9 Mb (Neurod2Epi3) and Chr4@54.1 Mb (Neurod2Epi4) (Figure 5). The conventional mapping analyses for these two loci are not above background (LRS of 0.939 and 0.002 respectively) whereas the peak LRS of the interaction is 30.973 (LRS of the full model is 31.915). This suggests that genetic factors at the two loci Neurod2Epi3 and Neurod2Epi4 together influence the expression of Neurod2. Using the literature correlation function in GeneNetwork, we identified a strong correlation (r = 0.76) between Neurod2 and Lxn (latexin) on Chr3@67.55 Mb at the Neurod2Epi3 locus. These findings suggest Lxn as a candidate component of the Neurod2 pathway, important in granule cell differentiation (Schwab et al., 2000), and recent evidence suggests an anti-proliferative role of Lxn in hemopoietic cells (Liang et al., 2007). A possible candidate for Neurod2Epi4 is Rod1 (regulator of differentiation 1; Probeset ID 1455819_at), on Chr4@59.57 Mb, which has a correlation with 1435411_at (Neurod2) of r = 0.39. Rod1 is a homolog of a yeast gene involved in regulating the onset of differentiation (Yamamoto et al., 1999).
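How two loci can each show no single-locus signal yet yield a large full-model LRS can be illustrated with a toy calculation. This is a sketch under stated assumptions and does not reproduce GeneNetwork's Pair Scan: the LRS is assumed to take the standard regression form n ln(RSS_null / RSS_model), genotypes are coded 'B' or 'D' as in an RI panel, the full two-locus model is fitted as the four genotype-class means, and the trait values are made up. The separate interaction term, which would need an additive fit, is omitted.

```python
import math
from collections import defaultdict

def rss_by_group(values, labels):
    """Residual sum of squares after fitting a separate mean to each label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    rss = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        rss += sum((v - mean) ** 2 for v in members)
    return rss

def lrs(values, labels):
    """Likelihood ratio statistic of a group-means model against a single mean."""
    n = len(values)
    rss_null = rss_by_group(values, ["all"] * n)
    rss_model = rss_by_group(values, labels)
    return n * math.log(rss_null / rss_model)

# Made-up data for eight strains: expression is high only when the two
# loci carry the same parental allele
locus_a = ["B", "B", "B", "B", "D", "D", "D", "D"]
locus_b = ["B", "B", "D", "D", "B", "B", "D", "D"]
trait = [10.1, 9.9, 8.1, 7.9, 8.1, 7.9, 10.1, 9.9]

lrs_a = lrs(trait, locus_a)                        # single-locus scan, locus A
lrs_b = lrs(trait, locus_b)                        # single-locus scan, locus B
lrs_full = lrs(trait, list(zip(locus_a, locus_b)))  # two-locus full model
```

In this constructed example both single-locus statistics are essentially zero while the full model is large, the same qualitative pattern as the values reported for Neurod2Epi3 and Neurod2Epi4.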

DISCUSSION
We have presented here a database of hippocampal expression information together with a range of example analyses showcasing a number of ways in which this resource can be mined. QTL analyses have long been valued as a way of identifying the molecular correlates of complex traits, and the data described above offer an unprecedented source of transcript expression QTLs for the detailed molecular study of the mouse hippocampus model. The large scale and exceptionally high quality of the data have, in addition, made possible more daring investigations of complex QTL interactions.
The BXD RI panel is the largest available in a mammalian species, and due to the logistics of assembling such a resource, we are confident that it will remain so until similar studies become available using the 8-way Collaborative Cross (Churchill et al., 2004), a community project that is under way but will require several more years to reach completion. The 69 BXD strains studied here also represent one of the largest expression databases of its kind, and the addition of comparable data from the CXB and Mouse Diversity panels has resulted in a resource that is significantly larger than anything else currently available.
The key advantage afforded by the large size of the panel is the additional statistical power this gives to the linkage associations and thus to the strength of the resulting QTLs. This can be seen most clearly in Table 1 where the number of QTLs above the statistical significance threshold is a dramatic indication of the improvement of the hippocampal database over those previously available for the mouse model. Many of the QTLs identified are particularly strong and resolve to clear peaks that can be localized with high precision (Figure 3). A side effect of this QTL quality is that one can now identify large numbers of less strong, secondary QTLs which were previously lost to background noise, and this information opens up a whole new range of possible analyses, such as the identification of epistatic interactions (Figure 5), that promise to uncover pathways of genetic control within the tissue studied.
Traditionally, QTL mapping starts with a phenotype of interest, measured in a genetic reference population, and aims to map this trait to a genomic sequence variant. The advent of larger panels and denser marker maps, in conjunction with high quality gene expression data, now means that expression QTLs are statistically robust enough to be considered starting points for further study in their own right. This can be used to great effect in reverse complex trait analysis, a powerful new approach in which segregating genetic variation, as evidenced by a strong QTL, is mapped to other potentially interacting genes, and ultimately back to candidate phenotypes. With a known QTL and a body of evidence suggesting possible roles for the affected gene, phenotypes can be predicted that may be modulated as a result of this sequence variation. If this phenotype is of interest, it can be directly measured and a traditional 'forward' QTL analysis carried out to confirm the prediction. Such an approach is extremely attractive when the enormous cost and time required for phenotyping a large panel is considered. The 'reverse' component of the study is entirely computer based, and no further laboratory work is needed beyond that already invested in the database resource described here.

FIGURE 5 | A two-way QTL plot reveals an epistatic interaction between loci on chromosomes 3 and 4 affecting the expression of Neurod2 (1435411_at). Individually, these loci do not significantly correlate with the trait analyzed, but together they generate a strong QTL (LRS Full = 31.92).

FIGURE 6 | Model of gene-trait interactions in complex trait analysis.
The figure shows two sources of variation: phenotypic, meaning the quantitative differences in trait expression measured across the panel of strains (labeled TRAIT), and genotypic, meaning the differences in genotype as measured at a discrete number of marker loci in the different strains (GENE). Interactions between these sources of variation are the basis of complex trait analysis. QTL, quantitative trait locus; reverse CTA, reverse complex trait analysis.

These different approaches have been summarized in a visual model which shows the various relationships between genotype and phenotype (Figure 6). Variation exists both at the level of genotype, as sequence polymorphism between different strains, and at the level of phenotype, in that animals from different genetic backgrounds exhibit large ranges of expression in phenotypic traits. Covariation between any two of these sources of variation falls into one of the four categories shown: gene-gene covariation is genetic linkage, most often as a result of genomic proximity; trait-trait covariation refers to the correlation between phenotypes and/or transcript expression, as in Figure 1; mapping variation in trait expression to genotype is the essence of QTL analysis, as shown in Figures 2 and 3; and reversing this process to identify traits whose expression co-varies with a known sequence variant or genomic locus is what we have termed reverse complex trait analysis.
The link between expression correlation and functional association is indirect and complex, involving many variables at the level of transcriptional and translational control, post-translational modification and protein interaction. The availability of genome-scale interaction data, however, presents an attractive entry-point for more detailed studies of candidate genes. Although the constraints of space have prevented a detailed treatment of individual candidates in this manuscript, ongoing work in our own laboratories is using these data to identify potential interaction partners for already known regulatory genes and thus to suggest pathways in which these might be working.
The expression data generated by this study have been deposited online as a part of the GeneNetwork, a repository of genotypes and of physiological and expression phenotypes, which is openly accessible 8.
In addition to the transcriptional data provided by studies such as the one presented here, a range of related information is becoming available to support in silico identification of candidate pathways. Together with evidence from other sources, a compelling case can often be made for more focused experimental study. Sources of gene-gene interaction metadata are now offered by projects such as the Semantic Gene Organizer © (Homayouni et al., 2005), built into GeneNetwork as the Literature Correlation function, which uses latent semantic indexing of PubMed abstracts to assign a correlation metric to pairs of genes; and an initiative from the Allen Institute for Brain Science in which genes are correlated based on the similarity of their spatial in situ expression patterns (Lau et al., 2008).
The aim of our new resource is to uncover genetic pathways underlying complex hippocampal phenotypes, and the utility of the current database will only grow as additional phenotypes are measured in the BXD mice and deposited online.