DATA REPORT article

Front. Neuroanat., 26 October 2016

Volume 10 - 2016 | https://doi.org/10.3389/fnana.2016.00086

HDBR Expression: A Unique Resource for Global and Individual Gene Expression Studies during Early Human Brain Development

  • 1. Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK

  • 2. Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK

  • 3. Institute of Child Health, University College London, London, UK

This paper describes a new resource, HDBR (Human Developmental Biology Resource) Expression, for studying prenatal human brain development. It is unique in the age range (4 post-conception weeks [PCW] to 17 PCW) and the number of brains studied (172), particularly those under 8 PCW (33). The great majority of the samples are karyotyped. HDBR Expression is also unique in that both the large-scale data sets (RNA-seq data, SNP genotype data) and the corresponding RNA and DNA samples are available, the latter via the MRC-Wellcome Trust funded HDBR (Gerrelli et al., 2015). There are 557 RNA-seq datasets from different brain regions, the majority between 4 and 12 PCW. During this period the major brain regions are established and the early stages of cortical development occur (Bystron et al., 2008; O'Rahilly and Muller, 2008). In addition, there are 42 RNA-seq datasets from spinal cord and 29 from cerebral choroid plexus. A further 243 tissue specimens in paraffin wax blocks are available for individual gene expression studies. For almost all of the brains and wax-embedded specimens there are corresponding SNP genotype data.

Large-scale, high-throughput studies such as next-generation sequencing are providing raw material in a wide variety of research fields (for a review of RNA-seq concepts and methodologies, see Shin et al., 2014). Studies of human development are hampered by the difficulty of obtaining tissue, so publicly available large-scale data sets are particularly useful because the data can be used and re-used (Kang et al., 2011; Zhang et al., 2011; Fietz et al., 2012; Miller et al., 2014; Darmanis et al., 2015).

Materials and methods

Human tissues

Human embryonic and fetal tissues were obtained from the MRC/Wellcome-Trust funded Human Developmental Biology Resource. HDBR is a tissue bank regulated by the UK Human Tissue Authority (HTA) and operating in line with the relevant HTA Codes of Practice. Tissue samples are collected with appropriate maternal written consent and approval from the NRES Committee North East - Newcastle and North Tyneside 1 (REC reference 08/H0906/21+5) or NRES Committee London-Fulham (REC reference 08/H0712/34+5).

Tissues were collected over a period of 11 years (February 2003–January 2014), with the majority (82%) collected between January 2010 and January 2014 (last collection date 28 January 2014). Tissues were either fixed at 4°C in 4% paraformaldehyde and embedded in paraffin wax following standard protocols (Bussolati et al., 2011), or dissected (as described below) and frozen at −80°C for RNA and DNA preparation. For the embryos and fetuses that were fixed and embedded, a small sample of the embryonic-derived part of the placenta, or of skin, was taken for DNA preparation.

There were three sets of tissues: (a) samples of embryonic-derived placenta or skin from embryos or fetuses that had been fixed and embedded in paraffin wax, used solely for DNA preparation and SNP genotyping; (b) brain tissues where each sample was subdivided, with one part used for RNA preparation and the other for DNA preparation, followed by RNA-sequencing and SNP genotyping, respectively; and (c) brain tissues subdivided in the same way but followed only by RNA-sequencing. This meant that, where tissues from several different brain regions (and/or spinal cord, and/or choroid plexus) were collected from an individual embryo or fetus, SNP genotyping was carried out only once. However, DNA was prepared separately from each region sampled for a particular embryo or fetus, and these DNA samples are available for future studies, e.g., epigenetic analysis (Reilly et al., 2015).

Brain tissue dissections

The dissection protocol depended on the developmental stage of the embryo or fetus, which determined the size of the brain, and on the state of disruption of the tissue. The aim was to dissect brains into forebrain, midbrain, and hindbrain and, at later stages, to further dissect the forebrain into either (1) telencephalon (left and right in some cases) and diencephalon, or (2) cortex (left and right in some cases; the temporal lobe was removed, and it and the remaining cortex were divided into strips depending on size), basal ganglia, and diencephalon, and to dissect the hindbrain into cerebellum and medulla. The midbrain was collected as a single sample except in a few cases where it was dissected into left and right parts. Figures 1A,B show brains at two developmental stages (7 PCW and 10 PCW, respectively), highlighting the areas that were dissected. Where the cortex was divided into strips, this was done evenly across the cortex. In most cases five strips were generated, but in some cases the number varied with the size of the brain (Ip et al., 2010). In all cases the most anterior strip was labeled 1 and the strips were numbered sequentially toward the most posterior strip (usually labeled 5).

Figure 1

(A) 7 PCW. Red: telencephalon; green: diencephalon; blue: midbrain; purple: hindbrain; deep red: spinal cord; gray: rest of head and body. (B) 10 PCW. A 3D model of the brain and part of the spinal cord was generated by magnetic resonance imaging and the brain regions were defined. The front of the brain is on the left. The left cerebral cortex has been removed to show the inner view of the right cerebral cortex, the inner structures of the telencephalon (choroid plexus and basal ganglia), and the structures lying between the two cerebral cortices (diencephalon and midbrain), which are fully or partially hidden when a whole brain of this age is viewed. Red: cerebral cortex; gray: cerebral choroid plexus; orange: basal ganglia; green: diencephalon; blue: midbrain; purple: cerebellum and pons; pink: medulla oblongata; deep red: upper part of spinal cord. For both images, the 3D models were visualized and brain regions defined using MAPaint, custom-designed software from the Edinburgh Mouse Atlas Project team. (C) Principal component analysis (PCA) of all RNA-seq datasets. Choroid plexus samples (khaki green) form the most distinct set. Forebrain (green) and hindbrain (blue) samples separate, with slight overlap. Midbrain samples (purple) and unidentified brain samples (red) fall within the forebrain and hindbrain clusters. (D) Venn diagrams comparing genes differentially expressed in a subset of RNA-seq datasets from cortical samples at 9 and 12 PCW. The upper panel compares two lists of genes differentially expressed between 9 and 12 PCW (anterior, central, posterior, and temporal cortex samples grouped for each age): the top 200 genes ranked by lowest p-value and the top 200 ranked by largest fold change.
Eighty-one genes appear in both lists, i.e., their expression differences between 9 and 12 PCW had both among the lowest p-values and among the largest fold changes. All 200 genes with the largest fold changes had p < 0.05. The lower panel compares genes differentially expressed between anterior and posterior cortex at the two stages: 146 genes at 9 PCW and 185 genes at 12 PCW, of which 17 were differentially expressed between anterior and posterior cortex at both stages. The lists of differentially expressed genes summarized in the upper and lower panels are given in Supplementary Tables 3 and 4, respectively, on the HDBR website.

We also collected 29 cerebral choroid plexus samples and 42 spinal cord samples. There are also 99 samples for which the brain region could not be determined; these are labeled “brain fragments.”

The tissues were sent to AROS Applied Biotechnology, which prepared the DNA and RNA and carried out SNP genotyping and RNA-sequencing as described below.

DNA and RNA preparation

DNA was extracted from 435 human embryonic and fetal tissue samples on the QIAsymphony SP using the manufacturer's protocol DNA HC. DNA was quantified on the Qubit system (specific for dsDNA).

RNA was extracted from 705 human embryonic and fetal tissue samples. After lysis using the TissueLyser and removal of fat with chloroform, RNA (including small RNAs) was purified on the QIAsymphony SP using protocol miRNA v05. RNA yield was estimated by NanoDrop A260 measurement, and quality was evaluated for approximately 15% of the samples using an Agilent Bioanalyzer. Seventy samples had too little RNA or RNA of insufficient quality. A further 3 samples failed quality control at the library preparation stage (see below) and 4 were excluded because they did not match their corresponding DNA genotyping data, so RNA-seq datasets were obtained from 628 tissue samples in total.
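The exclusion steps above can be tallied to confirm the final count, which also agrees with the per-tissue totals given in the introduction (557 brain, 42 spinal cord, and 29 choroid plexus datasets). The variable names below are only labels for the numbers stated in the text:

```python
# Sample accounting for the RNA-seq arm, using only counts stated in the text.
extracted = 705                  # RNA extractions
failed_yield_or_quality = 70     # too little RNA or RNA of insufficient quality
failed_library_qc = 3            # failed QC at library preparation
genotype_mismatch = 4            # RNA did not match the DNA genotyping data

sequenced = extracted - failed_yield_or_quality - failed_library_qc - genotype_mismatch

# Final datasets by tissue type, as given in the introduction.
by_tissue = {"brain": 557, "spinal cord": 42, "choroid plexus": 29}

print(sequenced)                 # 628
print(sum(by_tissue.values()))   # 628
```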

SNP genotyping

SNP genotyping was carried out according to the Illumina Infinium LCG Quad Assay protocol. Briefly, DNA was denatured, amplified, and then hybridized to Illumina's HumanOmni5-Quad BeadChip (HumanOmni5-4v1_B). Array-based single-base primer extension was performed using labeled nucleotides (C and G nucleotides were biotin-labeled, while A and T nucleotides were dinitrophenyl-labeled). After washing and drying, the BeadChips were imaged using the Illumina iScan system. After scanning, the IDAT files were imported into Illumina GenomeStudio software for genotype and gender calls (average call rate 98.3%).
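The average call rate quoted above is, in the usual sense, the fraction of assayed SNPs that receive a genotype call in a sample, averaged over samples. A minimal sketch under that assumption; the no-call marker, sample names, and toy genotypes are hypothetical and do not reflect GenomeStudio's export format:

```python
from statistics import mean

NO_CALL = "--"  # hypothetical no-call marker; real exports may use a different symbol


def call_rate(genotypes):
    """Fraction of SNPs with a genotype call for one sample."""
    called = sum(1 for g in genotypes if g != NO_CALL)
    return called / len(genotypes)


def average_call_rate(samples):
    """Mean of the per-sample call rates."""
    return mean(call_rate(g) for g in samples.values())


# Toy data: two samples typed at four SNPs each (illustrative, not HDBR data).
samples = {
    "sample_A": ["AA", "AB", "BB", "AB"],
    "sample_B": ["AA", NO_CALL, "BB", "AB"],
}
print(call_rate(samples["sample_B"]))   # 0.75
print(average_call_rate(samples))       # 0.875
```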

RNA-sequencing and analysis

cDNA was generated from the RNA using Illumina's Stranded mRNA Sample Prep Kit, and libraries were prepared following Illumina's guidelines for the TruSeq Stranded mRNA LT sample prep kit. Four hundred nanograms of total RNA was used as the input for each sample. The concentration of each library was determined using the KAPA qPCR kit (KK4835), with triplicate reactions using three independent 10⁶-fold dilutions of the libraries. The size profile of approximately 15% of the libraries was evaluated using an Agilent Bioanalyzer DNA 1000 chip. Average final library sizes ranged from 272 to 467 bp (including 120 nucleotides of adapter sequence). The libraries were sequenced on an Illumina HiSeq 2000.
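qPCR-based library quantification of this kind typically averages the replicate readings taken on the diluted library, corrects for the size difference between the library and the kit's DNA standard, and multiplies back by the dilution factor. A minimal sketch under those assumptions: the 452 bp standard size is our reading of the KAPA kit documentation and should be checked, and the function names are hypothetical. The same block derives the insert length implied by the library sizes quoted above once the 120 nt of adapter is subtracted:

```python
from statistics import mean

STANDARD_SIZE_BP = 452   # assumed size of the kit's qPCR DNA standard (check kit documentation)
ADAPTER_BP = 120         # adapter sequence included in the library size (from the text)
DILUTION = 10**6         # libraries were diluted 10^6-fold before qPCR


def library_concentration_pm(replicate_pm, library_size_bp):
    """Undiluted library concentration (pM) from replicate qPCR readings on the diluted library."""
    measured = mean(replicate_pm)                        # average the replicate reactions
    size_adjustment = STANDARD_SIZE_BP / library_size_bp  # longer libraries give more signal per molecule
    return measured * size_adjustment * DILUTION


def insert_size_bp(library_size_bp):
    """Length of the cDNA insert once adapter sequence is subtracted."""
    return library_size_bp - ADAPTER_BP


print(insert_size_bp(272), insert_size_bp(467))   # 152 347
```

With these assumptions, the 272 to 467 bp libraries correspond to inserts of roughly 152 to 347 bp.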

RNA-seq data were processed and analyzed to identify differentially expressed genes. Read quality was first checked with FastQC. Poly-N tails were trimmed from reads with an in-house Perl script. The first 12 bp (left, 5′ end) and last 4 bp (right, 3′ end) of all reads were clipped with Seqtk to remove positions showing biased base composition in the FastQC reports. Low-quality bases (Q < 30) and standard Illumina (Illumina, Inc., California, U.S.) paired-end sequencing adaptors at the 3′ ends of reads were trimmed using autoadapt, and only reads at least 20 bp long after trimming were kept. The high-quality reads were then mapped to the human reference genome hg38 with TopHat2 (Kim et al., 2013). Reads aligned to genes were counted with HTSeq-count (Anders et al., 2015). Differentially expressed genes were then identified with the Bioconductor (Gentleman et al., 2004) package DESeq2 (Love et al., 2014).
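The read-level trimming steps above can be approximated in a few lines. This is a simplified sketch, not a reimplementation of the in-house Perl script, Seqtk, or autoadapt. It assumes Phred+33 quality encoding, uses the common Illumina adapter prefix AGATCGGAAGAGC as an assumed adapter sequence, and introduces a hypothetical 3 bp minimum for partial adapter matches; quality trimming simply removes 3′ bases until one of at least Q30 is reached:

```python
PHRED_OFFSET = 33           # assumed Phred+33 quality encoding
MIN_QUAL = 30               # trim 3' bases below Q30 (threshold from the text)
MIN_LEN = 20                # discard reads shorter than 20 bp after trimming (from the text)
CLIP_5P, CLIP_3P = 12, 4    # fixed clipping of biased positions (from the text)
ADAPTER = "AGATCGGAAGAGC"   # assumed Illumina adapter prefix; confirm for the kit used
MIN_OVERLAP = 3             # hypothetical minimum partial-adapter match at the 3' end


def trim_read(seq, qual):
    """Return the trimmed (seq, qual) pair, or None if the read ends up too short."""
    # 1. Remove a trailing run of N bases (poly-N tail).
    end = len(seq)
    while end > 0 and seq[end - 1] == "N":
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # 2. Clip fixed numbers of bases from the 5' (left) and 3' (right) ends.
    stop = max(len(seq) - CLIP_3P, 0)
    seq, qual = seq[CLIP_5P:stop], qual[CLIP_5P:stop]

    # 3. Trim low-quality bases from the 3' end.
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < MIN_QUAL:
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # 4. Cut at a full adapter match, or at a partial adapter running off the 3' end.
    cut = seq.find(ADAPTER)
    if cut == -1:
        cut = len(seq)
        for k in range(len(ADAPTER) - 1, MIN_OVERLAP - 1, -1):
            if seq.endswith(ADAPTER[:k]):
                cut = len(seq) - k
                break
    seq, qual = seq[:cut], qual[:cut]

    # 5. Keep only reads that are still long enough.
    return (seq, qual) if len(seq) >= MIN_LEN else None
```

Quality trimming is applied before adapter removal here; other orderings are possible, and dedicated tools use more robust quality and adapter-matching algorithms than this exact-match version.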

HDBR expression resource

Table 1 summarizes the developmental stages and tissue regions for which there are RNA-seq data. For each tissue sample there is information on the sample name (e.g., HDBR251), the ID number of the embryo or fetus from which it came (e.g., 1406), the developmental stage, and the karyotype, all of which can be found in the sample attributes and variables accompanying the data sets uploaded to ArrayExpress (Kolesnikov et al., 2015). For most samples there is also information on the time to collection; this, together with the other information on each sample, is also given in Supplementary Table 1 on the HDBR website. The ID number of the embryo or fetus allows all the RNA-seq datasets from a single embryo or fetus to be identified. Similarly, this ID number links to the corresponding SNP genotyping data. There are also ID numbers for embryos and fetuses for which SNP genotyping data and a corresponding wax block are available for individual gene expression analyses. Each entry in the SNP genotyping data repository also records the tissue sample from which DNA was prepared (placenta or skin) and which tissues are in the wax block. Information on all the SNP genotype data can also be found in Supplementary Table 2 on the HDBR website.
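In practice the embryo or fetus ID acts as a join key between the RNA-seq samples and the genotyping data. A minimal sketch, assuming the sample information has been exported as simple records; the sample names, embryo IDs, field names, and genotype lookup below are all hypothetical, not the ArrayExpress column headers:

```python
from collections import defaultdict

# Hypothetical records: names, IDs, and field names are illustrative only.
rnaseq_samples = [
    {"sample": "S1", "embryo_id": "E1", "region": "forebrain"},
    {"sample": "S2", "embryo_id": "E1", "region": "hindbrain"},
    {"sample": "S3", "embryo_id": "E2", "region": "spinal cord"},
]
snp_datasets = {"E1": "genotype_E1"}   # embryo ID -> SNP genotyping dataset (hypothetical)


def group_by_embryo(samples):
    """Map each embryo or fetus ID to the RNA-seq sample names derived from it."""
    groups = defaultdict(list)
    for record in samples:
        groups[record["embryo_id"]].append(record["sample"])
    return dict(groups)


def link_genotypes(groups, snp_lookup):
    """Pair each embryo's RNA-seq samples with its SNP dataset (None if there is none)."""
    return {eid: (names, snp_lookup.get(eid)) for eid, names in groups.items()}


linked = link_genotypes(group_by_embryo(rnaseq_samples), snp_datasets)
for embryo, (names, genotype) in linked.items():
    print(embryo, names, genotype)
```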

Table 1

Stage (post conception week)45678910111213141516171920Any stage
Brain2862617181276277311912221021557
   Forebrain1211745717465214810166305
       Telencephalon484611384510710156236
Cortex162841825648103122
Temporal lobe11128101223242
Basal ganglia15515231133
Whole telencephalon16211121
Telencephalon fragments46818
       Diencephalon19645331142
       Whole forebrain12113322318
       Forebrain fragments421119
   Midbrain121122834103156
   Hindbrain21342155811423197
Cerebellum17524421136
Pons112
Medulla oblongata1442202125
Whole hindbrain21365123229
Hindbrain fragments545
   Brain fragments23213312441092232199
Spinal cord128164433142
Choroid plexus11623411129
All2983419891336884332012231021628

Developmental stage and tissue distribution of RNAseq datasets.

Each cell gives the number of RNA-seq datasets for a particular tissue at each developmental stage. The totals in the rows whose tissue label is shown in red are the sums of the datasets for the tissues in the indented rows below them; e.g., “Brain” is the sum of the Forebrain, Midbrain, Hindbrain, and “Brain fragments” cells in the column for each stage. “Fragments” denotes samples for which the precise region is unknown. For ease of viewing, the subdivisions of Telencephalon and Hindbrain (third level of indent) are given in italics.
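The hierarchy described in the legend can be checked against the "Any stage" totals, read here as the final number in each row of Table 1. A short sketch under that reading; the grouping of "All" as brain plus spinal cord plus choroid plexus follows the row layout rather than the legend, and the per-stage cells could be checked the same way once available in a structured format:

```python
# "Any stage" totals from Table 1 (final number in each row).
totals = {
    "All": 628, "Brain": 557, "Spinal cord": 42, "Choroid plexus": 29,
    "Forebrain": 305, "Midbrain": 56, "Hindbrain": 97, "Brain fragments": 99,
    "Telencephalon": 236, "Diencephalon": 42, "Whole forebrain": 18, "Forebrain fragments": 9,
    "Cortex": 122, "Temporal lobe": 42, "Basal ganglia": 33,
    "Whole telencephalon": 21, "Telencephalon fragments": 18,
    "Cerebellum": 36, "Pons": 2, "Medulla oblongata": 25,
    "Whole hindbrain": 29, "Hindbrain fragments": 5,
}

# Parent row -> indented child rows.
children = {
    "All": ["Brain", "Spinal cord", "Choroid plexus"],
    "Brain": ["Forebrain", "Midbrain", "Hindbrain", "Brain fragments"],
    "Forebrain": ["Telencephalon", "Diencephalon", "Whole forebrain", "Forebrain fragments"],
    "Telencephalon": ["Cortex", "Temporal lobe", "Basal ganglia",
                      "Whole telencephalon", "Telencephalon fragments"],
    "Hindbrain": ["Cerebellum", "Pons", "Medulla oblongata",
                  "Whole hindbrain", "Hindbrain fragments"],
}


def mismatched_totals(totals, children):
    """Return {parent: (stated total, sum of children)} for rows that do not add up."""
    report = {}
    for parent, kids in children.items():
        child_sum = sum(totals[k] for k in kids)
        if totals[parent] != child_sum:
            report[parent] = (totals[parent], child_sum)
    return report


print(mismatched_totals(totals, children))   # {}
```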

The RNA-seq data files are all in FASTQ format, and the raw SNP genotyping data files have also been uploaded. Both the RNA-seq and SNP genotyping files are identified by sample name, which links to the sample information in the “sample attributes and variables” tab in ArrayExpress and in the Supplementary Tables on the HDBR website. The experiment number for the RNA-seq dataset is E-MTAB-4840 and for the SNP dataset is E-MTAB-4843. The RNA-seq dataset will be incorporated into the European Bioinformatics Institute (EBI) Expression Atlas, EBI's value-added database for high-quality data from large microarray and RNA-sequencing experiments. In its latest version, Expression Atlas analyses selected large RNA-sequencing experiments to produce “baseline expression,” i.e., the abundance of each gene and splice variant in the individual biological components (e.g., tissues or cells) used in the experiment (Petryszak et al., 2014). The HDBR RNA-seq dataset will provide baseline expression for different brain regions across a substantial period of early human development (4 to 17 PCW).
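For readers reusing the FASTQ files, records can be read with a few lines of standard-library Python. A minimal sketch, assuming the common four-lines-per-record layout without line wrapping and files that are either plain text or gzip-compressed; the function names and the file name in the usage note are hypothetical:

```python
import gzip


def parse_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes exactly four lines per record (no wrapped sequence lines).
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue                                   # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Usage: `with open_fastq("reads.fastq.gz") as handle:` followed by `for header, seq, qual in parse_fastq(handle): ...`. Accepting any iterable of lines keeps the parser independent of how the file is opened.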

Overview of RNA-seq datasets and preliminary characterization of datasets from a subset of cortical samples

Principal component analysis (PCA) was carried out on the normalized gene expression values from the RNA-seq datasets, with the samples categorized by gross region (forebrain, midbrain, hindbrain, and spinal cord). The datasets from brain fragments and choroid plexus were also included. Figure 1C shows that the samples cluster according to brain region, with the choroid plexus samples forming a separate, tight group.
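The text does not specify how expression values were normalized before PCA. As an illustration only, the sketch below assumes median-of-ratios size factors (the default normalization in DESeq2) followed by a log2(x + 1) transform, and computes principal-component scores by power iteration in plain Python. The function names are hypothetical; a real analysis would more likely use DESeq2's own transformations and a library PCA routine:

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor for each sample (DESeq/DESeq2-style).

    counts: one list of raw gene counts per sample, all in the same gene order.
    Genes with a zero count in any sample are left out of the reference.
    """
    n_samples, n_genes = len(counts), len(counts[0])
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in counts)]
    # Log geometric mean of each usable gene across samples (a pseudo-reference sample).
    log_ref = {g: sum(math.log(s[g]) for s in counts) / n_samples for g in usable}
    return [math.exp(median(math.log(s[g]) - log_ref[g] for g in usable)) for s in counts]


def log_normalize(counts):
    """log2(count / size factor + 1) for every sample and gene."""
    factors = size_factors(counts)
    return [[math.log2(c / f + 1) for c in s] for s, f in zip(counts, factors)]


def pca_scores(data, n_components=2, iterations=500):
    """Principal-component scores (one row per sample) by power iteration.

    Uses the sample-by-sample Gram matrix of the gene-centered data; its
    eigenvectors scaled by the square roots of their eigenvalues are the PC scores.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in data]   # center each gene
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    columns = []
    for _ in range(n_components):
        # Non-constant start vector: the all-ones vector lies in the null space
        # of a centered Gram matrix, so it would never converge.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iterations):
            w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        eigval = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n)) for i in range(n))
        columns.append([t * math.sqrt(max(eigval, 0.0)) for t in v])
        # Deflate so the next pass converges to the next component.
        gram = [[gram[i][k] - eigval * v[i] * v[k] for k in range(n)] for i in range(n)]
    return [list(row) for row in zip(*columns)]
```

The sign of each component is arbitrary, so clusters are read from the relative positions of samples rather than from the sign of their scores.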

A subset of 64 RNA-seq datasets from anterior, central, posterior, and temporal cortex at either 9 or 12 PCW was selected for further differential expression analysis. Figure 1D shows that more genes are differentially expressed with age than with cortical location at these stages. It is also clear, however, that there is differential expression between anterior and posterior cortex at both stages, and the evidence suggests that the expression profiles of both the anterior and the posterior cortex change from 9 to 12 PCW.
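The overlaps summarized in Figure 1D are set intersections of ranked gene lists. A minimal sketch, assuming a per-gene table of log2 fold changes and p-values such as a differential expression tool reports, and assuming "largest fold change" means largest absolute log2 fold change; the record layout and toy values are hypothetical:

```python
def top_genes(results, score, n):
    """Names of the n genes with the highest score(result)."""
    ranked = sorted(results, key=score, reverse=True)
    return {r["gene"] for r in ranked[:n]}


def venn_counts(a, b):
    """Sizes of the three regions of a two-set Venn diagram."""
    return {"only_first": len(a - b), "both": len(a & b), "only_second": len(b - a)}


# Toy per-gene results (hypothetical values, not HDBR data).
results = [
    {"gene": "G1", "log2fc": 3.0, "pvalue": 1e-8},
    {"gene": "G2", "log2fc": -2.5, "pvalue": 1e-6},
    {"gene": "G3", "log2fc": 0.4, "pvalue": 1e-7},
    {"gene": "G4", "log2fc": 2.0, "pvalue": 0.01},
    {"gene": "G5", "log2fc": 0.1, "pvalue": 0.5},
]
by_pvalue = top_genes(results, lambda r: -r["pvalue"], 3)     # smallest p-values
by_fold = top_genes(results, lambda r: abs(r["log2fc"]), 3)   # largest |log2 fold change|
print(venn_counts(by_pvalue, by_fold))   # {'only_first': 1, 'both': 2, 'only_second': 1}
```

Applied to the numbers in Figure 1D, two 200-gene lists sharing 81 genes give regions of 119, 81, and 119; for the anterior versus posterior comparison, if 146 and 185 are the full list sizes, the regions are 129, 17, and 168.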

Funding

We gratefully acknowledge funding for this work from the UK Medical Research Council (grant number MC_PC_13047). PC is a Wellcome Trust Senior Fellow in Clinical Science (101876/Z/13/Z), and a UK NIHR Senior Investigator, who receives support from the Medical Research Council Mitochondrial Biology Unit (MC_UP_1501/2).

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Statements

Author contributions

All authors contributed to revising the work, had final approval of the version to be published and agree to be accountable in relation to the accuracy and integrity of the work. SJL drafted the paper and PC, SJL, MS, and AC made substantial contributions to the conception and design of the work; YX, LH, GC, and MK made substantial contributions to the analysis and interpretation of data and SNL, DG, AT, and JC made substantial contributions to the acquisition of data.

Acknowledgments

The human embryonic and fetal material was provided by the Joint MRC/Wellcome Trust (grant # 099175/Z/12/Z) Human Developmental Biology Resource (www.hdbr.org). We thank the HDBR staff for their careful and skilled work in collecting and dissecting the tissues.

Supplementary material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnana.2016.00086/full#supplementary-material

References

  • Anders, S., Pyl, P. T., and Huber, W. (2015). HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169. doi: 10.1093/bioinformatics/btu638

  • Bussolati, G., Annaratone, L., Medico, E., D'Armento, G., and Sapino, A. (2011). Formalin fixation at low temperature better preserves nucleic acid integrity. PLoS ONE 6:e21043. doi: 10.1371/journal.pone.0021043

  • Bystron, I., Blakemore, C., and Rakic, P. (2008). Development of the human cerebral cortex: Boulder Committee revisited. Nat. Rev. Neurosci. 9, 110–122. doi: 10.1038/nrn2252

  • Darmanis, S., Sloan, S. A., Zhang, Y., Enge, M., Caneda, C., Shuer, L. M., et al. (2015). A survey of human brain transcriptome diversity at the single cell level. Proc. Natl. Acad. Sci. U.S.A. 112, 7285–7290. doi: 10.1073/pnas.1507125112

  • Fietz, S. A., Lachmann, R., Brandl, H., Kircher, M., Samusik, N., Schröder, R., et al. (2012). Transcriptomes of germinal zones of human and mouse fetal neocortex suggest a role of extracellular matrix in progenitor self-renewal. Proc. Natl. Acad. Sci. U.S.A. 109, 11836–11841. doi: 10.1073/pnas.1209647109

  • Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5:R80. doi: 10.1186/gb-2004-5-10-r80

  • Gerrelli, D., Lisgo, S., Copp, A. J., and Lindsay, S. (2015). Enabling research with human embryonic and fetal tissue resources. Development 142, 3073–3076. doi: 10.1242/dev.122820

  • Ip, B. K., Wappler, I., Peters, H., Lindsay, S., Clowry, G. J., and Bayatti, N. (2010). Investigating gradients of gene expression involved in early human cortical development. J. Anat. 217, 300–311. doi: 10.1111/j.1469-7580.2010.01259.x

  • Kang, H. J., Kawasawa, Y. I., Cheng, F., Zhu, Y., Xu, X., Li, M., et al. (2011). Spatio-temporal transcriptome of the human brain. Nature 478, 483–489. doi: 10.1038/nature10523

  • Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S. L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14:R36. doi: 10.1186/gb-2013-14-4-r36

  • Kolesnikov, N., Hastings, E., Keays, M., Melnichuk, O., Tang, Y. A., Williams, E., et al. (2015). ArrayExpress update–simplifying data submissions. Nucleic Acids Res. 43, D1113–D1116. doi: 10.1093/nar/gku1057

  • Love, M. I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. doi: 10.1186/s13059-014-0550-8

  • Miller, J. A., Ding, S. L., Sunkin, S. M., Smith, K. A., Ng, L., Szafer, A., et al. (2014). Transcriptional landscape of the prenatal human brain. Nature 508, 199–206. doi: 10.1038/nature13185

  • O'Rahilly, R., and Muller, F. (2008). Significant features in the early prenatal development of the human brain. Ann. Anat. 190, 105–118. doi: 10.1016/j.aanat.2008.01.001

  • Petryszak, R., Burdett, T., Fiorelli, B., Fonseca, N. A., Gonzalez-Porta, M., Hastings, E., et al. (2014). Expression Atlas update–a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Nucleic Acids Res. 42, D926–D932. doi: 10.1093/nar/gkt1270

  • Reilly, S. K., Yin, J., Ayoub, A. E., Emera, D., Leng, J., Cotney, J., et al. (2015). Evolutionary genomics. Evolutionary changes in promoter and enhancer activity during human corticogenesis. Science 347, 1155–1159. doi: 10.1126/science.1260943

  • Sharpe, J., Ahlgren, U., Perry, P., Hill, B., Ross, A., Hecksher-Sorensen, J., et al. (2002). Optical projection tomography as a tool for 3D microscopy and gene expression studies. Science 296, 541–545. doi: 10.1126/science.1068206

  • Shin, J., Ming, G. L., and Song, H. (2014). Decoding neural transcriptomes and epigenomes via high-throughput sequencing. Nat. Neurosci. 17, 1463–1475. doi: 10.1038/nn.3814

  • Zhang, Y. E., Landback, P., Vibranovski, M. D., and Long, M. (2011). Accelerated recruitment of new brain development genes into the human genome. PLoS Biol. 9:e1001179. doi: 10.1371/journal.pbio.1001179

Summary

Keywords

human, embryo, fetal, RNAseq, SNP genotyping, HDBR

Citation

Lindsay SJ, Xu Y, Lisgo SN, Harkin LF, Copp AJ, Gerrelli D, Clowry GJ, Talbot A, Keogh MJ, Coxhead J, Santibanez-Koref M and Chinnery PF (2016) HDBR Expression: A Unique Resource for Global and Individual Gene Expression Studies during Early Human Brain Development. Front. Neuroanat. 10:86. doi: 10.3389/fnana.2016.00086

Received

29 July 2016

Accepted

12 October 2016

Published

26 October 2016

Volume

10 - 2016

Edited by

James A. Bourne, Australian Regenerative Medicine Institute, Australia

Reviewed by

Guy Elston, Centre for Cognitive Neuroscience, Australia; Jennifer Rodger, University of Western Australia, Australia

*Correspondence: Susan J. Lindsay

†Present Address: Yaobo Xu, Wellcome Trust Sanger Institute, Cambridge, UK

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
