Abstract
The recent advances in sequencing throughput and genome assembly algorithms have established whole-genome shotgun (WGS) assemblies as the cornerstone of the genomic infrastructure for many species. WGS assemblies can be constructed with comparative ease and give a comprehensive representation of the gene space even of large and complex genomes. One major obstacle in utilizing WGS assemblies for important research applications such as gene isolation or comparative genomics has been the lack of chromosomal positioning and contextualization of short sequence contigs. Assigning chromosomal locations to sequence contigs required the construction and integration of genome-wide physical maps and dense genetic linkage maps as well as synteny to model species. Recently, methods to rapidly construct ultra-dense linkage maps encompassing millions of genetic markers from WGS sequencing data of segregating populations have made possible the direct assignment of genetic positions to short sequence contigs. Here, we review recent developments in the integration of WGS assemblies and sequence-based linkage maps, discuss challenges for further improvement of the methodology and outline possible applications building on genetically anchored WGS assemblies.
INTRODUCTION
Next-generation sequencing (NGS) has facilitated the rapid collection of vast amounts of genomic sequence data, enabling whole-genome shotgun (WGS) assemblies in species with huge genomes (Li et al., 2010; Jia et al., 2013; Ling et al., 2013; Nystedt et al., 2013). Compared with approaches based on physical maps, WGS assemblies are rapidly made, are comparatively cheap and represent an easy way to gain a comprehensive view of the gene complement of a species, even for species without prior availability of genomic resources. Nevertheless, de novo sequence assembly from short sequence reads remains a formidable algorithmic challenge requiring large amounts of sequence data and powerful compute resources. A recent comparative benchmarking (Bradnam et al., 2013) of assembly pipelines on real datasets highlighted substantial differences in the performance of different algorithmic approaches. The main limitation of WGS assemblies for downstream applications is their fragmentation (Green, 1997): they often consist of up to millions of short contiguous pieces of sequence (contigs), which may be grouped and partially ordered by long-distance mate-pair reads to form scaffolds.
The primary algorithmic challenge of sequence assembly, and thus the origin of the fragmentation, is posed by repeat elements (Alkan et al., 2011b), whose numerous copies are nearly identical, are difficult to resolve with short NGS reads and thus tend to be assembled into a single collapsed sequence contig. Moreover, contigs representing single-copy regions cannot be unambiguously extended at the border of repetitive elements and terminate there. The lack of contiguity of WGS assemblies is a major impediment to downstream analyses. Sequence-based high-throughput genotyping and its applications such as genome-wide association or population genetic studies rely on the visualization of features [single-nucleotide polymorphisms (SNPs), peaks of summary statistics] along the chromosomes, often applying sliding windows to aggregate the information of neighboring contigs (Luikart et al., 2003; Schneeberger et al., 2009; Andrews and Luikart, 2014; Ellegren, 2014). Without any notion of order or vicinity of contigs, such approaches are impossible.
The process of assigning chromosomal locations to contigs of an assembly is referred to as anchoring. The ultimate goal of this process is to establish pseudomolecules, single accurately ordered sequence scaffolds for each chromosome with as few intervening gaps as possible. Lacking in contiguity and in completeness (in particular in the repetitive portion of the genome), WGS assemblies of large and complex genomes of flowering plants or mammals have so far not attained the quality of a draft genome (Alkan et al., 2011b; Feuillet et al., 2011). High-quality reference sequences continue to be constructed with the help of physical maps and sequencing single bacterial artificial chromosomes (BACs; Groenen et al., 2012; Amborella Genome Project, 2013). However, this hierarchical shotgun approach entails the laborious and expensive steps of BAC library construction, fingerprinting and clone-by-clone sequencing (Ariyadasa and Stein, 2012).
If extensive physical mapping resources are not available (as is the case for many non-model species), reference genomes of related species may serve as a proxy to order WGS assemblies, but approaches based on genome collinearity (Mayer et al., 2009) are restricted to genic regions and their accuracy is bounded by the degree of syntenic conservation between related species. Recent translocations or duplications of single genes or larger genomic regions may reduce interspecific collinearity and thus impair the accuracy of synteny-guided assembly ordering (Wicker et al., 2011). The restriction to the gene space arises because intergenic, repetitive sequences evolve very fast and show little conservation even between individuals of a single species (Brunner et al., 2005). It is therefore desirable to have methods at hand that can provide fast and cost-efficient access to an at least partially ordered WGS assembly.
GENETIC ANCHORING OF WGS ASSEMBLIES
For more than a century, genetic mapping has been a universal method to order genomic loci along the chromosomes of sexually reproducing species. Theoretical models of allelic segregation in experimental mapping populations have long been established (Morgan et al., 1922) and various algorithms applying these principles to construct genetic linkage maps from genotypic data have been implemented [reviewed in Cheema and Dicks (2009)]. During the last decades, advances in genetic mapping have been concomitant with the development of molecular marker technologies (Henry, 2012). NGS-based genotyping has recently enabled the simultaneous and near-exhaustive assay of every sequence polymorphism segregating in a mapping population (Huang et al., 2009; Davey et al., 2011). Genotyping-by-sequencing of mapping populations was first employed in species with high-quality map-based reference sequences such as rice (Huang et al., 2009; Xie et al., 2010) or Drosophila (Andolfatto et al., 2011). The existing reference obviated the need for inferring marker order de novo from the genotypic data and enabled the efficient elimination of missing data through a sliding-window approach (Xie et al., 2010).
Because whole genome resequencing is still too expensive to deeply sequence a large number of individuals of species with large genomes, methods have been designed that reduce the genomic complexity either by restriction enzyme digestion (Altshuler et al., 2000; Baird et al., 2008; Elshire et al., 2011) or sequence capture with oligonucleotide baits (Hodges et al., 2007; Bainbridge et al., 2010). Reduced representation sequencing has been applied to anchor a large portion of the small genome (240 Mb) of woodland strawberry (Shulaev et al., 2011), but could assign only a minor fraction of the sequence assembly of the 4 Gb genome of the bread wheat progenitor Aegilops tauschii (Jia et al., 2013) to chromosomal locations.
Recently, two reports (Mascher et al., 2013; Hahn et al., 2014) described computational pipelines that employ genotyping by whole genome sequencing of a genetic mapping population to construct an ultra-dense de novo linkage map of this population and place the assembly contigs of a WGS assembly into the map, producing a genetically anchored WGS assembly. The major computational steps of these procedures are: (i) constructing a WGS assembly from NGS data, (ii) mapping the sequence reads of the population to the assembly and computational genotype calling, (iii) building a genetic linkage map as a framework into which to (iv) integrate the WGS SNPs and assembly contigs harboring them (Figure 1).
FIGURE 1
The POPSEQ method (Mascher et al., 2013) utilizes established software for read mapping [BWA (Li and Durbin, 2009)], variant calling [SAMtools (Li, 2011)] and map-making [MSTMap (Wu et al., 2008)]. SNPs detected by whole-genome sequencing of a mapping population are placed into a genetic framework of this same population through a simple nearest neighbor search. POPSEQ was first used to genetically anchor an existing genome assembly of barley (Hordeum vulgare), a monocotyledonous crop plant. The individuals of two mapping populations were sequenced to an average of onefold whole-genome coverage and, after in silico genotyping, SNPs were placed into genetic framework maps of the populations, which had been previously constructed from SNP array data (Comadran et al., 2012), through genotyping-by-sequencing (Poland et al., 2012), or were made from the WGS data of the population. The genetic positions of SNPs on WGS contigs were then used to assign chromosomal locations to the contigs of the WGS assembly. Two thirds (1.2 Gbp) of the 1.8 Gbp barley assembly could thus be genetically localized. Although the anchored portion of the assembly included 80% of the predicted gene loci, the assembly itself represented only the low-copy portion of the large (5 Gb) and highly repetitive barley genome (The International Barley Genome Sequencing Consortium, 2012).
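The nearest neighbor placement described above can be illustrated with a minimal sketch. It is not the POPSEQ implementation: the genotype encoding ("A"/"B" for the parental alleles of a homozygous population, None for a missing call), the function names and the thresholds `min_compared` and `max_mismatch` are assumptions made for this example. A SNP inherits the position of the framework marker whose genotype vector disagrees with it in the smallest fraction of jointly genotyped individuals.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding for this sketch: "A"/"B" are the parental alleles in a
# homozygous (RIL or doubled haploid) population, None is a missing call.
Genotypes = List[Optional[str]]


def mismatch_rate(a: Genotypes, b: Genotypes) -> Tuple[float, int]:
    """Return (fraction of discordant calls, number of compared individuals)."""
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # ignore individuals lacking a call for either marker
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        return 1.0, 0
    return mismatches / compared, compared


def place_snp(snp: Genotypes,
              framework: Dict[str, Tuple[str, float, Genotypes]],
              min_compared: int = 4,
              max_mismatch: float = 0.1) -> Optional[Tuple[str, float]]:
    """Give a SNP the (chromosome, cM) position of its closest framework marker.

    Returns None if no framework marker is sufficiently similar.
    """
    best = None
    best_rate = None
    for name in sorted(framework):  # sorted for a deterministic tie-break
        chrom, cm, genos = framework[name]
        rate, n = mismatch_rate(snp, genos)
        if n < min_compared or rate > max_mismatch:
            continue
        if best_rate is None or rate < best_rate:
            best_rate = rate
            best = (chrom, cm)
    return best


# Hypothetical framework map of three markers scored in eight individuals.
framework = {
    "m1": ("1H", 0.0, list("AAAABBBB")),
    "m2": ("1H", 12.5, list("AABBBBBB")),
    "m3": ("2H", 40.0, list("ABABABAB")),
}
# A shallowly sequenced SNP with two missing calls.
snp = ["A", "A", None, "B", "B", None, "B", "B"]
print(place_snp(snp, framework))  # ('1H', 12.5)
```

Skipping missing calls rather than counting them as mismatches keeps sparsely genotyped SNPs placeable, at the price of less certain placements when few individuals overlap; the `min_compared` threshold is one simple way to bound that risk.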
A similar method [recombinant population genome construction, RPGC (Hahn et al., 2014)] likewise combines existing tools for sequence-based genotyping [BWA (Li and Durbin, 2009), SAMtools (Li, 2011), GATK (DePristo et al., 2011)] and genetic map construction [MSTMap (Wu et al., 2008)]. An additional feature of RPGC is the detection and correction of assembly errors caused by erroneously collapsing highly similar paralogous sequences. Such collapsed loci show segregation patterns inconsistent with a 1:2:1 distribution of genotypes in an F2 population. The authors evaluated RPGC with simulated sequence data of an F2 population of the worm C. elegans, a model species with a small genome (∼100 Mb). A de novo assembly with ALLPATHS-LG (Gnerre et al., 2011) consisted of only 88 scaffolds and covered 96% of the genome. Alignment to the C. elegans reference genome revealed that all scaffolds were ordered and oriented correctly, indicating that NGS-based sequence assembly and subsequent anchoring may be able to create almost complete and highly accurate sequence assemblies for species with small, repeat-poor genomes.
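A check for segregation inconsistent with the 1:2:1 expectation can be sketched as a chi-square goodness-of-fit test. This is an illustration only and not the procedure implemented in RPGC; the function names, the significance threshold `alpha` and the example counts are assumptions. With three genotype classes the test has two degrees of freedom, in which case the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math
from typing import Dict, List, Tuple


def f2_segregation_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square goodness-of-fit p-value against the 1:2:1 F2 expectation.

    For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotype calls")
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.exp(-chi2 / 2.0)


def flag_distorted(counts: Dict[str, Tuple[int, int, int]],
                   alpha: float = 0.001) -> List[str]:
    """Names of loci whose genotype counts deviate significantly from 1:2:1."""
    return sorted(name for name, c in counts.items()
                  if f2_segregation_pvalue(*c) < alpha)


# Hypothetical genotype counts (AA, AB, BB) in 100 F2 individuals.
counts = {
    "contig_1": (25, 50, 25),  # matches the Mendelian expectation
    "contig_2": (2, 96, 2),    # almost every individual appears heterozygous
}
print(flag_distorted(counts))  # ['contig_2']
```

The second locus illustrates one pattern that a collapsed pair of paralogs might produce, since fixed differences between the copies would be read as heterozygosity in nearly every individual. Genuine biological segregation distortion gives significant results as well, so flagged loci are candidates for inspection rather than proven assembly errors.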
POPSEQ and RPGC are both targeted towards the construction of a reference sequence for a given species. Nevertheless, the availability of a reference genome by no means diminishes the value of further de novo assembly efforts. Structural variation is abundant in the genomes of many species (Feuk et al., 2006; Springer et al., 2009; Munoz-Amatriain et al., 2013; Marroni et al., 2014). Because complex events resulting in copy-number or presence/absence variation are difficult to disentangle by mapping short NGS reads to a single reference sequence (Medvedev et al., 2009; Alkan et al., 2011a), reference-guided de novo assembly (Schneeberger et al., 2011) has been proposed as a tool to detect large-scale deletions, insertions and inversions. In a recent example, Gao et al. (2013) used sequence data from a segregating population of rice to assemble the genome sequence of one parent and correct errors in the existing assembly of the other parent.
Anchoring sequence scaffolds by population sequencing can also benefit on-going map-based sequencing projects. Although the construction of genetically anchored WGS assemblies is independent of a physical map and associated sequence resources (sequenced BAC clones, BAC end sequences), both can synergistically improve each other. As shown for barley, the sequence and marker resources provided by the assembly can be used to order and anchor the physical map (Ariyadasa et al., 2014) and, vice versa, the information about short-range connectivity obtained from clone overlaps can help further resolve the order of sequence contigs within recombination bins.
APPLICATIONS OF GENETICALLY ORDERED SEQUENCE ASSEMBLIES
The genome sequence of a species is not an end in itself. Rather, a genome constitutes a “research infrastructure” for biology (Olson, 1993), a stepping stone that makes possible, or greatly accelerates, a wide range of studies in basic and applied research. Many of these applications do not strictly necessitate a finished reference genome, i.e., near-complete pseudomolecules for each chromosome, but can also be carried out with a partially ordered sequence assembly (possibly supplemented by physical mapping resources) that represents the majority of gene models. Such a partial order can be provided by genetically ordered WGS assemblies, which may function as hubs for gene isolation and empower comparative and evolutionary genomics.
Mapping-by-sequencing is the combined use of bulked segregant analysis and NGS to identify genes that underlie phenotypic traits (Schneeberger and Weigel, 2011). After the initial implementation in Arabidopsis (Schneeberger et al., 2009), similar approaches have been developed in other plant and animal species (Doitsidou et al., 2010; Abe et al., 2012; Leshchiner et al., 2012). As the individuals of the mapping population are not genotyped individually but sequenced together in pools, only the distribution of allele frequencies across pools can be inspected and marker order cannot be determined de novo. Thus, genetic marker positions have to be inferred from an ordered reference sequence (Figure 2). Moreover, QTL mapping using whole-genome (Huang et al., 2009; Gao et al., 2013) or reduced representation resequencing (Baxter et al., 2011; Morris et al., 2013; Liu et al., 2014) of biparental populations or association panels can take advantage of an ordered reference to search identified target intervals for anchored candidate genes.
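The dependence of mapping-by-sequencing on an ordered reference can be made concrete with a small sketch. It assumes that every SNP has already been given a genetic position through its anchored contig and that read counts come from a single pool of individuals selected for the phenotype; the names `Snp` and `window_allele_frequencies`, the window size and the read counts are hypothetical. Read counts are aggregated in windows along one chromosome, and the window in which the pooled allele frequency approaches 1 marks the candidate interval.

```python
from typing import List, Tuple

# (genetic position in cM of the anchored contig,
#  reads carrying the allele of the mutant parent, total reads)
Snp = Tuple[float, int, int]


def window_allele_frequencies(snps: List[Snp], width: float,
                              step: float) -> List[Tuple[float, float]]:
    """Pooled allele frequency in sliding windows along one chromosome.

    Returns (window start, frequency) pairs; windows without reads are skipped.
    """
    if not snps:
        return []
    result = []
    last = max(pos for pos, _, _ in snps)
    start = 0.0
    while start <= last:
        in_window = [(a, d) for pos, a, d in snps if start <= pos < start + width]
        depth = sum(d for _, d in in_window)
        if depth > 0:
            result.append((start, sum(a for a, _ in in_window) / depth))
        start += step
    return result


# Hypothetical read counts from one pool of phenotypically selected individuals.
snps = [(1.0, 5, 10), (4.0, 6, 10), (11.0, 9, 10), (13.0, 10, 10), (22.0, 5, 10)]
freqs = window_allele_frequencies(snps, width=10.0, step=10.0)
peak = max(freqs, key=lambda w: w[1])
print(freqs)  # [(0.0, 0.55), (10.0, 0.95), (20.0, 0.5)]
print(peak)   # (10.0, 0.95)
```

Without positions for the contigs, the same read counts could still be computed per SNP, but they could not be arranged into windows, which is why an anchored assembly is a prerequisite for this kind of analysis.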
FIGURE 2
Genomics has been acknowledged as a powerful means to study evolutionary processes across several individuals of a single species (Luikart et al., 2003) and also across species boundaries (Sousa and Hey, 2013) to gain insights into how evolutionary forces such as adaptation to environmental conditions, natural selection, or random genetic drift shape the genomes of individuals and species. These fields have greatly benefited from the “democratization of sequencing” engendered by NGS technology (Stapley et al., 2010; Ekblom and Galindo, 2011). Genomic resources of non-model organisms can now quickly be assembled in order to support specific research aims (Ellegren, 2014). The recent study of Ellegren et al. (2012) used a genetically ordered draft genome sequence to dissect speciation between closely related songbird species. In an agronomic context, the International Oryza Map Alignment Project (Jacquemin et al., 2013) aims at sequencing the genomes of all members of the genus Oryza, i.e., relatives of cultivated rice. Starting from the premise that a single reference genome is not sufficient to assess the natural diversity across an entire genus, this project seeks to establish a comprehensive genomic infrastructure to empower studies of the evolutionary dynamics of genome structure, to support conservation genomics and to assist crop improvement by introgressing beneficial alleles into elite germplasm. This endeavor could probably benefit from population sequencing data to anchor WGS assemblies and physical maps.
CHALLENGES AND LIMITATIONS
The most time-consuming step of anchoring a WGS assembly is the construction of a genetic population. While the sequencing and computational steps can be carried out in less than 6 months (Mascher et al., 2013), population development, in case of recombinant inbred line populations, may involve several rounds of self-fertilization, which can take several years. However, in plants, genetic mapping is routinely performed by researchers in academia and private industries and suitable mapping populations are often readily available. Moreover, plant mapping populations lend themselves very well to sequence-based mapping. Populations are generally started from highly homozygous genotypes and advanced recombinant inbred or doubled haploid progeny lines are nearly or completely homozygous, respectively. By contrast, F2 generations involve only one round of selfing after the initial cross. However, half of the genome of F2 individuals is expected to be heterozygous, requiring deeper sequence coverage for reliable genotyping. Even in obligate outcrossers, linkage maps can be made from crosses between heterozygous parents (Grattapaglia and Sederoff, 1994). Although controlled crosses cannot be made and the progeny of a single pair of parents is limited in number, genetic mapping is anything but impossible in animals. Linkage analysis in families of siblings from a cross between heterozygous parents is more complicated than in the progeny of homozygous lines (Maliepaard et al., 1997). Markers differ in the number of alleles and the number of heterozygous parents, and it can be impossible to determine the linkage phase of a marker, i.e., from which grandparent it was inherited. High-density linkage maps of the human genome have been constructed from multi-generation pedigrees (Dib et al., 1996). 
Similar methods based on three-generation pedigrees have been applied in other mammalian species such as macaque monkeys (Rogers et al., 2006) and domestic cats (Menotti-Raymond et al., 1999). Moreover, RIL populations have been created by mating of full siblings in the laboratory animals mouse (Williams et al., 2001), rat (Pravenec et al., 1996), and fruit fly (Nuzhdin et al., 1997). If a robust genetic framework map can be computed, whole genome sequencing of the pedigree should allow populating this framework with additional markers and WGS sequence contigs. The heterozygosity of natural pedigrees may necessitate deeper sequencing to reliably score heterozygotes. Recent studies found that genotype calling from low- or medium-coverage (<15x) data often results in calling heterozygotes as homozygotes and can bias downstream analyses (Kim et al., 2011; Crawford and Lazzaro, 2012). A sliding-window approach that aggregates sequence information across multiple SNP positions may help mitigate the effects of genotyping errors caused by low read depth.
In any species, linkage mapping has the inherent limitation that the maximally achievable resolution is determined by the recombination landscape, or more specifically, the ratio between physical and genetic distance along the genome. In grasses, for example, recombination events mainly occur in distal regions, whereas large peri-centromeric intervals are almost devoid of cross-overs. These so-called genetic centromeres correspond to a single large bin in a genetic map, which can only be resolved with extremely large mapping populations or possibly through alternative approaches such as physical mapping (van Oeveren et al., 2011), optical mapping (Dong et al., 2013), or methods based on chromosomal conformation capture (Lieberman-Aiden et al., 2009; Burton et al., 2013).
In contrast to these intrinsic difficulties given by biological facts, algorithmic parameters of the anchoring process can be subject to directed improvement. The major computational tasks of assembly anchoring are de novo assembly, read mapping, variant calling and linkage map construction. One of the major determinants of anchoring efficiency is assembly contiguity. The longer a sequence contig is, the more likely it is that at least one sequence polymorphism can be detected to anchor it. Furthermore, longer contigs alleviate the problem of missing data. Even though the majority of individuals have missing genotype calls for single SNPs as a consequence of shallow-coverage sequencing (Huang et al., 2009; Mascher et al., 2013), aggregating genotypic information across all SNPs on a single contig results in consensus genotype calls with little or no missing data.
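How aggregation across the SNPs of one contig reduces missing data can be shown with a short sketch. It assumes a homozygous population with calls encoded as "A", "B" or None (missing) and uses a simple majority vote per individual; the function name `contig_consensus` and the agreement threshold `min_fraction` are choices made for this illustration, not parameters taken from the cited studies.

```python
from collections import Counter
from typing import List, Optional


def contig_consensus(snp_calls: List[List[Optional[str]]],
                     min_fraction: float = 0.75) -> List[Optional[str]]:
    """Majority-vote genotype per individual across all SNPs of one contig.

    snp_calls holds one list per SNP with one entry per individual.
    An individual remains missing (None) if it has no call at any SNP or if
    its calls disagree too often to reach min_fraction.
    """
    if not snp_calls:
        return []
    n_individuals = len(snp_calls[0])
    consensus: List[Optional[str]] = []
    for i in range(n_individuals):
        votes = Counter(calls[i] for calls in snp_calls if calls[i] is not None)
        if not votes:
            consensus.append(None)
            continue
        allele, count = votes.most_common(1)[0]
        total = sum(votes.values())
        consensus.append(allele if count / total >= min_fraction else None)
    return consensus


# Four SNPs on one hypothetical contig, scored in four individuals.
# Every single SNP has at least two missing calls.
snps = [
    ["A", None, "B", None],
    [None, "B", "B", None],
    ["A", "B", None, None],
    ["A", None, "B", None],
]
print(contig_consensus(snps))  # ['A', 'B', 'B', None]
```

In this toy example each SNP alone is called in at most half of the individuals, whereas the consensus is missing only for the one individual without any read on the contig. Leaving conflicting individuals uncalled is a conservative choice, because a recombination event inside a long contig produces exactly such conflicts.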
In the approaches of Mascher et al. (2013) and Hahn et al. (2014), read mapping and variant calling are performed with standard tools that are routinely used in large-scale resequencing projects (1000 Genomes Project Consortium et al., 2012; Tennessen et al., 2012) and will likely scale with the growing amount of raw data as population size and sequencing depth increase. By contrast, the majority of genetic mapping programs are still tailored to datasets encompassing only a few thousand markers. The most commonly used tool to compute linkage maps from larger marker sets is MSTMap (Wu et al., 2008), for which excessive runtimes have been reported when the number of markers exceeds ∼100,000 (Howe et al., 2013). As the number of recombination bins in small biparental populations is limited, markers could be clustered by their segregation patterns prior to map-making to obtain a smaller, yet fully informative set of framework markers. Moreover, focusing on a small number of high-confidence SNP loci may avoid the common problem of map inflation, which is often caused by spurious cross-over events introduced by genotyping errors (Cheema and Dicks, 2009).
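The idea of reducing the marker set before map-making can be sketched as follows. The example assumes complete, error-free genotype strings (for instance consensus calls per contig) in a homozygous population and groups markers with identical segregation patterns into bins; the names `bin_markers` and `framework_markers` are hypothetical. Real data with missing calls and genotyping errors would require a tolerance for mismatches, which this sketch deliberately omits.

```python
from collections import defaultdict
from typing import Dict, List


def bin_markers(genotypes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group markers that share exactly the same segregation pattern.

    genotypes maps a marker name to a string with one character per
    individual ("A" or "B"); the result maps each pattern to its markers.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for marker in sorted(genotypes):
        bins[genotypes[marker]].append(marker)
    return dict(bins)


def framework_markers(genotypes: Dict[str, str]) -> List[str]:
    """Pick one representative marker per bin as input for map construction."""
    return sorted(members[0] for members in bin_markers(genotypes).values())


# Six hypothetical markers scored in four individuals; only three distinct
# segregation patterns occur, so three markers carry all the information.
genotypes = {
    "c1": "AABB", "c2": "AABB", "c3": "ABBB",
    "c4": "AABB", "c5": "ABBB", "c6": "BBBA",
}
print(bin_markers(genotypes))
# {'AABB': ['c1', 'c2', 'c4'], 'ABBB': ['c3', 'c5'], 'BBBA': ['c6']}
print(framework_markers(genotypes))  # ['c1', 'c3', 'c6']
```

After the framework map has been computed from the representatives, the remaining members of each bin can simply inherit the position of their representative, so the cost of map construction depends on the number of bins rather than on the number of markers.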
Moreover, valuable insights into the choice of parameters, the overall accuracy of the methods and the interplay of sequencing depth, population size, and final mapping resolution may be gained by performing de novo assembly and anchoring on real data gathered from species with existing high-quality reference genomes to be used as a gold standard for benchmarking.
CONCLUSION
Interest in the genome sequencing of non-model species (Ellegren, 2014) and of economically important species with very large genomes (Brenchley et al., 2012; Nystedt et al., 2013) has increased recently. Genome sequencing and assembly efforts and novel algorithmic development can be expected to intensify in the years to come, as genome sequencing of thousands of animal and plant species has been proposed (Genome 10K Community of Scientists, 2009; Johnson et al., 2012). We conclude by reiterating the advice of Gao et al. (2013) that each genome assembly project should, if at all possible, obtain WGS data from at least one segregating population. Short read assembly will remain a central part of any genome project until advances in sequencing technology make chromosome-sized sequence scaffolds possible. In the meantime, methods such as genetic anchoring will remain necessary to enhance the utility of fragmented WGS assemblies.
Statements
Acknowledgments
Barley genome sequencing in the lab of Nils Stein is supported by the German Federal Ministry of Research and Education (BMBF) in frame of the TRITEX project (FKZ 0315954).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
1
1000 Genomes Project ConsortiumAbecasisG. R.AutonA.BrooksL. D.DepristoM. A.DurbinR. M.et al (2012). An integrated map of genetic variation from 1,092 human genomes.Nature49156–65. 10.1038/nature11632
2
AbeA.KosugiS.YoshidaK.NatsumeS.TakagiH.KanzakiH.et al (2012). Genome sequencing reveals agronomically important loci in rice using MutMap.Nat. Biotechnol.30174–178. 10.1038/nbt.2095
3
AlkanC.CoeB. P.EichlerE. E. (2011a). Genome structural variation discovery and genotyping.Nat. Rev. Genet.12363–376. 10.1038/nrg2958
4
AlkanC.SajjadianS.EichlerE. E. (2011b). Limitations of next-generation genome sequence assembly.Nat. Methods861–65. 10.1038/nmeth.1527
5
AltshulerD.PollaraV. J.CowlesC. R.Van EttenW. J.BaldwinJ.LintonL.et al (2000). An SNP map of the human genome generated by reduced representation shotgun sequencing.Nature407513–516. 10.1038/35035083
6
Amborella Genome Project. (2013). The Amborella genome and the evolution of flowering plants.Science342:1241089. 10.1126/science.1241089
7
AndolfattoP.DavisonD.ErezyilmazD.HuT. T.MastJ.Sunayama-MoritaT.et al (2011). Multiplexed shotgun genotyping for rapid and efficient genetic mapping.Genome Res.21610–617. 10.1101/gr.115402.110
8
AndrewsK. R.LuikartG. (2014). Recent novel approaches for population genomics data analysis.Mol. Ecol.231661–1667. 10.1111/mec.12686
9
AriyadasaR.MascherM.NussbaumerT.SchulteD.FrenkelZ.PoursarebaniN.et al (2014). A sequence-ready physical map of barley anchored genetically by two million single-nucleotide polymorphisms.Plant Physiol.164412–423. 10.1104/pp.113.228213
10
AriyadasaR.SteinN. (2012). Advances in BAC-based physical mapping and map integration strategies in plants.J. Biomed. Biotechnol.2012:184854. 10.1155/2012/184854
11
BainbridgeM. N.WangM.BurgessD. L.KovarC.RodeschM. J.D′ascenzoM.et al (2010). Whole exome capture in solution with 3 Gbp of data.Genome Biol.11 R62. 10.1186/gb-2010-11-6-r62
12
BairdN. A.EtterP. D.AtwoodT. S.CurreyM. C.ShiverA. L.LewisZ. A.et al (2008). Rapid SNP discovery and genetic mapping using sequenced RAD markers.PLoS ONE3:e3376. 10.1371/journal.pone.0003376
13
BaxterS. W.DaveyJ. W.JohnstonJ. S.SheltonA. M.HeckelD. G.JigginsC. D.et al (2011). Linkage mapping and comparative genomics using next-generation RAD sequencing of a non-model organism.PLoS ONE6:e19315. 10.1371/journal.pone.0019315
14
BradnamK. R.FassJ. N.AlexandrovA.BaranayP.BechnerM.BirolI.et al (2013). Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species.Gigascience2:10. 10.1186/2047-217X-2-10
15
BrenchleyR.SpannaglM.PfeiferM.BarkerG. L.D’AmoreR.AllenA. M.et al (2012). Analysis of the bread wheat genome using whole-genome shotgun sequencing.Nature491705–710. 10.1038/nature11650
16
BrunnerS.FenglerK.MorganteM.TingeyS.RafalskiA. (2005). Evolution of DNA sequence nonhomologies among maize inbreds.Plant Cell17343–360. 10.1105/tpc.104.025627
17
BurtonJ. N.AdeyA.PatwardhanR. P.QiuR.KitzmanJ. O.ShendureJ. (2013). Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions.Nat. Biotechnol.311119–1125. 10.1038/nbt.2727
18
CarolloV.MatthewsD. E.LazoG. R.BlakeT. K.HummelD. D.LuiN.et al (2005). GrainGenes 2.0. an improved resource for the small-grains community.Plant Physiol.139643–651. 10.1104/pp.105.064485
19
CheemaJ.DicksJ. (2009). Computational approaches and software tools for genetic linkage map estimation in plants.Brief. Bioinform.10595–608. 10.1093/bib/bbp045
20
ComadranJ.KilianB.RussellJ.RamsayL.SteinN.GanalM.et al (2012). Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley.Nat. Genet.441388–1392. 10.1038/ng.2447
21
CrawfordJ. E.LazzaroB. P. (2012). Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data.Front. Genet.3:66. 10.3389/fgene.2012.00066
22
DaveyJ. W.HohenloheP. A.EtterP. D.BooneJ. Q.CatchenJ. M.BlaxterM. L. (2011). Genome-wide genetic marker discovery and genotyping using next-generation sequencing.Nat. Rev. Genet.12499–510. 10.1038/nrg3012
23
DePristoM. A.BanksE.PoplinR.GarimellaK. V.MaguireJ. R.HartlC.et al (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data.Nat. Genet.43491–498. 10.1038/ng.806
24
DibC.FaureS.FizamesC.SamsonD.DrouotN.VignalA.et al (1996). A comprehensive genetic map of the human genome based on 5,264 microsatellites.Nature380152–154. 10.1038/380152a0
25
DoitsidouM.PooleR. J.SarinS.BigelowH.HobertO. (2010). C. elegans mutant identification with a one-step whole-genome-sequencing and SNP mapping strategy.PLoS ONE5:e15435. 10.1371/journal.pone.0015435
26
DongY.XieM.JiangY.XiaoN.DuX.ZhangW.et al (2013). Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus).Nat. Biotechnol.31135–141. 10.1038/nbt.2478
27
EkblomR.GalindoJ. (2011). Applications of next generation sequencing in molecular ecology of non-model organisms.Heredity (Edinb) 1071–15. 10.1038/hdy.2010.152
28
EllegrenH. (2014). Genome sequencing and population genomics in non-model organisms.Trends Ecol. Evol. (Amst.) 2951–63. 10.1016/j.tree.2013.09.008
29
EllegrenH.SmedsL.BurriR.OlasonP. I.BackstromN.KawakamiT.et al (2012). The genomic landscape of species divergence in Ficedula flycatchers.Nature491756–760. 10.1038/nature11584
30
ElshireR. J.GlaubitzJ. C.SunQ.PolandJ. A.KawamotoK.BucklerE. S.et al (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species.PLoS ONE6:e19379. 10.1371/journal.pone.0019379
31
FeuilletC.LeachJ. E.RogersJ.SchnableP. S.EversoleK. (2011). Crop genome sequencing: lessons and rationales.Trends Plant Sci.1677–88. 10.1016/j.tplants.2010.10.005
32
FeukL.CarsonA. R.SchererS. W. (2006). Structural variation in the human genome.Nat. Rev. Genet.785–97. 10.1038/nrg1767
33
GaoZ. Y.ZhaoS. C.HeW. M.GuoL. B.PengY. L.WangJ. J.et al (2013). Dissecting yield-associated loci in super hybrid rice by resequencing recombinant inbred lines and improving parental genome sequences.Proc. Natl. Acad. Sci. U.S.A.11014492–14497. 10.1073/pnas.1306579110
34
Genome 10K Community of Scientists. (2009). Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species.J. Hered.100659–674. 10.1093/jhered/esp086
35
GnerreS.MaccallumI.PrzybylskiD.RibeiroF. J.BurtonJ. N.WalkerB. J.et al (2011). High-quality draft assemblies of mammalian genomes from massively parallel sequence data.Proc. Natl. Acad. Sci. U.S.A.1081513–1518. 10.1073/pnas.1017351108
36
GrattapagliaD.SederoffR. (1994). Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers.Genetics1371121–1137.
37
GreenP. (1997). Against a whole-genome shotgun.Genome Res.7410–417. 10.1101/gr.7.5.410
38
GroenenM. A.ArchibaldA. L.UenishiH.TuggleC. K.TakeuchiY.RothschildM. F.et al (2012). Analyses of pig genomes provide insight into porcine demography and evolution.Nature491393–398. 10.1038/nature11622
39
HahnM. W.ZhangS. V.MoyleL. C. (2014). Sequencing, assembling, and correcting draft genomes using recombinant populations.G3 (Bethesda) 4669–679. 10.1534/g3.114.010264
40
HenryR. J. (2012). Molecular Markers in Plants. Hoboken, NJ: John Wiley & Sons.
41
HodgesE.XuanZ.BalijaV.KramerM.MollaM. N.SmithS. W.et al (2007). Genome-wide in situ exon capture for selective resequencing.Nat. Genet.391522–1527. 10.1038/ng.2007.42
42
HoustonK.MckimS. M.ComadranJ.BonarN.DrukaI.UzrekN.et al (2013). Variation in the interaction between alleles of HvAPETALA2 and microRNA172 determines the density of grains on the barley inflorescence.Proc. Natl. Acad. Sci. U.S.A.11016675–16680. 10.1073/pnas.1311681110
43
HoweK.ClarkM. D.TorrojaC. F.TorranceJ.BerthelotC.MuffatoM.et al (2013). The zebrafish reference genome sequence and its relationship to the human genome.Nature496498–503. 10.1038/nature12111
44
HuangX.FengQ.QianQ.ZhaoQ.WangL.WangA.et al (2009). High-throughput genotyping by whole-genome resequencing.Genome Res.191068–1076. 10.1101/gr.089516.108
45
JacqueminJ.BhatiaD.SinghK.WingR. A. (2013). The International Oryza Map Alignment Project: development of a genus-wide comparative genomics platform to help solve the 9 billion-people question.Curr. Opin. Plant Biol.16147–156. 10.1016/j.pbi.2013.02.014
46
JiaJ.ZhaoS.KongX.LiY.ZhaoG.HeW.et al (2013). Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation.Nature49691–95. 10.1038/nature12028
47
JohnsonM. T.CarpenterE. J.TianZ.BruskiewichR.BurrisJ. N.CarriganC. T.et al (2012). Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes.PLoS ONE7:e50226. 10.1371/journal.pone.0050226
48
KimS. Y.LohmuellerK. E.AlbrechtsenA.LiY. R.KorneliussenT.TianG.et al (2011). Estimation of allele frequency and association mapping using next-generation sequencing data.BMC Bioinformatics12:231. 10.1186/1471-2105-12-231
49
Leshchiner, I., Alexa, K., Kelsey, P., Adzhubei, I., Austin-Tse, C. A., Cooney, J. D., et al. (2012). Mutation mapping and identification by whole-genome sequencing. Genome Res. 22, 1541–1548. doi: 10.1101/gr.135541.111
50
Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993. doi: 10.1093/bioinformatics/btr509
51
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. doi: 10.1093/bioinformatics/btp324
52
Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., et al. (2010). De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272. doi: 10.1101/gr.097261.109
53
Lieberman-Aiden, E., Van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293. doi: 10.1126/science.1181369
54
Ling, H. Q., Zhao, S., Liu, D., Wang, J., Sun, H., Zhang, C., et al. (2013). Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 496, 87–90. doi: 10.1038/nature11997
55
Liu, H., Bayer, M., Druka, A., Russell, J. R., Hackett, C. A., Poland, J., et al. (2014). An evaluation of genotyping by sequencing (GBS) to map the Breviaristatum-e (ari-e) locus in cultivated barley. BMC Genomics 15:104. doi: 10.1186/1471-2164-15-104
56
Luikart, G., England, P. R., Tallmon, D., Jordan, S., and Taberlet, P. (2003). The power and promise of population genomics: from genotyping to genome typing. Nat. Rev. Genet. 4, 981–994. doi: 10.1038/nrg1226
57
Maliepaard, C., Jansen, J., and Van Ooijen, J. (1997). Linkage analysis in a full-sib family of an outbreeding plant species: overview and consequences for applications. Genet. Res. 70, 237–250. doi: 10.1017/S0016672397003005
58
Marroni, F., Pinosio, S., and Morgante, M. (2014). Structural variation and genome complexity: is dispensable really dispensable? Curr. Opin. Plant Biol. 18C, 31–36. doi: 10.1016/j.pbi.2014.01.003
59
Mascher, M., Muehlbauer, G. J., Rokhsar, D. S., Chapman, J., Schmutz, J., Barry, K., et al. (2013). Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ). Plant J. 76, 718–727. doi: 10.1111/tpj.12319
60
Mayer, K. F., Taudien, S., Martis, M., Simkova, H., Suchankova, P., Gundlach, H., et al. (2009). Gene content and virtual gene order of barley chromosome 1H. Plant Physiol. 151, 496–505. doi: 10.1104/pp.109.142612
61
Medvedev, P., Stanciu, M., and Brudno, M. (2009). Computational methods for discovering structural variation with next-generation sequencing. Nat. Methods 6, S13–S20. doi: 10.1038/nmeth.1374
62
Menotti-Raymond, M., David, V. A., Lyons, L. A., Schaffer, A. A., Tomlin, J. F., Hutton, M. K., et al. (1999). A genetic linkage map of microsatellites in the domestic cat (Felis catus). Genomics 57, 9–23. doi: 10.1006/geno.1999.5743
63
Morgan, T. H., Sturtevant, A. H., Muller, H. J., and Bridges, C. B. (1922). The Mechanism of Mendelian Heredity. New York, NY: Henry Holt and Company.
64
Morris, G. P., Ramu, P., Deshpande, S. P., Hash, C. T., Shah, T., Upadhyaya, H. D., et al. (2013). Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc. Natl. Acad. Sci. U.S.A. 110, 453–458. doi: 10.1073/pnas.1215985110
65
Munoz-Amatriain, M., Eichten, S. R., Wicker, T., Richmond, T. A., Mascher, M., Steuernagel, B., et al. (2013). Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome. Genome Biol. 14:R58. doi: 10.1186/gb-2013-14-6-r58
66
Nuzhdin, S. V., Pasyukova, E. G., Dilda, C. L., Zeng, Z. B., and Mackay, T. F. (1997). Sex-specific quantitative trait loci affecting longevity in Drosophila melanogaster. Proc. Natl. Acad. Sci. U.S.A. 94, 9734–9739. doi: 10.1073/pnas.94.18.9734
67
Nystedt, B., Street, N. R., Wetterbom, A., Zuccolo, A., Lin, Y. C., Scofield, D. G., et al. (2013). The Norway spruce genome sequence and conifer genome evolution. Nature 497, 579–584. doi: 10.1038/nature12211
68
Olson, M. V. (1993). The human genome project. Proc. Natl. Acad. Sci. U.S.A. 90, 4338–4344. doi: 10.1073/pnas.90.10.4338
69
Poland, J. A., Brown, P. J., Sorrells, M. E., and Jannink, J. L. (2012). Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE 7:e32253. doi: 10.1371/journal.pone.0032253
70
Pravenec, M., Gauguier, D., Schott, J. J., Buard, J., Kren, V., Bila, V., et al. (1996). A genetic linkage map of the rat derived from recombinant inbred strains. Mamm. Genome 7, 117–127. doi: 10.1007/s003359900031
71
Rogers, J., Garcia, R., Shelledy, W., Kaplan, J., Arya, A., Johnson, Z., et al. (2006). An initial genetic linkage map of the rhesus macaque (Macaca mulatta) genome using human microsatellite loci. Genomics 87, 30–38. doi: 10.1016/j.ygeno.2005.10.004
72
Schneeberger, K., Ossowski, S., Lanz, C., Juul, T., Petersen, A. H., Nielsen, K. L., et al. (2009). SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat. Methods 6, 550–551. doi: 10.1038/nmeth0809-550
73
Schneeberger, K., Ossowski, S., Ott, F., Klein, J. D., Wang, X., Lanz, C., et al. (2011). Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proc. Natl. Acad. Sci. U.S.A. 108, 10249–10254. doi: 10.1073/pnas.1107739108
74
Schneeberger, K., and Weigel, D. (2011). Fast-forward genetics enabled by new sequencing technologies. Trends Plant Sci. 16, 282–288. doi: 10.1016/j.tplants.2011.02.006
75
Shulaev, V., Sargent, D. J., Crowhurst, R. N., Mockler, T. C., Folkerts, O., Delcher, A. L., et al. (2011). The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109–116. doi: 10.1038/ng.740
76
Sousa, V., and Hey, J. (2013). Understanding the origin of species with genome-scale data: modelling gene flow. Nat. Rev. Genet. 14, 404–414. doi: 10.1038/nrg3446
77
Springer, N. M., Ying, K., Fu, Y., Ji, T., Yeh, C. T., Jia, Y., et al. (2009). Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet. 5:e1000734. doi: 10.1371/journal.pgen.100073
78
Stapley, J., Reger, J., Feulner, P. G., Smadja, C., Galindo, J., Ekblom, R., et al. (2010). Adaptation genomics: the next generation. Trends Ecol. Evol. 25, 705–712. doi: 10.1016/j.tree.2010.09.002
79
Tennessen, J. A., Bigham, A. W., O’Connor, T. D., Fu, W., Kenny, E. E., Gravel, S., et al. (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69. doi: 10.1126/science.1219240
80
The International Barley Genome Sequencing Consortium (2012). A physical, genetic and functional sequence assembly of the barley genome. Nature 491, 711–716. doi: 10.1038/nature11543
81
van Oeveren, J., De Ruiter, M., Jesse, T., Van Der Poel, H., Tang, J., Yalcin, F., et al. (2011). Sequence-based physical mapping of complex genomes by whole genome profiling. Genome Res. 21, 618–625. doi: 10.1101/gr.112094.110
82
Wicker, T., Mayer, K. F. X., Gundlach, H., Martis, M., Steuernagel, B., Scholz, U., et al. (2011). Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell 23, 1706–1718. doi: 10.1105/tpc.111.086629
83
Williams, R. W., Gu, J., Qi, S., and Lu, L. (2001). The genetic structure of recombinant inbred mice: high-resolution consensus maps for complex trait analysis. Genome Biol. 2:RESEARCH0046.
84
Wu, Y., Bhat, P. R., Close, T. J., and Lonardi, S. (2008). Efficient and accurate construction of genetic linkage maps from the minimum spanning tree of a graph. PLoS Genet. 4:e1000212. doi: 10.1371/journal.pgen.1000212
85
Xie, W., Feng, Q., Yu, H., Huang, X., Zhao, Q., Xing, Y., et al. (2010). Parent-independent genotyping for constructing an ultrahigh-density linkage map based on population sequencing. Proc. Natl. Acad. Sci. U.S.A. 107, 10578–10583. doi: 10.1073/pnas.1005931107
Summary
Keywords: next-generation sequencing, whole-genome shotgun assembly, assembly anchoring, genetic mapping, genotyping-by-sequencing, single-nucleotide polymorphisms, mapping populations
Citation: Mascher M and Stein N (2014) Genetic anchoring of whole-genome shotgun assemblies. Front. Genet. 5:208. doi: 10.3389/fgene.2014.00208
Received: 16 May 2014
Accepted: 19 June 2014
Published: 07 July 2014
Volume: 5 - 2014
Edited by: Yih-Horng Shiao, US Patent Trademark Office, USA
Reviewed by: Beat Keller, University of Zurich, Switzerland; Martien Groenen, Wageningen University, Netherlands
Copyright: © 2014 Mascher and Stein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Martin Mascher, Leibniz Institute of Plant Genetics and Crop Plant Research, Corrensstraße 3, 06466 Stadt Seeland, OT Gatersleben, Germany e-mail: mascher@ipk-gatersleben.de
This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics.