Genomic, Transcriptomic and Epigenomic Tools to Study the Domestication of Plants and Animals: A Field Guide for Beginners

Barrera-Redondo, Josué; Piñero, Daniel; Eguiarte, Luis E.

doi:10.3389/fgene.2020.00742

REVIEW article

Front. Genet., 15 July 2020

Sec. Evolutionary, Population, and Conservation Genetics

Volume 11 - 2020 | https://doi.org/10.3389/fgene.2020.00742

Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City, Mexico

Abstract

In the last decade, genomics and the related fields of transcriptomics and epigenomics have revolutionized the study of the domestication process in plants and animals, leading to new discoveries and new unresolved questions. Because some domesticated taxa have been studied more than others, the available genomic data can range from vast to nonexistent, depending on the domesticated taxon of interest. This review is meant as a rough guide for students and academics who want to start a domestication research project using modern genomic tools, as well as for researchers already conducting domestication studies who are interested in following a genomic approach and are looking for alternative (cheaper or more efficient) strategies and future directions. We summarize the theoretical and technical background needed to carry out domestication genomics, from the acquisition of a reference genome and genome assembly to the sampling design for population genomics, paleogenomics, transcriptomics, epigenomics and the experimental validation of domestication-related genes. We also describe examples of these approaches and the relevant discoveries they have made about the domestication of the studied taxa.

Introduction

The modern study of plant and animal domestication is multidisciplinary, with relevant contributions from botany, zoology, archeology, genetics, ethnobiology, biogeography, and linguistics (Larson et al., 2014). Modern domestication studies seek to understand when domestication happened, where it started and how many times it took place, as well as the evolutionary and ecological forces that led to the divergence between domesticated taxa and their wild relatives and ancestors (Zeder, 2006; Larson et al., 2014).

Given that domestication is an evolutionary process, genetics emerged as a powerful tool to understand the domestication of plants and animals, revealing the demographic history of domesticated taxa and the genetic variants that underlie their domesticated phenotypes (Zeder et al., 2006; Gepts, 2014). The advent of high-throughput sequencing technologies made it possible to study the domestication of crops and animals at a much deeper level than previously imagined, as researchers can now pinpoint the genetic changes that allowed domestication to happen (Ross-Ibarra et al., 2007; Gepts, 2014).

Why and How to Use a Genomic Approach in Domestication Studies? Top-Down and Bottom-Up Approaches for the Study of Domestication

In genetics, "top" or "up" refers to a specific phenotype, while "bottom" or "down" refers to the underlying genotype responsible for that trait. Thus, top-down approaches start by studying a particular phenotype and then search for its genetic basis. Huge advances in the genetic study of domestication traits have been made using classic top-down approaches (e.g., Sax, 1923; Paterson et al., 1988; Doebley and Stec, 1991; Doebley et al., 1995), which compare the phenotypic traits of interest between wild and domesticated taxa and then identify the genetic variant or variants that correlate with those traits through quantitative trait locus (QTL) mapping and linkage disequilibrium (Ross-Ibarra et al., 2007; Kantar et al., 2017). These top-down approaches are precise in finding causal variants involved in the evolution of specific traits, but they are usually very labor-intensive and biased towards phenotypes selected a priori for comparison between wild and domesticated taxa (Ross-Ibarra et al., 2007; Kantar et al., 2017).

In contrast, bottom-up approaches start by analyzing the genetic variation within genomes to detect potential signals of selection related to the domestication process, and then associate such signals with important loci and domestication phenotypes (Ross-Ibarra et al., 2007; Kantar et al., 2017). In the last decade, high-throughput sequencing technologies have allowed us to analyze entire genomes of one or several individuals of domesticated taxa and to compare them with those of other varieties or of their wild relatives (e.g., Hufford et al., 2012; Yang et al., 2012; Li et al., 2013; Wang et al., 2019; Zeng et al., 2019).

Bottom-up approaches do not need an a priori phenotypic target, enabling a genome-wide search for domestication-related loci without prior knowledge of possible candidates and revealing important traits that can hardly be studied using a top-down approach (Ross-Ibarra et al., 2007; Kantar et al., 2017). Nevertheless, the results of bottom-up approaches can be limited by the sampling scheme, the density of genetic markers, and the detection of false positives (Tiffin and Ross-Ibarra, 2014), so these genomic approaches must be designed carefully to obtain satisfactory results (De Mita et al., 2013).

Genomic data enabled the widespread and reliable use of bottom-up approaches to study plant and animal domestication, but top-down strategies also benefited from genomics, which allowed a more efficient search for genotype-phenotype correlations through genome-wide association studies (GWAS; Wang G.-D. et al., 2014). GWAS can be defined as experimental designs used to detect associations between genetic variation in a population and phenotypic traits of interest (Visscher et al., 2017).
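
To make the idea of a genotype-phenotype association concrete, the sketch below tests a single marker by regressing a quantitative trait on allele dosage (0, 1 or 2 copies of the tested allele in a diploid). It is a minimal illustration under simplifying assumptions, not a GWAS pipeline: the function name and toy data are hypothetical, and real studies repeat such a test across thousands to millions of markers, add covariates for population structure and relatedness, and correct for multiple testing.

```python
import math
from statistics import mean


def single_snp_association(dosages, phenotypes):
    """Least-squares regression of a quantitative phenotype on allele dosage.

    dosages: copies of the tested allele per individual (0, 1 or 2 in a diploid).
    phenotypes: trait values for the same individuals, in the same order.
    Returns (effect_size, t_statistic) for this single marker.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need paired data for at least 3 individuals")
    x_bar, y_bar = mean(dosages), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx                      # change in trait per extra allele copy
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope
    if se == 0:
        return slope, math.copysign(math.inf, slope)
    return slope, slope / se


# Toy data: six individuals genotyped at one SNP.
effect, t = single_snp_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.1, 3.0, 2.9])
print(f"effect per allele = {effect:.3f}, t = {t:.1f}")
```

The slope estimates the change in the trait per additional allele copy; a large absolute t-statistic flags the marker as a candidate, but only after genome-wide multiple-testing correction.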

Genome-wide genetic markers make it possible to differentiate between global and local evolutionary signals across the genome (Diao and Chen, 2012), separating the signals of selection during domestication (Vitti et al., 2013) from other fine-scale signals of demographic events that occurred during the domestication process (Meyer and Purugganan, 2013; Guerra García and Piñero, 2017).
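
One common way to separate local from global signals is an empirical outlier approach: compute a statistic in windows along the genome and flag the windows in the extreme tail of the genome-wide distribution. The sketch below assumes a hypothetical statistic, the ratio of nucleotide diversity in wild versus domesticated samples, because a domestication bottleneck reduces diversity across the whole genome while selection reduces it much more strongly near selected loci. Window names, values and the 1% cutoff are illustrative assumptions.

```python
def empirical_outliers(window_stats, top_fraction=0.01):
    """Return the windows whose statistic lies in the top `top_fraction`
    of the genome-wide distribution (ties at the threshold are kept)."""
    if not window_stats:
        return []
    ranked = sorted(window_stats.values(), reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    threshold = ranked[n_top - 1]
    return sorted(w for w, s in window_stats.items() if s >= threshold)


# Hypothetical diversity ratios (wild / domesticated) for 100 windows:
# the bottleneck raises the ratio everywhere (global signal),
# while a sweep raises it much more in one region (local signal).
ratios = {f"chr1:{i * 10}-{(i + 1) * 10}kb": 1.5 for i in range(99)}
ratios["chr2:40-50kb"] = 6.0
print(empirical_outliers(ratios))
```

Outlier windows are candidates rather than proof of selection: a fixed percentile always returns some windows, even under neutrality, which is why simulation-based null models are often preferred.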

The use of modern genomic tools is not limited to population genetics, as other interesting approaches can reveal important aspects of the domestication process. For instance, one can analyze changes in the transcriptional activity of genes related to domestication (Hekman et al., 2015), demonstrate the phenotypic effects of certain alleles through the use of genomic editing tools (Zhou J. et al., 2019), search for epigenetic patterns that changed between domesticated and wild taxa (Janowitz Koch et al., 2016) or analyze the genetic makeup of archeological samples (Irving-Pease et al., 2019).

This review describes the necessary steps and data to start a genomic research project towards understanding domestication, the questions that can be approached using genomic data and the main results obtained from previous studies using these methods (Figure 1).

FIGURE 1

Whole-Genome Assembly and Reference Genomes

Whole-genome assembly is one of the first steps in modern domestication studies, since it generates a reference genome that is useful for downstream analyses. Whole-genome assembly projects rely on high-throughput sequencing technologies such as Illumina (e.g., Sun et al., 2017), PacBio (e.g., Badouin et al., 2017; VanBuren et al., 2018), Oxford Nanopore (e.g., Belser et al., 2018) or a combination of these (e.g., Bickhart et al., 2017; Zhou Y. et al., 2019) to sequence the genome of a single individual of the species of interest. Before starting a genome assembly project, a rough estimate of the haploid genome size and the ploidy of the organism must be known, since both factors determine the assembly difficulty and sequencing cost (Sims et al., 2014). Repetitive elements usually comprise a significant portion of eukaryotic genomes [ranging from 3% in tiny genomes such as Utricularia gibba (Ibarra-Laclette et al., 2013) up to 65.5% in huge genomes such as Ambystoma mexicanum (Nowoshilow et al., 2018)], so assembling these genomes successfully requires either sequencing libraries with large insert sizes (called mate-pair libraries) or long-read sequencing technologies such as PacBio or Oxford Nanopore (Levy and Myers, 2016; Sohn and Nam, 2016). Additionally, chromosome conformation capture (Mascher et al., 2017), optical mapping (Dong et al., 2013) or linkage maps obtained from crosses (Fierst, 2015) help achieve chromosome-level assemblies, which are highly desirable to adequately assess haplotypes, linkage disequilibrium, putative genomic rearrangements and the genomic location of candidate loci (Sohn and Nam, 2016; see Table 1).
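
A rough genome-size estimate can often be obtained from short reads by counting k-mers (substrings of length k). The sketch below assumes that a k-mer depth histogram has already been computed with a k-mer counting tool; dividing the total number of k-mers by the depth of the main peak approximates the haploid genome size. The error cutoff and the toy histogram are assumptions chosen for illustration.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Rough haploid genome size from a k-mer depth histogram.

    kmer_histogram maps depth (times a k-mer was seen in the reads) to the
    number of distinct k-mers seen at that depth.
    """
    # Low-depth k-mers mostly come from sequencing errors, so they are excluded.
    usable = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    # The most common depth approximates the per-copy k-mer coverage.
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram: an error peak at depth 1-2 and a genomic peak at depth 20.
hist = {1: 5_000_000, 2: 1_000_000, 19: 25_000_000, 20: 50_000_000, 21: 25_000_000}
print(f"{estimate_genome_size(hist) / 1e6:.0f} Mb")
```

In heterozygous or polyploid genomes the histogram has additional peaks (for example, k-mers spanning heterozygous sites appear at half the main depth). A naive peak search may then pick the wrong peak and misestimate the size by a factor of two or more, one reason the ploidy must be known beforehand; dedicated tools such as GenomeScope fit explicit models to the histogram instead.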

TABLE 1

Topic	Citation	Usefulness/importance
Genome assembly and reference genomes	Sohn and Nam, 2016	In-depth review on genome assembly. Includes compelling explanations behind the genome assembly algorithms and an extensive list of genome assembly strategies.
	Yandell and Ence, 2012	In-depth review on eukaryotic genome annotation, a description of the available tools to predict genes and best practices when predicting genes.
Sequencing strategies	Meirmans, 2015	Classic review concerning common pitfalls that should be avoided in a population genomic study. A compulsory review for any newcomer to population genomics.
	Dorant et al., 2019	Empirical study that compares the efficiency of Pool-seq, RADseq and Rapture to detect weak signals of genetic structure in lobsters.
	Inbar et al., 2020	Empirical study that compares the efficiency of whole-genome sequencing, Pool-seq and RADseq for GWAS in ants.
Pan-genomics	Golicz et al., 2016a	In-depth review about pan-genomics in plant species, its advantages over the use of reference genomes, a guide on how to generate pan-genomes and the importance of studying structural variants. The article is dedicated to plants, but the rationale and methods can also be applied to other eukaryotes.
	Khan et al., 2020	Opinion article detailing the relevance of pan-genomes as a necessary next step from reference genomes. The authors also highlight the importance of including wild taxa into pan-genomics and propose the idea of genus-level super-pan-genomes.
	Gao et al., 2019	Landmark study of the tomato pan-genome. The authors sequenced 725 accessions from the domesticated tomato and its wild relatives. They found 4,873 additional genes, including several well-characterized genes that were absent from the reference genome. They also evaluated the presence-absence variants between the wild and domesticated tomatoes, which were enriched in disease-resistance genes.
Demographic analyses	Linck and Battey, 2019	Research study focused on the effects of minor allele frequency filters on the detection of genetic structure in populations. Gives a good explanation of the rationale behind the clustering-based methods to detect structure.
	Mather et al., 2020	Review dedicated to the theoretical background and technical requirements of PSMC and MSMC to infer changes in effective population sizes and coalescent times.
	Gerbault et al., 2014	Excellent review on how to use Bayesian approaches to test different demographic models of domestication.
	Frantz et al., 2015	Landmark study on pig domestication. The authors use approximate Bayesian computation to compare domestication scenarios, clustering-based methods to detect genetic structure and a graph-based method to infer the genetic relationships between pig and wild boar populations.
Selection scans	Vitti et al., 2013	Good review focused on explaining the rationale behind many of the bottom-up tests to detect selection and the genomic signals they are sensitive to.
	De Mita et al., 2013; Lotterhos and Whitlock, 2015	Classic simulation-based studies that compare different scenarios to evaluate the best sampling strategies and the most powerful methods to detect selection throughout the genome, according to the reproductive nature of the organism under study.
	Gibson, 2018	A primer dedicated to understanding the principles behind GWAS and its ability to detect polygenic effects on quantitative traits.
	Hufford et al., 2012	A landmark paper that illustrates how to perform genome scans to detect domestication-related loci in domesticated taxa, and the importance of these loci for crop improvement. The paper studies the domestication of maize, but a similar study design can be applied to domesticated animals.
Paleogenomics	Irving-Pease et al., 2019	Exhaustive book chapter dedicated to the study of ancient DNA to understand domestication.
	Allaby et al., 2019	Research study that casts doubt on the long-standing idea that domestication processes lead to strong population bottlenecks by re-analyzing data based on ancient DNA samples.
	Daly et al., 2018	Remarkable study that sequenced and analyzed 83 mitochondrial genomes and 51 nuclear genomes from ancient goat samples. The authors found signals of ancient introgression events, as well as ancient selective signals related to several traits that are shared with modern goats.
Transcriptomics	Fang and Cui, 2011	General guideline on how to adequately design an RNA-seq experiment to avoid technical mistakes and generate meaningful results.
	Yang and Kim, 2015	General guideline on how to analyze RNA-seq data to assess differential expression.
	Hekman et al., 2015	In-depth review dedicated to studying the domestication process through transcriptomics, including methodological strategies and challenges.
	Hradilová et al., 2017	An excellent study that combines transcriptomic, metabolomic and morphological data from domesticated and wild peas. The multi-omic analysis allowed the authors to better understand seed dormancy and pod dehiscence in domesticated peas.
Epigenomics	Guerrero-Bosagna, 2012; Heard and Martienssen, 2014; Burggren, 2016	Contrasting views on the role of transgenerational epigenetic inheritance in evolution. The topic is still debated and should be viewed critically.
	Jensen, 2015	In-depth review on the rationale and advances of epigenetic studies to understand domestication. The manuscript is focused on animal behavior, but many of the ideas can also be applied to domesticated plants.
	Janowitz Koch et al., 2016	A landmark paper showing the importance of epigenetic marks in dog domestication and their association with behavioral traits. The study does not just compare the methylation marks between wolves and dogs, but also assesses the heritability of the methylation marks and proposes a formal test to detect selection on epialleles.
Genome-editing tools	Boettcher and McManus, 2015	Review on novel genome-editing techniques and RNA interference. Useful to compare and choose the best tool to validate candidate loci.
	Shan et al., 2020	A general guide on how to develop a CRISPR/Cas9 system on a non-model plant species.
	Soyk et al., 2017	Landmark paper that uses genome-editing to validate two candidate genes related to fruit size and reduced fruit dropping in tomato. The authors also detect the emergence of undesirable traits in domesticated tomatoes due to an epistatic effect between both domesticated loci and introduce wild alleles to generate new tomato phenotypes with reduced degrees of the undesirable traits.
Perspectives	Piperno, 2017	Review centered on the potential application of an extended synthesis framework to understand domestication. Centered around the concepts of niche construction, transgenerational epigenetic inheritance and developmental plasticity.

List of key publications, including reviews focused on specific topics and notable examples of research articles that apply some of the methods described in this review with reliable results.

After sequencing and assembling the genome of at least one individual, the assembly must be properly annotated before it can be fully useful. Since eukaryotic genes are structurally complex, successful annotation requires transcriptomic evidence from additional RNA sequencing of the same species, alongside homology evidence from other curated genomes and ab initio predictions based on the underlying structure of genes (Yandell and Ence, 2012; see Table 1). Although whole-genome assembly projects were previously restricted to large research groups (e.g., Schnable et al., 2009; Tomato Genome Consortium, 2012), the sequencing cost per nucleotide is steadily declining for all the aforementioned technologies, making genome analyses accessible to a large part of the research community (Muir et al., 2016). The current bottleneck for small research groups is usually not the cost of sequencing itself, but rather access to computational resources capable of storing and analyzing huge amounts of data (Muir et al., 2016).

The main purpose of assembling a genome in a domestication study is to use it as a reference for high-quality population data to infer selection, introgression and recombination, and to design subsequent studies for the experimental validation of candidate loci. Even though several population-level analyses based on reduced-representation genome sequencing can be performed in the absence of a reference genome (De Wit et al., 2012; Mastretta-Yanes et al., 2015), a reference genome makes it possible to assign otherwise anonymous loci to specific genes or regions within the genome, and to identify and properly handle linkage between loci (Fitz-Gibbon et al., 2017). It can also help discriminate between orthologous and paralogous loci, which is critical given the large size of many genomes and the frequent genome duplications experienced during the evolution of plant and animal lineages (Clark and Donoghue, 2018; Zadesenets and Rubtsov, 2018).

Thus, the availability of a reference genome is desired for genomic analyses concerning domestication. Luckily, domesticated taxa are usually economically relevant, drawing the attention of several research groups worldwide and in some cases helping to fund the projects. Therefore, reference genomes are usually available for domesticated species, since such data is also relevant for other research areas, such as crop improvement and breeding programs (Ellegren, 2014). However, it should be noted that using a single reference genome can lead to reference bias, where sequenced individuals that are more distantly related to the reference will tend to have fewer predicted variants due to mismatches while mapping the reads (Günther and Nettelblad, 2019).
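
The mechanism behind reference bias can be illustrated with a simple binomial model, assuming that a read maps only if it differs from the reference at no more than a fixed number of positions and that differences occur independently along the read. The read length, mismatch limit and divergence values below are hypothetical; real aligners score alignments in more complex ways, so the numbers only indicate the direction of the effect.

```python
from math import comb


def prob_read_maps(read_length, divergence, max_mismatches):
    """Probability that a read differs from the reference at no more than
    `max_mismatches` sites, treating each site as an independent difference
    with probability `divergence` (a binomial model)."""
    return sum(
        comb(read_length, k) * divergence**k * (1 - divergence) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )


READ_LENGTH, MAX_MISMATCHES = 100, 4   # hypothetical mapper settings
for divergence in (0.001, 0.03):
    ref_allele = prob_read_maps(READ_LENGTH, divergence, MAX_MISMATCHES)
    # A read carrying the non-reference allele at a SNP already spends one
    # mismatch on that site, leaving fewer for its other positions.
    alt_allele = prob_read_maps(READ_LENGTH - 1, divergence, MAX_MISMATCHES - 1)
    print(f"divergence {divergence}: ref reads {ref_allele:.3f}, alt reads {alt_allele:.3f}")
```

Reads from a more divergent individual fail to map more often, and at a heterozygous site the reads carrying the non-reference allele are lost more often than those carrying the reference allele, skewing genotype calls toward the reference.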

Besides serving as a reference for population-level data, the comparison of several whole-genome assemblies from domesticated and wild taxa can reveal structural differences between the genome of a domesticated taxon and those of its closest wild relatives, such as duplications, chromosome rearrangements or the presence/absence of entire genes and genomic regions (Yang et al., 2012; Wang W. et al., 2014; Xie et al., 2019). Since selection and bottlenecks during domestication often lead to the fixation of loss-of-function mutations (Renaut and Rieseberg, 2015; Moyers et al., 2018), comparative analyses using genome assemblies of wild ancestors may also reveal changes in genes that could not be properly predicted within the domesticated genome (Moyers et al., 2018). For this reason, further efforts should be made to assemble high-quality genomes of wild relatives alongside the domesticated taxon of interest (Brozynska et al., 2016; Xie et al., 2019).

Strategies to Gather Adequate Population Genomics Data

Genome assemblies alone give a limited view of domestication unless several genomes of domesticated individuals and wild relatives (if known and available) are sequenced. Evolution is a population-level process, so population data are necessary to address most evolutionary questions about domestication (Wang G.-D. et al., 2014; Guerra García and Piñero, 2017). Population genomics examines the genetic variation scattered across the entire genome, within and between populations, to assess the demographic history, phylogenetic relationships and selective pressures of a species (Jorde, 2001). Several types of genomic variation can be evaluated at the population level, including single nucleotide polymorphisms (SNPs), indels and copy number variations, but SNPs are the most commonly analyzed of the three (Seal et al., 2014).

All population-level sequencing techniques share common pitfalls that should be known and avoided before investing any money in sequencing. Population sampling should be planned carefully, as the sampling scheme has a stronger impact on the reliability of the results than the sequencing itself (Meirmans, 2015). Also, samples from different populations should be mixed across sequencing libraries and lanes rather than sequenced on separate libraries or lanes; otherwise, technical sequencing biases can be confused with biological patterns (Meirmans, 2015; see Table 1).
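
A simple way to follow this advice is to deal samples from every population across all libraries or lanes, much like dealing cards, and then shuffle the order within each lane. The sketch below assumes that each sample is labeled with its population; the function name, seed and lane count are hypothetical, and real designs may also need to balance other factors, such as sampling date or DNA extraction batch.

```python
import random
from collections import defaultdict


def assign_to_lanes(samples, n_lanes, seed=1):
    """Spread the samples of every population across all lanes.

    samples: list of (sample_id, population) tuples.
    Returns {lane_index: [sample_id, ...]}.
    """
    if n_lanes < 1:
        raise ValueError("need at least one lane")
    rng = random.Random(seed)
    by_population = defaultdict(list)
    for sample_id, population in samples:
        by_population[population].append(sample_id)
    lanes = defaultdict(list)
    next_lane = 0
    for population in sorted(by_population):
        members = by_population[population][:]
        rng.shuffle(members)                     # random order within each population
        for sample_id in members:
            lanes[next_lane].append(sample_id)   # deal round-robin across lanes
            next_lane = (next_lane + 1) % n_lanes
    for lane in lanes.values():
        rng.shuffle(lane)                        # randomize loading order within each lane
    return dict(lanes)


# Hypothetical wild (W), domesticated (D) and landrace (L) samples, 8 of each.
samples = [(f"{pop}{i}", pop) for pop in ("W", "D", "L") for i in range(8)]
for lane, ids in sorted(assign_to_lanes(samples, 4).items()):
    print(lane, ids)
```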

Once adequate population genomic data are gathered, we need to analyze the demographic processes that shaped the genetic variation and the population structure of contemporary populations during domestication. This information is necessary for tests that detect natural and artificial selection, which are required to understand the genetic basis of domestication syndromes (Ross-Ibarra et al., 2007). There are several approaches to obtain population data at a genomic scale; they differ in the fraction of the genome that is sequenced, which determines the sequencing cost of each sample (Schreiber et al., 2018).

Whole-Genome Sequencing of Populations

After assembling a reference genome, one of the next possible strategies to understand domestication is to sequence the complete genomes of several individuals. This approach requires aligning the sequencing reads to the reference genome in order to infer the variable sites between individuals and to identify the genetic elements (e.g., genes, upstream regulators, repetitive elements, non-coding RNAs) associated with those sites. The main benefit of this approach is its potential to retrieve all the variant sites within an individual’s genome that are structurally represented in the reference genome, so whole-genome sequencing can be used in almost any population-level test of interest (Schreiber et al., 2018). Common practice recommends a sequencing depth of around 30× per individual, but empirical studies in pigs suggest that even 10× is enough to cover up to 99% of a genome with accurate detection of variant sites (Jiang L.G. et al., 2019). The main drawback of this approach is the sequencing cost of each sample, which is significantly higher than in other approaches, especially for organisms with large genomes such as polyploid crops or mammals (Schreiber et al., 2018). Researchers must therefore weigh sequencing depth against the number of sampled individuals to optimize their resources. Simulation studies suggest that sequencing more individuals is more convenient to obtain reliable results, even at the expense of lower sequencing depth per individual (Fumagalli, 2013).
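
The trade-off between depth and sample size can be explored with back-of-the-envelope calculations. The sketch below assumes that read depth at a site follows a Poisson distribution and that each read comes from either chromosome of a diploid with equal probability; it computes the chance of seeing both alleles of a heterozygous site and how many genomes a fixed budget can buy. The 2.5 Gb genome size, 300 Gb budget and two-reads-per-allele rule are hypothetical values chosen for illustration.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of exactly k reads when the expected depth is `mean` (> 0)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def prob_het_detected(mean_depth, min_reads_per_allele=1, max_depth=200):
    """Chance that a heterozygous site shows each allele in at least
    `min_reads_per_allele` reads, assuming Poisson depth and that each read
    comes from either chromosome with probability 0.5."""
    total = 0.0
    for d in range(max_depth + 1):
        # Binomial(d, 0.5): reads from one allele must fall in [m, d - m].
        ways = sum(math.comb(d, k) for k in range(min_reads_per_allele, d - min_reads_per_allele + 1))
        total += poisson_pmf(d, mean_depth) * ways / 2**d
    return total


def individuals_affordable(budget_gb, genome_size_gb, depth):
    """How many genomes fit in a sequencing budget (all amounts in gigabases)."""
    return math.floor(budget_gb / (genome_size_gb * depth))


for depth in (30, 10, 5):
    n = individuals_affordable(300, 2.5, depth)
    p = prob_het_detected(depth, min_reads_per_allele=2)
    print(f"{depth}x: {n} genomes, P(het seen with >=2 reads per allele) = {p:.3f}")
```

Lower depth buys more individuals but misses more heterozygotes, which is why low-depth designs are usually analyzed with methods that account for genotype uncertainty rather than with hard genotype calls.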

Alternatives to Whole-Genome Sequencing

Other approaches aim to reduce the sequencing cost per sample, either by pooling the DNA of several individuals into a single sequencing library (Futschik and Schlötterer, 2010) or by reducing the portion of the genome that is sequenced (often called reduced-representation sequencing). The latter is achieved by sequencing arbitrarily defined segments scattered across the genome, by targeting the desired portions of the genome, or by sequencing its transcriptionally active portions (Schreiber et al., 2018). These techniques are especially helpful for organisms with very large genomes, and some of these methods can even be used in the absence of a reference genome (Mastretta-Yanes et al., 2015; Schreiber et al., 2018). Furthermore, because fewer regions are targeted, each of them can be sequenced at high depth, leading to more accurate estimates of genetic variation and heterozygosity (Schreiber et al., 2018). Additionally, the reduced sequencing cost per sample allows a large number of individuals and populations to be sequenced, which, with a proper sampling strategy, can lead to robust results (De Mita et al., 2013; Lotterhos and Whitlock, 2015). Because they sample only fragments of the genome, reduced-representation data alone may be insufficient to pinpoint all, or even the most important, causal genetic variants associated with domestication syndromes (Lowry et al., 2017). They are still useful, however, to estimate basic population genetic statistics, infer demographic properties and past demographic scenarios, detect some signatures of selective sweeps across the genome and even perform GWAS for domestication traits of interest (Andrews et al., 2016; Schreiber et al., 2018).

Pool Sequencing

Pool sequencing (Pool-seq) is a promising alternative to whole-genome sequencing at a much lower cost (Futschik and Schlötterer, 2010). As the name suggests, Pool-seq consists of sequencing a large pool of individuals from a given population in a single high-throughput sequencing library, instead of sequencing each individual separately. This allows an accurate estimation of allele frequencies and other population genetic parameters at the expense of losing individual-level information (Futschik and Schlötterer, 2010). The method requires mapping the reads against a reference genome of the same species (Schlötterer et al., 2014). It is intended for large pools (more than 40 individuals per population is recommended, and more than 100 is optimal); otherwise, allele frequencies will not be estimated accurately (Schlötterer et al., 2014). Each individual should contribute a similar amount of DNA to the pool to avoid overrepresenting its alleles, a task that is often challenging (Schlötterer et al., 2014).
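
The accuracy of Pool-seq allele-frequency estimates can be reasoned about as two rounds of sampling: chromosomes are sampled from the population into the pool, and reads are then sampled from the pool. Assuming both steps are binomial and every individual contributes equally, the variance of the estimate is p(1 - p)(1/n + 1/C - 1/(nC)) for allele frequency p, n pooled chromosomes and coverage C. The sketch below computes this expression and checks it by simulation; the function names and parameter values are illustrative.

```python
import random


def poolseq_variance(p, n_chromosomes, coverage):
    """Expected variance of a Pool-seq allele-frequency estimate under two
    rounds of binomial sampling (chromosomes into the pool, reads from the pool).
    Unequal DNA contributions would add variance beyond this idealized value."""
    return p * (1 - p) * (1 / n_chromosomes + 1 / coverage - 1 / (n_chromosomes * coverage))


def simulated_variance(p, n_chromosomes, coverage, replicates=20_000, seed=7):
    """Monte Carlo check of poolseq_variance under the same two sampling steps."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        carriers = sum(rng.random() < p for _ in range(n_chromosomes))      # step 1: build the pool
        pool_freq = carriers / n_chromosomes
        alt_reads = sum(rng.random() < pool_freq for _ in range(coverage))  # step 2: sequence it
        estimates.append(alt_reads / coverage)
    mean = sum(estimates) / replicates
    return sum((e - mean) ** 2 for e in estimates) / (replicates - 1)


# Diploid pools of 20 vs 100 individuals (40 vs 200 chromosomes) at 50x coverage.
for n_individuals in (20, 100):
    v = poolseq_variance(0.3, 2 * n_individuals, 50)
    print(f"{n_individuals} individuals: SD of estimate = {v ** 0.5:.3f}")
```

Because the 1/n term does not shrink with coverage, deeper sequencing cannot compensate for a small pool, which is consistent with the recommendation to use large pools.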

Pool-seq has several limitations that should be weighed against the objectives of the research project. It is difficult to distinguish a genuine low-frequency allele from a sequencing error, although this problem can be mitigated either by establishing a minor allele frequency threshold for SNP calling or by using pool replicates (Schlötterer et al., 2014). One important limitation is that Pool-seq data cannot be used to estimate linkage disequilibrium or to phase haplotypes, which is particularly important to evaluate the non-independence of genetic signals in demographic studies and selection scans (Schlötterer et al., 2014). Finally, assessing genetic structure with Pool-seq can be difficult and sometimes misleading, due to potential biases in how individual alleles are represented within the pool (Dorant et al., 2019). This makes Pool-seq an adequate method for GWAS, selective sweep detection and some methods based on allele frequencies when resources are limited (Luu et al., 2017; Inbar et al., 2020). However, the loss of individual-level information makes many demographic inferences difficult, since populations need to be predefined before sequencing (Dorant et al., 2019), and bioinformatic tools that handle Pool-seq data are scarce.
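
A minor allele frequency threshold combined with pool replicates can be sketched as a filter that keeps a SNP only if its minor allele passes both a frequency and a read-count threshold in every replicate pool. The data layout and the thresholds (5% and three reads) are hypothetical choices, not values recommended by the cited studies.

```python
def passes(alt_reads, total_reads, min_maf, min_minor_reads):
    """True if the minor allele at one site in one pool passes both thresholds."""
    if total_reads == 0:
        return False
    minor_reads = min(alt_reads, total_reads - alt_reads)
    return minor_reads >= min_minor_reads and minor_reads / total_reads >= min_maf


def filter_pooled_snps(snps, min_maf=0.05, min_minor_reads=3):
    """Keep SNPs whose minor allele passes the thresholds in every replicate pool.

    snps: {snp_id: [(alt_reads, total_reads), ...]}, one tuple per replicate.
    """
    return [
        snp_id
        for snp_id, replicates in snps.items()
        if replicates and all(passes(a, t, min_maf, min_minor_reads) for a, t in replicates)
    ]


candidate_snps = {
    "snp_1": [(12, 60), (15, 70)],   # seen in both replicates at ~20%
    "snp_2": [(2, 80), (0, 75)],     # a couple of reads in one pool only
    "snp_3": [(10, 50), (1, 55)],    # inconsistent between replicates
}
print(filter_pooled_snps(candidate_snps))
```

Such filters trade false positives for lost information: genuine rare variants are removed as well, which distorts the distribution of allele frequencies used by downstream demographic analyses.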

Exome Capture and Sequencing

Exome sequencing is another lower-cost alternative to whole-genome sequencing, which targets the protein-coding regions of the genome (Warr et al., 2015; Kaur and Gaikwad, 2017). Protein-coding regions represent a small fraction of eukaryotic genomes and consist mostly of functional elements, which makes them a useful and economical target for many population genomic studies (Kaur and Gaikwad, 2017). This technique is usually performed using hybridization probes, which require prior knowledge of the genome content and an a priori selection of the regions of interest in order to design the probes (Kaur and Gaikwad, 2017). Fortunately, hybridization probes are already available for several domesticated plants and animals (Warr et al., 2015; Kaur and Gaikwad, 2017).

Despite its advantages, exome sequencing can produce uneven sequencing depth across genomic positions, unlike whole-genome sequencing, which shows a more uniform distribution of reads throughout the genome (Lelieveld et al., 2015). Another important limitation of exome sequencing is its bias towards the protein-coding portion of the genome, since increasing evidence shows that many of the genetic changes directly associated with domestication traits are located within cis-regulatory elements, noncoding RNAs and other trans-regulatory elements, rather than within the open reading frames of genes (Swinnen et al., 2016). Despite these limitations, demographic history and selective sweeps can still be detected using this sequencing method (Pankin et al., 2018).

RNA Sequencing of Populations

Transcriptome sequencing (also known as RNA-seq) is another useful approach to obtain population-level data from the transcriptionally active elements within genomes (De Wit et al., 2012). RNA-seq can be mapped against a reference genome to detect genetic variants and determine the genomic regions of interest, but it can also be analyzed in the absence of a reference genome (De Wit et al., 2012), since transcriptomes can be assembled de novo (Haas et al., 2013) and the functional annotation of the assembled transcripts is relatively easy (Bryant et al., 2017).

However, transcription profiles depend on the sequenced tissues and organs, the developmental stage of the organism and external stimuli, so RNA-seq captures only the transcripts that are active at the moment of RNA extraction (Hekman et al., 2015). This complexity can strongly bias the relative abundance of certain transcripts over others and overlook potentially adaptive genes whose expression is context-dependent (Hekman et al., 2015; Kaur and Gaikwad, 2017). Nonetheless, RNA-seq is still a good option for species with large genomes that are hard to assemble (De Wit et al., 2012). Similarly to exome sequencing, RNA-seq data can be used to evaluate demographic history and selective sweeps, but the selective signals are restricted to the transcriptionally active part of the genome, and the data cannot be used to evaluate structural variants (Schreiber et al., 2018).

Restriction Site-Associated DNA Sequencing

Restriction site-associated DNA sequencing (RAD-seq), which may also be referred to as genotyping by sequencing (GBS), has been one of the most popular options for affordable population genomics in the last decade (Davey and Blaxter, 2010). The technique uses restriction enzymes to digest the DNA and then sequences the regions adjacent to the restriction sites scattered across the genome (Davey and Blaxter, 2010). It can also be combined with sequence capture techniques to target specific loci of interest (Ali et al., 2016). RAD-seq data can either be mapped against a reference genome or assembled de novo (Catchen et al., 2013; Mastretta-Yanes et al., 2015), making it a versatile technique for species with scant genomic resources.
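
The basic logic of RAD-seq can be illustrated with an in silico digestion: locate every occurrence of a restriction site and take the sequence flanking each site as a locus. The sketch below uses the SbfI recognition sequence (CCTGCAGG), which is palindromic, so scanning one strand is enough; the flank length and toy sequences are assumptions, and real protocols also involve adapter ligation and size selection. The second sequence shows how a SNP inside a restriction site makes a locus disappear, the cause of the null alleles and allele dropout discussed below.

```python
SBFI_SITE = "CCTGCAGG"  # SbfI recognition sequence (palindromic, so one strand suffices)


def find_sites(sequence, site=SBFI_SITE):
    """Start positions of every occurrence of the restriction site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def rad_loci(sequence, site=SBFI_SITE, tag_length=20):
    """For each site, return (position, left_flank, right_flank): the stretches
    that would be sequenced on either side of the site."""
    loci = []
    for pos in find_sites(sequence, site):
        left = sequence[max(0, pos - tag_length):pos]
        right = sequence[pos + len(site):pos + len(site) + tag_length]
        loci.append((pos, left, right))
    return loci


reference = "A" * 30 + SBFI_SITE + "T" * 30 + SBFI_SITE + "G" * 30
# Same haplotype with a SNP (A -> T) inside the second site: CCTGCAGG -> CCTGCTGG.
variant = reference[:73] + "T" + reference[74:]
print([pos for pos, _, _ in rad_loci(reference)])   # both loci present
print([pos for pos, _, _ in rad_loci(variant)])     # second locus drops out
```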

However, empirical studies show that certain de novo approaches for RAD-seq data can yield fewer SNPs due to errors in the definition of loci and in the treatment of sequencing errors (Shafer et al., 2017), all of which may alter downstream analyses, especially those based on the distribution of allele frequencies within the genome of a population, also known as the site frequency spectrum (SFS) (Shafer et al., 2017). For this reason, a reference-based approach is highly recommended, as long as the reference genome is closely related to the population dataset (Shafer et al., 2017). Furthermore, RAD-seq data can contain errors when a polymorphism lies within a restriction site, because the polymorphism prevents the enzyme from cutting in individuals that carry it. As a result, the region is not sequenced in homozygous carriers (null alleles) and heterozygous individuals look like homozygotes (allele dropout) (Andrews et al., 2016). Finally, the capacity of RAD-seq libraries to adequately support selection scans has been seriously questioned (Lowry et al., 2017). Their ability to detect selective sweeps depends on the genome size, the density of variants detected in a given genomic region and especially the extent of linkage disequilibrium in the genome (Lowry et al., 2017). Thus, when linkage disequilibrium in a species decays over short distances (due to high recombination rates) and SNP density is low (particularly in large genomes), selection scans will likely miss a significant portion of the selective sweeps associated with domestication (Lowry et al., 2017).
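
The reasoning behind this limitation can be approximated with a simple calculation, assuming that markers are scattered at random along the genome (a Poisson process): the chance that a sweep region of a given length contains at least one, or several, markers depends on marker density and on the extent of linkage disequilibrium. The marker density and region lengths below are hypothetical values chosen to illustrate the effect.

```python
import math


def prob_sweep_tagged(markers_per_mb, ld_block_kb, min_markers=1):
    """Chance that a sweep region (the span of elevated linkage disequilibrium)
    contains at least `min_markers` markers, if markers fall at random
    positions along the genome (a Poisson process)."""
    expected = markers_per_mb * ld_block_kb / 1000
    p_fewer = sum(math.exp(-expected) * expected**k / math.factorial(k) for k in range(min_markers))
    return 1 - p_fewer


# Hypothetical: 10 RAD loci per Mb; LD extends over 5 kb vs 100 kb.
for ld_kb in (5, 100):
    print(f"LD block {ld_kb} kb: P(>=1 marker) = {prob_sweep_tagged(10, ld_kb):.2f}, "
          f"P(>=3 markers) = {prob_sweep_tagged(10, ld_kb, 3):.2f}")
```

Under these assumptions, short linkage blocks combined with sparse markers leave most sweep regions without any marker, and requiring several markers per region (as many window-based statistics do) makes detection even less likely.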

Pan-Genome Analyses in Domesticated and Wild Taxa

An increasing number of studies are revealing that structural variants (copy-number variation, presence/absence of genomic regions, inversions, translocations) are common within plant and animal populations (Khan et al., 2020). Thus, relying on a single reference genome hampers our ability to study the full repertoire of genetic variation within a species (Golicz et al., 2016a; Zhao et al., 2018). Structural variants such as copy-number variation can contain functional genomic elements that are usually under relaxed selective pressures and can serve as the basis of adaptation to specific environments and selective regimes (Lye and Purugganan, 2019). Indeed, copy-number variation and other structural variants play an important role in the emergence of domestication traits, as well as of diversification traits in landraces (Lye and Purugganan, 2019). Some studies estimate that at least one third of the known domestication loci are structural variants, and up to one in seven genes can be hemizygous (i.e., present in a single copy) in grapevine individuals (Zhou Y. et al., 2019). Despite their importance, structural variants cannot be properly analyzed with any of the techniques described above. This led the research community to adopt the concept of the pan-genome, an idea that first appeared in microbiology (Tettelin et al., 2005), for the study of plant and animal genomes (Golicz et al., 2016a).

The concept of pan-genome rests on the idea that the genomes of individuals within a population or species share a core set of genes that unifies them (i.e., the core genome), but also contain a fraction of genes that are absent from one or more individuals (i.e., the accessory or dispensable genome), which altogether give rise to the pan-genome of such population or species (Tettelin et al., 2005).
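The core/accessory partition can be sketched with set operations on gene presence/absence calls. This minimal illustration assumes that presence or absence has already been called for each genome (e.g., from annotations or read mapping); the accession names, gene names and function name are hypothetical.

```python
def partition_pangenome(genes_by_genome: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split the gene content of several genomes into pan, core and accessory sets."""
    gene_sets = list(genes_by_genome.values())
    pan = set().union(*gene_sets)          # every gene seen in at least one genome
    core = set.intersection(*gene_sets)    # genes present in all genomes
    return {"pan": pan, "core": core, "accessory": pan - core}


# Hypothetical presence/absence calls for two wild and one domesticated accession.
genomes = {
    "wild_1": {"geneA", "geneB", "geneC"},
    "wild_2": {"geneA", "geneB", "geneD"},
    "domesticated_1": {"geneA", "geneB"},
}
parts = partition_pangenome(genomes)
print(sorted(parts["core"]))       # ['geneA', 'geneB']
print(sorted(parts["accessory"]))  # ['geneC', 'geneD']
```

In this toy example the accessory genes occur only in the wild accessions, so a pan-genome built from the domesticated accession alone would miss them.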

There are three main methods to generate a pan-genome: the alignment and comparison of multiple de novo genome assemblies, the iterative assembly of several genomes starting from an initial reference, and the joint assembly of several genomes with de Bruijn graph assemblers (Golicz et al., 2016a; see Table 1). Since domestication reduces the genetic diversity of a taxon, often eliminating portions of the dispensable genome that contain genes involved in local adaptation, including wild relatives is crucial to generate a representative pan-genome for a species (Khan et al., 2020). Once a pan-genome is generated, it can be used alongside whole-genome sequencing data to analyze structural variants between and within populations, revealing novel loci involved in the development of domestication-related traits that would have remained hidden with a single reference genome (Li et al., 2014; Zhao et al., 2018). In addition, a pan-genome alleviates the reference bias inherent to any single reference genome (Günther and Nettelblad, 2019).

Pan-genome studies have revealed additional selective sweeps and structural variants associated with the domestication process that had not been identified using sequencing data mapped to a single reference genome (Li et al., 2014; Zhao et al., 2018). Pan-genomes are already available for several species (Figure 2), such as maize (Brohammer et al., 2018), wheat (Montenegro et al., 2017), Brassica oleracea (Golicz et al., 2016b) and Brassica napus (Hurgobin et al., 2018), and pan-genome analyses to study domestication have already been performed in soybean (Li et al., 2014), rice (Zhao et al., 2018), sunflower (Hübner et al., 2019) and tomato (Gao et al., 2019). While current eukaryotic pan-genome analyses have focused on plant species (Golicz et al., 2016a; see Table 1) and goats (Li et al., 2019), researchers working on other livestock may soon venture into this field. As sequencing becomes cheaper, multiple pan-genomes from different species of the same genus could eventually be combined into a super-pan-genome that represents the entire genetic content available in a genus with one or more domesticated taxa, since it would include the diversity of all their wild relatives (Khan et al., 2020).

FIGURE 2

Population Genetics and Demographic Analyses of the Domestication Process

Reconstructing the demography and population size changes that took place during domestication is key to unraveling some of the most fundamental questions about this process. These analyses can help identify possible centers of origin and diversification, patterns of migration and expansion among these centers, gene flow between domesticated and wild taxa, the number of domestication events, the extent of genetic erosion in the domesticated taxon, the level of overall genetic differentiation between wild and domesticated taxa and the patterns of adaptive and neutral introgression between them; in some cases, they can even estimate the number of generations that have elapsed since domestication and since other processes, such as the differentiation and local adaptation of domesticated taxa (Meyer and Purugganan, 2013; Guerra García and Piñero, 2017).

Genetic Diversity in Populations

A necessary first step when analyzing SNP data is to estimate and compare the summary statistics of population genetics within and between populations (Andrews et al., 2016). These statistics describe the genetic diversity in populations and include allele frequencies (usually denoted as p, often the frequency of the most abundant allele), observed heterozygosity (H_O), expected heterozygosity (H_E), nucleotide diversity (π), the number of segregating sites (S) and the number of private alleles (i.e., alleles found in only one population). They can reveal the level of genetic erosion in domesticated plants and animals relative to the ancestral wild population, which is expected as a result of severe bottlenecks, selective sweeps and inbreeding (Groeneveld et al., 2010; Gepts, 2014). Note that reference bias can influence the relative genetic variation observed between wild and domesticated populations; this can be alleviated by using more than one reference genome or a pan-genome (Günther and Nettelblad, 2019).
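Most of the statistics listed above can be computed from a genotype matrix in a few lines. The sketch below assumes biallelic SNPs, diploid individuals, no missing data and genotypes coded as the number of alternative alleles (0, 1 or 2); the function name and toy data are hypothetical. Private alleles require comparing two or more populations and are not shown.

```python
import numpy as np


def diversity_stats(genotypes: np.ndarray, seq_len: int) -> dict[str, float]:
    """Basic diversity statistics for one population.

    genotypes: individuals x sites matrix of alternative-allele counts (0, 1, 2).
    seq_len: number of sequenced bases (variable + invariant) used to scale pi.
    """
    n_chrom = 2 * genotypes.shape[0]
    p = genotypes.sum(axis=0) / n_chrom          # alternative-allele frequency per site
    het_exp = 2 * p * (1 - p)                    # Hardy-Weinberg expected heterozygosity per site
    segregating = (p > 0) & (p < 1)
    # Per-site nucleotide diversity with the n/(n-1) sample-size correction.
    pi_sites = het_exp * n_chrom / (n_chrom - 1)
    return {
        "H_O": float(np.mean(genotypes == 1)),   # share of heterozygous genotypes
        "H_E": float(np.mean(het_exp)),
        "S": int(segregating.sum()),
        "pi": float(pi_sites.sum() / seq_len),
    }


g = np.array([[0, 1, 2],
              [0, 1, 1],
              [0, 0, 2]])
print(diversity_stats(g, seq_len=3))
```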

Population Structure

It is also important to describe the population structure (i.e., the genetic differentiation among populations) of domesticated taxa and their wild relatives, as it can reveal the influence of historical events that shaped the genetic diversity of these organisms (Linck and Battey, 2019). The level of population structure between wild and domesticated taxa depends on several factors, such as the number of generations since domestication started, the intensity of the selective pressures imposed on the domesticated taxon, the severity of the bottlenecks suffered through the domestication process and the frequency of gene flow between the domesticated taxon and its wild relatives (Meyer and Purugganan, 2013).

The F-statistics are classic population genetic estimates based on heterozygosity within and among populations; they can reveal patterns of inbreeding, gene flow and differentiation between and within populations (Andrews et al., 2016). Of these, F_ST is of particular interest, since it can be used to detect population structure between wild and domesticated populations, or between different domesticated varieties (Andrews et al., 2016). These estimates are relatively simple to calculate, but they require the a priori assignment of individuals to discrete populations, which may be incorrectly defined, may not reflect natural populations or may simply be unknown (Linck and Battey, 2019).
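As an illustration of a heterozygosity-based F-statistic, the sketch below computes Nei's G_ST, (H_T - H_S)/H_T, from allele frequencies, where H_S is the mean within-population expected heterozygosity and H_T the expected heterozygosity of the pooled populations. It assumes equally weighted populations and ignores sample-size corrections, so it is a didactic sketch rather than a replacement for the estimators in dedicated software; the allele frequencies are hypothetical.

```python
import numpy as np


def nei_fst(freqs: np.ndarray) -> tuple[np.ndarray, float]:
    """Heterozygosity-based F_ST (Nei's G_ST) for biallelic loci.

    freqs: populations x loci matrix of allele frequencies (equal weights assumed).
    Returns per-locus values and a genome-wide ratio of sums.
    """
    h_s = np.mean(2 * freqs * (1 - freqs), axis=0)   # mean within-population heterozygosity
    p_bar = freqs.mean(axis=0)
    h_t = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled population
    # Loci that are monomorphic overall (h_t == 0) get NaN instead of a division by zero.
    per_locus = np.divide(h_t - h_s, h_t, out=np.full_like(h_t, np.nan), where=h_t > 0)
    return per_locus, float((h_t - h_s).sum() / h_t.sum())


wild_vs_dom = np.array([[0.5, 0.2],    # wild population
                        [0.9, 0.2]])   # domesticated population
per_locus, global_fst = nei_fst(wild_vs_dom)
print(per_locus, global_fst)
```

Computing the genome-wide value as a ratio of sums, rather than as the mean of per-locus ratios, keeps loci with little overall variation from dominating the estimate.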

Methods based on population clustering have become popular for describing genetic structure because they do not require a priori population assignment. These clustering methods can be classified into parametric and non-parametric methods (Linck and Battey, 2019). Parametric methods, also known as model-based methods, assign individuals to a predefined number K of populations based on their genotypes and the allele frequencies at each locus (Pritchard et al., 2000). Several parametric methods can successfully analyze genomic datasets to infer population structure (e.g., Tang et al., 2006; Alexander et al., 2009; Raj et al., 2014), but they must be used with care, as they assume linkage equilibrium and Hardy-Weinberg equilibrium in the dataset (Linck and Battey, 2019), so SNPs should be filtered accordingly before these methods can be confidently used (Wigginton et al., 2005; Mathew et al., 2018). Furthermore, parametric methods are sensitive to changes in the SFS caused by the minor allele frequency thresholds commonly used to filter population genomic data. Because low-frequency polymorphisms are expected to carry information about recent events, removing them adds uncertainty to the assignment of individuals to populations, which may then mostly reflect ancient demographic events (Linck and Battey, 2019).

Non-parametric methods include principal component analyses, discriminant analyses of principal components and K-means clustering. These methods define populations and genetic structure by transforming the genetic data into uncorrelated variables – named eigenvectors or principal components – to identify groups within the dataset (Patterson et al., 2006; Jombart et al., 2010; Linck and Battey, 2019). Non-parametric methods were designed to work with large amounts of genomic data (Patterson et al., 2006; Jombart et al., 2010) and they are more robust to changes in the SFS than the parametric methods, so it is recommended to run both types of methods and compare their results before making further inferences (Linck and Battey, 2019).
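A minimal PCA of a genotype matrix can be written with a singular value decomposition. Scaling each SNP by sqrt(2p(1-p)) follows the spirit of Patterson et al. (2006), but this is a sketch, not their exact implementation (which uses a slightly different allele-frequency estimate); the function name and the simulated wild and domesticated samples are hypothetical. The sign of each principal component is arbitrary.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA of an individuals x sites matrix of 0/1/2 genotypes via SVD.

    Each SNP is centred on its mean and scaled by its binomial standard deviation;
    monomorphic sites are dropped because they carry no information.
    """
    p = genotypes.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)
    x = (genotypes[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]   # individual scores on each PC


rng = np.random.default_rng(0)
# Hypothetical data: 10 "wild" and 10 "domesticated" individuals with divergent allele frequencies.
wild = rng.binomial(2, 0.1, size=(10, 50))
dom = rng.binomial(2, 0.9, size=(10, 50))
scores = genotype_pca(np.vstack([wild, dom]))
print(scores[:, 0])
```

With groups this well differentiated, the first component alone separates wild from domesticated individuals.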

Inferences in Changes of Population Sizes Throughout Time

An important aspect of the demographic history of domesticated taxa is the change in effective population size (N_e) through time (Chen J. et al., 2018). N_e is the size of an idealized Wright-Fisher population that would show the observed level of genetic variation, so estimates of N_e rarely match the census size of real populations (Charlesworth, 2009); they can also be affected by reference bias and allele dropout. Changes in N_e can reveal, or at least hint at, the demographic history of taxa throughout the domestication process, such as expansions or bottlenecks. These changes also help to understand other evolutionary aspects of domestication related to natural and artificial selection, such as the efficiency of selection and the accumulation of deleterious mutations in domesticated taxa (Chen J. et al., 2018; Allaby et al., 2019).

The domestication process is expected to include a bottleneck as a consequence of subsampling the genetic diversity of the wild ancestor, followed by a population expansion as domesticated taxa diversify (Meyer and Purugganan, 2013), although this idea has recently been challenged by paleogenomic studies (Allaby et al., 2019). Many methods exist to explore changes in N_e through time, and the choice among them often depends on the type of data available. Note that all methods to infer historical changes in N_e can predict false bottlenecks when populations are structured, so, as indicated above, genetic structure should be evaluated and properly accounted for (Nielsen and Beaumont, 2009).

Studies with few individuals sequenced at high depth may use the Pairwise Sequentially Markovian Coalescent (PSMC; Li and Durbin, 2011) or the Multiple Sequentially Markovian Coalescent (MSMC; Schiffels and Durbin, 2014) to analyze the demographic history of domesticated and wild taxa. PSMC and MSMC infer changes in N_e through time (bottlenecks and expansions) by estimating the distribution of coalescence times along complete genomes from the local density of heterozygous sites (Li and Durbin, 2011; Schiffels and Durbin, 2014). Given a specified mutation rate, recombination rate and generation time, these models can also estimate when two genomes coalesce (i.e., their separation time, which in some cases corresponds to the domestication time) (Li and Durbin, 2011).
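Coalescent-based methods such as PSMC report results in scaled units, which must be converted with an assumed mutation rate and generation time before they can be read as years and numbers of individuals. The sketch below shows a generic conversion under the assumptions that θ0 = 4 N0 μ per base pair, that times are in units of 2 N0 generations and that sizes are relative to N0; it is not the code of either program, and if a program reports θ0 per bin of several base pairs it must first be divided by the bin size (check the program's documentation). All numeric values and the function name are hypothetical.

```python
def rescale_coalescent_output(t_scaled, lambdas, theta0, mu, gen_time):
    """Convert scaled times and relative sizes into years and effective sizes.

    Assumptions: theta0 = 4 * N0 * mu (per bp), times in units of 2 * N0
    generations, lambdas are population sizes relative to N0.
    """
    n0 = theta0 / (4 * mu)                             # reference effective size
    years = [2 * n0 * t * gen_time for t in t_scaled]  # generations -> years
    ne = [lam * n0 for lam in lambdas]
    return years, ne


# Hypothetical values: theta0 = 0.001 per bp, mu = 2.5e-8 per bp per generation, 1-year generations.
years, ne = rescale_coalescent_output([0.1, 0.5], [2.0, 0.5],
                                      theta0=0.001, mu=2.5e-8, gen_time=1)
print(years, ne)
```

Because both outputs scale with 1/μ, an error in the assumed mutation rate shifts the whole curve along both axes, which is why the choice of μ and generation time should always be reported.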

However, the genomes used in PSMC or MSMC must be of very high quality, with an average sequencing depth of at least 18×, at least 10 reads per site and less than 25% missing data (Nadachowska-Brzyska et al., 2016). Moreover, PSMC has several limitations compared to other estimators of N_e and is particularly prone to predicting false bottlenecks when populations are structured (Mazet et al., 2015). This issue can be addressed by comparing, with a maximum-likelihood approach, models of instantaneous change in N_e against classical symmetric island models (Mazet et al., 2015).

MSMC can infer more recent changes in N_e than PSMC (Schiffels and Durbin, 2014), so it may be useful for exploring recent demographic expansions in diversified domesticated taxa (Allaby et al., 2019). For example, MSMC was used to infer population bottlenecks in East Asian and Western Eurasian dogs, as well as divergence times between wolves and dogs around 60,000–20,000 years ago (Frantz et al., 2016), while PSMC was used to detect a severe bottleneck in African rice around 15,000–13,000 years ago (Meyer et al., 2016).

Other methods rely on genome-scale population data from many (sometimes hundreds of) individuals (as obtained from exome sequencing or RAD-seq), namely extended Bayesian skyline plots (Heled and Drummond, 2008; Trucchi et al., 2014) and stairway plots (Liu and Fu, 2015). Grounded in coalescent theory, these methods estimate N_e from the population data; stairway plots, for instance, use the SFS (Heled and Drummond, 2008; Liu and Fu, 2015). The inferences made with these two methods are comparable to those obtained from PSMC and MSMC, although they rely on different kinds of datasets (Liu and Fu, 2015). Furthermore, stairway plots are more efficient at inferring recent demographic history, whereas PSMC is more reliable for ancient demographic events (Liu and Fu, 2015).
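The SFS used by stairway plots (and by the selection tests discussed later) is simply a histogram of allele counts across segregating sites. The sketch below tabulates the unfolded SFS from derived-allele counts (which requires an outgroup to polarize alleles), folds it when ancestral states are unknown, and gives the standard expectation θ/i for a constant-size neutral population; the function names and toy counts are hypothetical.

```python
import numpy as np


def unfolded_sfs(derived_counts: np.ndarray, n: int) -> np.ndarray:
    """Counts of segregating sites with i = 1..n-1 derived alleles among n chromosomes."""
    return np.bincount(derived_counts, minlength=n + 1)[1:n]


def fold_sfs(sfs: np.ndarray, n: int) -> np.ndarray:
    """Fold an unfolded SFS into minor-allele classes i = 1..floor(n/2)."""
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        folded.append(sfs[i - 1] + (sfs[j - 1] if j != i else 0))
    return np.array(folded)


def neutral_expectation(theta: float, n: int) -> np.ndarray:
    """Expected unfolded SFS under a constant-size neutral model: theta / i."""
    return theta / np.arange(1, n)


counts = np.array([1, 1, 2, 3, 0, 4, 1])   # derived-allele counts at 7 sites, n = 4 chromosomes
xi = unfolded_sfs(counts, n=4)             # -> [3, 1, 1]; monomorphic sites (0 or 4) are dropped
print(xi, fold_sfs(xi, n=4), neutral_expectation(2.0, n=4))
```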

Estimating Gene Flow and Introgression Between Populations

Ancient gene flow and local ancestry (i.e., the genetic ancestry of an individual at a specific chromosomal position; Thornton and Bermejo, 2014) are also important aspects of plant and animal domestication, since they describe the genetic contribution of different ancestral populations to the genomic architecture of extant populations, such as wild and domesticated taxa (Price et al., 2009; Pickrell and Pritchard, 2012).

One approach to assess ancient gene flow is the use of graph-based methods, which incorporate the possibility of ancient gene flow between distantly related populations (Pickrell and Pritchard, 2012). These methods represent the relationships between populations as a bifurcating tree in which internal nodes can also be interconnected, forming a graph that represents the ancient gene flow that contributed to modern genetic variation (Pickrell and Pritchard, 2012). For example, graph-based analyses have revealed constant gene flow between sympatric populations of domesticated and wild pearl millet (Burgarella et al., 2018) and between domesticated and wild pigs (Frantz et al., 2015), but a lack of hybridization events between wild and domesticated populations of goats and sheep (Alberto et al., 2018).

Another popular test to infer ancient admixture is the ABBA-BABA test, also known as the D-statistic, which evaluates the allelic patterns of three taxa relative to an outgroup to identify an excess of shared derived variants that are discordant with the species tree (i.e., ABBA-BABA patterns), which suggests introgression (Durand et al., 2011). The f statistic, derived from the D-statistic, can help discriminate between introgression and nonrandom mating in structured ancestral populations (Martin et al., 2015). The D-statistic is sensitive to both introgression and incomplete lineage sorting, but the two signals can be separated by testing for deviations in the symmetry of branch lengths between the gene trees and the species tree (Edelman et al., 2019). By the same logic, the D₃ test infers introgression from the symmetry of branch lengths without the need for an outgroup (Hahn and Hibbins, 2019). The D-statistic has been used to infer several introgression events between species of the genus Bos during domestication (Wu et al., 2018).
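The D-statistic can be computed from derived-allele frequencies in the three ingroup taxa (P1, P2, P3) and the outgroup (P4), using the frequency-weighted form of Durand et al. (2011): each site contributes its expected proportion of ABBA and BABA patterns. This is a minimal sketch; significance is usually assessed with a block jackknife over the genome, which is not shown, and the input data and function name are hypothetical.

```python
import numpy as np


def patterson_d(p1, p2, p3, p4) -> float:
    """D-statistic from derived-allele frequencies in P1, P2, P3 and the outgroup P4."""
    p1, p2, p3, p4 = map(np.asarray, (p1, p2, p3, p4))
    abba = (1 - p1) * p2 * p3 * (1 - p4)   # P2 and P3 share the derived allele
    baba = p1 * (1 - p2) * p3 * (1 - p4)   # P1 and P3 share the derived allele
    return float((abba - baba).sum() / (abba + baba).sum())


# Hypothetical single-haplotype data: two ABBA sites and one BABA site.
d = patterson_d(p1=[0, 0, 1], p2=[1, 1, 0], p3=[1, 1, 1], p4=[0, 0, 0])
print(d)
```

A positive D indicates an excess of derived alleles shared between P2 and P3; under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers and D is close to zero.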

On the other hand, local ancestry methods can reveal which chromosomal segments in a genome were inherited from different ancestral source populations (Price et al., 2009). These methods use linkage disequilibrium between loci to assign the ancestry of each portion of the genome by comparison with reference populations that represent the ancestral source populations, so they require the a priori designation of unadmixed reference populations (Price et al., 2009). The analysis reveals chromosomal blocks in hybrid populations that can be assigned to either wild or domesticated ancestry, which may uncover historical processes of introgression and local adaptation in modern domesticated populations, as well as potential targets for selective breeding (Janzen et al., 2019).

Many methods exist that can infer local ancestry using genome-wide population data, and all of them require a high-quality reference genome (preferably assembled at a chromosome-level) in order to detect the ancestry of chromosomal segments (e.g., Price et al., 2009; Baran et al., 2012; Maples et al., 2013; Dias-Alves et al., 2018). For example, a local ancestry analysis of East Asian domestic cattle revealed introgressed blocks inherited from ancient banteng and yak populations that contained genes enriched in sensory perception of smell, transmembrane transport and antigen processing (Chen N. et al., 2018).

Using Demographic Simulations to Infer Domestication Scenarios

The previous descriptive tools can help us explore possible evolutionary and demographic scenarios in the absence of a priori hypotheses (Liu and Fu, 2015). However, for domesticated taxa we usually have additional classic botanical, zoological, morphological, paleoclimatic, archeological, ethnobiological and biogeographical data that may suggest some likely scenarios (Gerbault et al., 2014). Thus, demographic modeling can be used to test explicit demographic scenarios by comparing simulations of SFS in such scenarios to the observed data (Gerbault et al., 2014; Liu and Fu, 2015). There are many methods available for demographic modeling, which can be more suitable depending on the type of scenarios that need to be tested (Anderson et al., 2005; Gutenkunst et al., 2009; Excoffier and Foll, 2011; Cornuet et al., 2014). All these methods rely on some basic tenets of coalescent theory (Liu and Fu, 2015), so they are also susceptible to possible biases in the observed genetic variation in the populations.

For example, the approximate Bayesian computation (ABC) method compares the summary statistics of several simulated scenarios against the observed data to accept or reject certain demographic hypotheses (Cornuet et al., 2014; Gerbault et al., 2014). This method can help us determine certain parameters of our models and can be used with genome-wide datasets (Cornuet et al., 2014).
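The logic of ABC can be illustrated with a deliberately tiny example: estimating the population mutation parameter θ from a single summary statistic, the number of segregating sites S, by rejection sampling. The sketch assumes a uniform prior, a constant-size neutral coalescent and an arbitrary tolerance; real analyses use dedicated software, many summary statistics and competing demographic scenarios, and all names and values here are hypothetical.

```python
import numpy as np


def simulate_s(theta: float, n: int, rng: np.random.Generator) -> int:
    """Segregating sites in a sample of n sequences under the standard neutral coalescent."""
    k = np.arange(2, n + 1)
    # Waiting time while k lineages remain, in units of 2N generations (rate k(k-1)/2).
    waits = rng.exponential(scale=2.0 / (k * (k - 1)))
    total_branch_length = np.sum(k * waits)
    # With theta = 4 N mu, mutations fall on the tree at rate theta / 2 per unit length.
    return int(rng.poisson(theta * total_branch_length / 2))


def abc_rejection(s_obs: int, n: int, n_draws: int = 20_000, tol: int = 2,
                  prior_max: float = 50.0, seed: int = 1) -> np.ndarray:
    """Rejection ABC: keep prior draws of theta whose simulated S is close to s_obs."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.0, prior_max, size=n_draws)
    sims = np.array([simulate_s(t, n, rng) for t in thetas])
    return thetas[np.abs(sims - s_obs) <= tol]


posterior = abc_rejection(s_obs=50, n=20)
print(len(posterior), posterior.mean())
```

The accepted draws approximate the posterior distribution of θ; shrinking the tolerance improves the approximation at the cost of accepting fewer simulations.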

Other methods based on diffusion approximation can help us infer the demographic history of multiple populations and their interaction through migration and admixture using biallelic SNP data (Gutenkunst et al., 2009). Demographic modeling has helped test the number of domestication events as well as intercontinental migratory events in cattle (Pitt et al., 2019). Coalescent simulations have supported a common origin for all the domesticated varieties of pearl millet (Burgarella et al., 2018), while the ABC method has revealed that the most likely scenario in the domestication of the scarlet runner bean consists of a single domestication event around 21,000 years ago with a mild bottleneck effect (Guerra-García et al., 2017).

Identifying Genes Under Selection During Domestication

Demographic processes are important for understanding the general history that led to the domestication of plant and animal taxa, but many studies are especially interested in finding the selected genes that explain the phenotypic differences between domesticated taxa and their wild counterparts (Wang G.-D. et al., 2014; Kantar et al., 2017). Indeed, detecting the genes under selection during domestication is critical to understand the genetic basis of domestication syndromes, especially for identifying genetic variation relevant for future improvement and selective breeding (Hufford et al., 2012).

When a genetic variant increases in frequency due to positive selection (i.e., selection favoring the fixation of a new allele), the alleles linked to it (i.e., physically connected in the same chromosomal region) also increase in frequency, a process known as hitchhiking (Smith and Haigh, 2007). Once the variant under selection reaches a high frequency or fixation, the hitchhiking effect reduces or even eliminates the genetic variation around the selected locus, producing what is known as a selective sweep (Vitti et al., 2013; Pavlidis and Alachiotis, 2017). The size and intensity of a selective sweep depend on the rate of recombination in the genome and on the intensity of the selective pressure (Smith and Haigh, 2007), which may be weaker under conscious selection than in some cases of natural selection (Fugère and Hendry, 2018; Yang et al., 2019). Fortunately, the signals of a selective sweep can still be detected when the selection event occurred “recently” on an evolutionary timescale, as is the case for domestication (Vitti et al., 2013).

Different bottom-up methods using population genomics data have been developed to detect the regions in the genome that were selected for during domestication, which we will refer to as candidate loci. We can mention methods for detecting regions with higher population differentiation compared to the rest of the genome, methods for detecting local changes in the SFS throughout the genome, and methods that detect extended regions with strong linkage disequilibrium compared to other haplotypes in the genome (see Supplementary Table S1 for a summary of methods to detect selective sweeps). Alternatively, a GWAS can be performed to detect the association of a genetic variant to a specific phenotype of interest (Wang G.-D. et al., 2014).

F_ST Outlier Tests to Detect Candidate Genes

Besides its standard use to detect global population structure, F_ST can also be used to detect signals of selective sweeps between populations, namely between wild and domesticated taxa (Gepts, 2014). While a global F_ST (computed over all the analyzed loci or SNPs) reveals the overall genetic structure between populations, a local F_ST calculated for each locus or SNP along the genome can evaluate whether particular regions of the genome are more differentiated than expected from demographic processes alone, which can be interpreted as a signal of a selective sweep (Nei and Maruyama, 1975). Many methods based on F_ST exist, collectively known as F_ST outlier tests (Foll and Gaggiotti, 2008; Excoffier et al., 2009; Bonhomme et al., 2010; de Villemereuil and Gaggiotti, 2015; Lotterhos and Whitlock, 2015), which differ mainly in the underlying model used to compute the null distribution of F_ST values, and thus in their ability to detect outliers (Supplementary Table S1).
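The simplest version of this idea flags SNPs whose per-SNP F_ST exceeds an empirical genome-wide quantile. The sketch below uses a heterozygosity-based (Nei) F_ST for two equally weighted populations; it is only an empirical scan, not one of the model-based outlier tests cited above, because it does not model the neutral F_ST distribution expected under the populations' demography. The 0.99 quantile, function name and allele frequencies are hypothetical.

```python
import numpy as np


def empirical_fst_outliers(p_wild: np.ndarray, p_dom: np.ndarray,
                           quantile: float = 0.99) -> np.ndarray:
    """Indices of SNPs whose per-SNP F_ST lies above an empirical genome-wide quantile."""
    h_s = (2 * p_wild * (1 - p_wild) + 2 * p_dom * (1 - p_dom)) / 2
    p_bar = (p_wild + p_dom) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    fst = np.divide(h_t - h_s, h_t, out=np.zeros_like(h_t), where=h_t > 0)
    threshold = np.quantile(fst, quantile)
    return np.flatnonzero(fst > threshold)


p_wild = np.full(100, 0.5)
p_dom = np.linspace(0.45, 0.55, 100)
p_wild[42], p_dom[42] = 0.1, 0.95   # hypothetical strongly differentiated SNP
print(empirical_fst_outliers(p_wild, p_dom))
```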

F_ST outlier tests can detect selective pressures following a bottom-up approach, but their efficiency depends on many factors that should be carefully accounted for before using them, such as the sampling scheme used to obtain the population data, the total size of the dataset (i.e., the number of populations, of individuals per population and of SNPs analyzed), the intensity of the selective pressure, the mating system (selfing or outcrossing), and the migration patterns and genetic structure among populations (De Mita et al., 2013; Lotterhos and Whitlock, 2014, 2015).

Some successful applications of F_ST outlier tests include the detection of domestication candidate genes in apple involved in fruit development, size, acidity and sugar metabolism (Khan et al., 2014), the discovery of candidate domestication genes involved in metabolism and oil biosynthesis in sunflower (Baute et al., 2015), the description of candidate diversification genes between pig breeds associated with skull shape (Wilkinson et al., 2013), and the identification of candidate loci between wild and domesticated salmon strains involved in body weight, condition factor, male maturation and a brain-related protein (Vasemägi et al., 2012).

Site Frequency Spectrum Based Tests to Detect Selective Sweeps

Selective sweeps alter the SFS that would be expected under neutral evolution processes because of the reduction in the genetic diversity around the loci under selection (Vitti et al., 2013). The genomic region under selection skews the SFS into an excess of high frequency derived alleles when the selective sweep was recent, since the alleles that were linked to the favored selected locus also reach high frequencies (Fay and Wu, 2000). However, after all the high-frequency alleles reached fixation, the genomic region under the selective sweep will have little to no variation, while mutations will slowly generate new allelic variants, skewing the SFS into an excess of low frequency variants (Zeng et al., 2006). Several tests have been developed to detect skews in the SFS, each of them capable of detecting changes in different parts of the SFS (Supplementary Table S1), making them complementary to one another (Zeng et al., 2006; Vitti et al., 2013).

Even though SFS-based tests are powerful tools for detecting selection, it is important to remember that the genome-wide SFS is also altered by demographic events: bottlenecks produce an excess of intermediate-frequency variants, whereas expansions generate an excess of low-frequency variants (Vitti et al., 2013). Thus, it is essential to have a prior estimate of the demographic history of the populations in order to properly adjust the null hypothesis of each test (Ross-Ibarra et al., 2007).

The well-known Tajima’s D statistic is sensitive to changes in low-frequency variants, making it particularly useful for detecting selective sweeps before and after the selected locus reaches fixation, although an excess of low-frequency variants can also be observed in loci under purifying selection (Tajima, 1989; Zeng et al., 2006). Tajima’s D is also sensitive to intermediate-frequency alleles, making it useful to detect balancing selection (Tajima, 1989) or even some forms of soft selective sweeps arising from standing genetic variation (Przeworski et al., 2005).
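Tajima’s D compares two estimators of θ: the mean number of pairwise differences (π) and Watterson’s estimator (S/a1), normalized by the standard variance term from Tajima (1989). The sketch below assumes biallelic segregating sites, no missing data and at least one segregating site; because the formula is symmetric in the two alleles, the counts of either allele can be used and no outgroup is needed. The function name and toy inputs are hypothetical.

```python
import numpy as np


def tajimas_d(allele_counts: np.ndarray, n: int) -> float:
    """Tajima's D from the count of one allele at each segregating site among n chromosomes."""
    counts = np.asarray(allele_counts)
    s = len(counts)                                   # number of segregating sites
    i = np.arange(1, n)
    a1, a2 = np.sum(1 / i), np.sum(1 / i**2)
    # Mean number of pairwise differences, summed over sites.
    pi = np.sum(counts * (n - counts)) / (n * (n - 1) / 2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return float((pi - s / a1) / np.sqrt(e1 * s + e2 * s * (s - 1)))


print(tajimas_d(np.ones(10, dtype=int), n=10))   # excess of singletons: negative D
print(tajimas_d(np.full(10, 5), n=10))           # excess of intermediate frequencies: positive D
```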

Conversely, Fay and Wu’s H is sensitive to changes in high-frequency variants, which are only altered by positive selection, making it very useful when used alongside Tajima’s D (Fay and Wu, 2000). Unlike Tajima’s D, Fay and Wu’s H needs an outgroup species in order to differentiate ancestral alleles from derived alleles and thereby to know whether the derived alleles are at high or low frequencies (Fay and Wu, 2000).
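Fay and Wu’s H contrasts θ_π with θ_H, an estimator that gives heavy weight to high-frequency derived alleles, so a strongly negative H signals an excess of such alleles. The sketch below computes the unnormalized H = θ_π - θ_H from an unfolded (outgroup-polarized) SFS; it does not include the normalization later proposed by Zeng et al. (2006), and the function name and inputs are hypothetical.

```python
import numpy as np


def fay_wu_h(xi: np.ndarray, n: int) -> float:
    """Unnormalized Fay and Wu's H = theta_pi - theta_H from an unfolded SFS.

    xi[i-1] is the number of sites with i derived alleles (i = 1..n-1).
    """
    i = np.arange(1, n)
    denom = n * (n - 1)
    theta_pi = np.sum(2 * i * (n - i) * xi) / denom
    theta_h = np.sum(2 * i**2 * xi) / denom
    return float(theta_pi - theta_h)


# Five sites where the derived allele is at high frequency (3 of 4 chromosomes).
h = fay_wu_h(np.array([0, 0, 5]), n=4)   # -> -5.0
print(h)
```

Under the neutral expectation ξ_i = θ/i, both estimators equal θ, so H is zero on average; departures in either direction must still be judged against the demographic history of the population.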

Zeng’s E (Zeng et al., 2006) is sensitive to both low- and high-frequency variants, making it particularly powerful for detecting selective sweeps before or after the selected locus reaches fixation; like Fay and Wu’s H, it needs an outgroup to differentiate derived alleles from ancestral alleles.

Several tools can perform all the above tests on genome-wide data (e.g., Korneliussen et al., 2013, 2014; Rozas et al., 2017). For example, Tajima’s D was used alongside other methods to detect selective sweeps associated with the domestication of yaks (Qiu et al., 2015), Zeng’s E helped discover 125 selective sweeps associated with the domestication of horses (Librado et al., 2016), and the combined use of Tajima’s D, Fay and Wu’s H and Zeng’s E revealed several candidate genes with similar functions in peach and almond (Velasco et al., 2016).

The reduction of diversity (ROD) test is another popular SFS-based method to detect selective sweeps that has been particularly useful for studying domestication (Supplementary Table S1). ROD compares local π values of domesticated taxa against those of their wild relatives in sliding windows along the genome (Guo et al., 2012; Huang et al., 2012; Qi et al., 2013; Schmutz et al., 2014). The ROD method has been used successfully to detect candidate domestication genes in rice (Huang et al., 2012), watermelon (Guo et al., 2012), cucumber (Qi et al., 2013), common bean (Schmutz et al., 2014) and chickpea (Varshney et al., 2019), to name a few.
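A windowed ROD scan can be sketched as follows, assuming one common formulation, ROD = 1 - π_dom/π_wild, in which values close to 1 indicate a strong loss of diversity in the domesticated group (some studies instead report the ratio π_wild/π_dom). The inputs are per-site diversity values for the same sites in both groups; because both sums cover the same window length, their ratio equals the ratio of per-base π. Window size, step, function name and data are hypothetical, and setting the step smaller than the window gives overlapping (sliding) windows.

```python
import numpy as np


def rod_windows(pos, pi_wild, pi_dom, window: int, step: int):
    """ROD = 1 - pi_dom / pi_wild in windows along one chromosome.

    Windows without any wild diversity are skipped to avoid dividing by zero.
    """
    pos, pi_wild, pi_dom = map(np.asarray, (pos, pi_wild, pi_dom))
    results = []
    for start in range(0, int(pos.max()) + 1, step):
        in_win = (pos >= start) & (pos < start + window)
        wild_sum = pi_wild[in_win].sum()
        if wild_sum > 0:
            results.append((start, 1 - pi_dom[in_win].sum() / wild_sum))
    return results


rod = rod_windows(pos=[5, 15, 105, 115], pi_wild=[0.5, 0.5, 0.4, 0.4],
                  pi_dom=[0.5, 0.5, 0.0, 0.04], window=100, step=100)
print(rod)   # the second window has lost most of its diversity in the domesticated group
```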

Linkage Disequilibrium (LD) Based Methods to Detect Selection

Because selective sweeps remove the variation in regions adjacent to the locus under selection, they can form haplotype blocks in strong LD that extend farther than other haplotypes at the same locus, since the selected haplotype reached a medium-to-high frequency in the population quickly enough that recombination has not yet broken it down (Sabeti et al., 2002; Vitti et al., 2013). This pattern has been exploited to develop several LD-based methods to detect selective sweeps of recent origin (Vitti et al., 2013). Interestingly, LD-based methods are sensitive enough to detect both hard and soft selective sweeps (Garud et al., 2015), as well as partial or incomplete selective sweeps (Vitti et al., 2013), making them excellent tools to study recent and ongoing selection events, such as those occurring during domestication and the subsequent diversification of landraces (Supplementary Table S1).

Because this rationale relies on the decay of LD due to recombination, any LD-based method must control for local variation in recombination rates to reduce false positives (Sabeti et al., 2002). The extended haplotype homozygosity (EHH), a statistic widely used in LD-based methods, is defined as the probability that two homologous genomic regions carrying a “core” haplotype of interest (i.e., the part of the haplotype shared by all the individuals carrying it, such as the allele under positive selection) are identical by descent (i.e., inherited from the same ancestor) from the core region up to a specified distance away from it (Sabeti et al., 2002).
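The EHH definition translates directly into code: among the chromosomes carrying a given allele at the core SNP, count how many pairs are identical over the whole stretch from the core to a target SNP. In practice, identity by descent is approximated by identity of the observed alleles, and the input must already be phased. The sketch below works with SNP indices rather than physical or genetic distances; the function name and haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[list[int]], core: int, allele: int, end: int) -> float:
    """EHH between a core SNP and SNP index `end` (either side of the core).

    Among phased chromosomes carrying `allele` at `core`, returns the probability
    that two randomly chosen ones are identical over the whole stretch core..end.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    if len(carriers) < 2:
        return float("nan")
    lo, hi = sorted((core, end))
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    return sum(comb(c, 2) for c in groups.values()) / comb(len(carriers), 2)


haps = [[0, 1, 1, 1],
        [0, 1, 1, 0],
        [0, 1, 1, 1],
        [1, 0, 1, 0],
        [1, 0, 0, 1]]
print([ehh(haps, core=0, allele=0, end=j) for j in range(4)])   # decays from 1.0 to 1/3
```

A haplotype that rose in frequency quickly under selection keeps EHH high over longer distances than other haplotypes at the same core, which is the contrast exploited by the methods described next.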

Among the LD-based methods that use EHH, we can mention the long-range haplotype (LRH) test, sometimes called the relative EHH (rEHH) test, which controls for local recombination rates by comparing the EHH of several haplotypes at the same locus (Sabeti et al., 2002). Other EHH-based methods include the whole-genome long-range haplotype (WGLRH) test, which applies the LRH test in sliding windows (Zhang et al., 2006), the long-range haplotype similarity (LRHs) test (Hanchard et al., 2006), the integrated haplotype score (iHS), which is particularly sensitive to incomplete and soft selective sweeps (Voight et al., 2006), and the cross-population extended haplotype homozygosity (XP-EHH) statistic, which can detect selective sweeps after the selected allele has reached fixation (Sabeti et al., 2007). The iHS and XP-EHH statistics can be regarded as complementary, enabling the detection of both incomplete and complete selective sweeps in the target population (Vitti et al., 2013).

All the LD-based tests that use the EHH statistic require phased chromosomes (i.e., the alleles of each individual must be assigned to their maternal and paternal haplotypes), which may or may not be possible depending on the sequencing depth and the type of data available (Delaneau et al., 2013). For instance, a reference genome is usually needed to phase genotypes, since most methods rely on the proximity between alleles and their distribution among individuals in a population to assign haplotypes (Delaneau et al., 2013), although new methods are emerging that can phase genotypes without a reference genome (Money et al., 2017).

There are other LD-based methods that do not use the EHH statistic, such as the LD decay (LDD) test, which relies on individuals that are homozygous for a given SNP to look for differences in LD between alleles in a population (Wang et al., 2006), and the ω statistic, which looks for high LD among SNPs on each side of a putatively selected site together with low LD between SNPs on opposite sides (Kim and Nielsen, 2004; Alachiotis et al., 2012). Another method that does not require chromosome phasing is the regression-based test, which infers selective sweeps from the reduction in heterozygosity as one approaches the locus under selection (Wiener and Pong-Wong, 2011). Other LD-based methods estimate identity-by-descent from genome-wide data to detect haplotypes shared among several unrelated individuals (i.e., with no common ancestor in the last 10 generations), inferring selective sweeps without prior knowledge of the individuals' pedigree (Han and Abney, 2013), so they might prove useful to study recent domestication processes.
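
The ω statistic of Kim and Nielsen (2004) compares the average r² within the left and right sides of a candidate split point with the average r² across the split: ω = [(C(l,2) + C(S−l,2))⁻¹ (Σ_L r² + Σ_R r²)] / [(l(S−l))⁻¹ Σ_{L×R} r²], where l of the S SNPs lie on the left. The minimal sketch below evaluates ω at a single split, assuming SNP columns coded 0/1 across haplotypes; OmegaPlus and similar tools scan many windows and split points, and the function names here are hypothetical.

```python
from itertools import combinations
from math import comb


def r_squared(a, b):
    """Squared correlation between two biallelic SNPs coded 0/1 across haplotypes."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # monomorphic SNP: r^2 is undefined, treat as 0
        return 0.0
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / denom


def omega(snps, split):
    """Omega for SNP columns split into left (first `split`) and right sets."""
    left, right = snps[:split], snps[split:]
    n_left, n_right = len(left), len(right)
    within_pairs = comb(n_left, 2) + comb(n_right, 2)
    if within_pairs == 0 or n_left == 0 or n_right == 0:
        raise ValueError("split must leave SNP pairs within and across sides")
    within = sum(r_squared(a, b) for side in (left, right) for a, b in combinations(side, 2))
    across = sum(r_squared(a, b) for a in left for b in right)
    if across == 0:
        return float("inf")  # no LD across the split at all
    return (within / within_pairs) / (across / (n_left * n_right))


# Four haplotypes; each inner list is one SNP (hypothetical data)
snps = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(round(omega(snps, split=2), 3))  # prints 3.0
```

High ω values arise when each flank is internally in strong LD but the flanks are weakly correlated with each other, as expected around the target of a recent sweep.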

Examples of LD-based methods used to explore the domestication process include an analysis using the LRH test to detect signatures of selection associated with dairy and beef cattle breeds (Bomba et al., 2015), a study using the XP-EHH statistic to find signals of selective sweeps in Jinhua pigs (Li et al., 2016), and a paper on the diversification of goat landraces that calculated the iHS and XP-EHH statistics, alongside other tests, to detect selective sweeps among goat breeds (Bertolini et al., 2018).

Other important tests include the XP-CLR test (Chen et al., 2010), which models multilocus allele-frequency differentiation between populations, and the μ statistic (Alachiotis and Pavlidis, 2018), which combines multiple signatures of a selective sweep (Supplementary Table S1). These tests have been used to detect candidate loci in maize and African rice, respectively (Hufford et al., 2012; Ndjiondjop et al., 2019).

Using GWAS to Detect Domestication-Associated Loci

Genome-wide association studies have been used extensively to uncover the genetic variants that underlie domestication traits (Shi and Lai, 2015). The domestication traits that can be analyzed through a GWAS can encompass any biological characteristic from simple morphological traits (Jiao et al., 2012) to the production of certain metabolites (Shang et al., 2014), tame behavior in animals (Ilska et al., 2017), resistance or susceptibility to certain diseases (Wang et al., 2012), or adaptation to certain environmental conditions (Song et al., 2018).

An important advantage of GWAS over bottom-up approaches is its ability to detect the polygenic basis of single traits of interest, which is commonplace given that genes interact with each other and with the environment to generate phenotypes (Gibson, 2018).

A prerequisite for performing a GWAS is a large sample size, both in the number of genotyped variants and in the number of individuals included in the study, since both are necessary to obtain the statistical power to detect variants with small effects and to reduce the risk of false positives (Wang G.-D. et al., 2014).
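
The simplest form of the association test described above regresses the phenotype on allele dosage (0, 1 or 2 copies) one SNP at a time and then applies a multiple-testing threshold; the Bonferroni threshold α/m shows why scanning many variants demands very small p-values and therefore large samples. This sketch assumes a quantitative phenotype, no covariates and no correction for population structure or relatedness (which GWAS software such as PLINK or GEMMA provides), and uses a normal approximation for p-values; the data and function names are hypothetical.

```python
from math import erfc, sqrt


def snp_association(genotypes, phenotype):
    """Ordinary least squares of phenotype on allele dosage; returns (beta, p)."""
    n = len(phenotype)
    mx, my = sum(genotypes) / n, sum(phenotype) / n
    sxx = sum((g - mx) ** 2 for g in genotypes)
    if sxx == 0:  # monomorphic SNP: nothing to test
        return 0.0, 1.0
    sxy = sum((g - mx) * (y - my) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit
        return beta, 0.0
    # Two-sided p-value from a normal approximation to the t statistic
    return beta, erfc(abs(beta / se) / sqrt(2))


def gwas_scan(genotype_matrix, phenotype, alpha=0.05):
    """Test every SNP and keep those below the Bonferroni threshold alpha / m."""
    threshold = alpha / len(genotype_matrix)
    hits = []
    for i, genotypes in enumerate(genotype_matrix):
        beta, p = snp_association(genotypes, phenotype)
        if p < threshold:
            hits.append((i, beta, p))
    return threshold, hits


phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 1.1, 2.0, 2.9, 3.1]
genotype_matrix = [
    [0, 0, 1, 1, 2, 2, 0, 1, 2, 2],  # SNP 0: dosage tracks the phenotype
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # SNP 1: unrelated to the phenotype
]
threshold, hits = gwas_scan(genotype_matrix, phenotype)
print(threshold, [i for i, _, _ in hits])
```

Real studies use exact t or mixed-model tests, since the normal approximation is only reasonable for large sample sizes.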

Some recent examples include the use of a GWAS to identify candidate genes with unknown functions involved in several agronomic traits, including drought and heat tolerance, in chickpea (Varshney et al., 2019); a GWAS that revealed loci associated with fruit size and quality in peach (Cao et al., 2019); and a GWAS that uncovered the genetic variants involved in the absence of anthocyanin in domesticated rice compared to its wild relative (Zheng et al., 2019).

Ancient DNA and Paleogenomics of Domesticated Taxa

Extant domesticated taxa have lost part of their ancient genetic diversity through bottlenecks, selection and genetic drift (Ramos-Madrigal et al., 2016). However, the analysis of ancient DNA can help the research community overcome some of these limitations (Irving-Pease et al., 2019). Ancient DNA retrieved from archeological sites allows researchers to study the rate at which domestication happened and to reveal which genes were important at the beginning of the process (Vallebueno-Estrada et al., 2016; Irving-Pease et al., 2019). Thus, paleogenomics is emerging as a new research area for understanding plant and animal domestication (Irving-Pease et al., 2019).

Extraction and Sequencing of Ancient DNA

An important limitation of paleogenomic analyses is the level of preservation of the ancient DNA itself, as well as the total yield of extracted DNA (Sawyer et al., 2012). DNA molecules extracted from tissues older than 100 years that were not preserved in permafrost are usually shorter than 100 bp (Sawyer et al., 2012). The strand breaks in such fragments are also non-random, as purines are enriched at the positions immediately preceding the breaks (Sawyer et al., 2012). Additionally, cytosines near the ends of these fragments are frequently deaminated to uracil, which is read as thymine and therefore appears as cytosine-to-thymine substitutions, further hindering the analysis of the sequenced fragments (Sawyer et al., 2012). Even though these characteristics hamper the sequencing and analysis of ancient DNA, they are also useful to distinguish genuine ancient DNA from contamination with modern DNA (Sawyer et al., 2012). Furthermore, because ancient material is scarce and found in few archeological sites worldwide, sample sizes in paleogenomic studies are very small, usually one or a few individuals per location and sometimes only one locality (e.g., Wales et al., 2016; Ramos-Madrigal et al., 2016).
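
The damage signal described above is usually summarized as the frequency of C-to-T mismatches by position from the 5′ end of the reads, which is elevated at the first positions in genuine ancient DNA. The sketch below computes that profile under simplifying assumptions: reads are already aligned to the reference without gaps and given as (reference segment, read) pairs in 5′ to 3′ orientation. Tools such as mapDamage work directly on alignment files and also report the complementary G-to-A signal; the function name and data here are hypothetical.

```python
def ct_damage_profile(alignments, n_positions=10):
    """Fraction of reference C sites read as T at each position from the 5' end.

    `alignments` holds (reference_segment, read) pairs of equal length
    (gap-free alignments, an assumption of this sketch).
    """
    c_sites = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("this sketch assumes gap-free alignments")
        for i in range(min(n_positions, len(ref))):
            if ref[i] == "C":
                c_sites[i] += 1
                if read[i] == "T":  # possible deaminated cytosine
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_sites)]


alignments = [("CACGT", "TACGT"), ("CCGTA", "TCGTA"), ("ACCGT", "ACTGT")]
print(ct_damage_profile(alignments, n_positions=3))  # prints [1.0, 0.0, 0.5]
```

Modern contaminant DNA is expected to show a flat, low profile, so a sharp rise at position 0 supports the authenticity of the ancient reads.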

Given the above difficulties and the uniqueness of the biological material retrieved from archeological sites, it is crucial to extract and sequence as much ancient DNA as possible while avoiding DNA contamination (Gamba et al., 2016). Major efforts have been made to develop efficient protocols for ancient DNA extraction (Gamba et al., 2016) and single-strand library preparation for high-throughput sequencing (e.g., Gansauge et al., 2017). Organelle genomes were usually the target for ancient DNA sequencing because multiple copies of these can be found within each plant and animal cell and can reveal several demographic processes (Wales et al., 2016; Irving-Pease et al., 2019). Nonetheless, more evolutionary information can be retrieved from nuclear DNA, which is the main target for modern paleogenomic studies (Wales et al., 2016; Irving-Pease et al., 2019).

Insights From Paleogenomic Data Into Domestication

Paleogenomic studies are challenging some of our previous ideas about the domestication process, such as the occurrence of early domestication bottlenecks, which appear to be absent in several archeological plant genomes. This suggests that the reduction of diversity in domesticated taxa may have been a more gradual process than expected from the DNA of extant populations (Allaby et al., 2019). For example, several archeological samples of Sorghum bicolor from different time periods (ranging from 1800 to 100 years ago) were compared to extant individuals of the species, revealing that this crop did not suffer an initial domestication bottleneck; rather, the reduction in genetic diversity, and the accumulation of mutational load associated with it, occurred gradually over time (Smith et al., 2019).

Paleogenomics is also revealing important aspects of plant and animal domestication, such as the first genetic steps towards domestication syndromes and the overall gradual nature of the process (Ramos-Madrigal et al., 2016; Vallebueno-Estrada et al., 2016; Daly et al., 2018). For example, archeological remains of goats have revealed multiple domestication processes in ancient wild goats, possible dispersal routes of ancient goat populations and signs of early selection on candidate genes involved in pigmentation, milk production, size, reproduction and diet (Daly et al., 2018). Likewise, several archeological maize samples retrieved from the Tehuacán Valley in Mexico revealed that early domesticates already carried signals of selective sweeps on important candidate genes, such as teosinte branched1 and brittle endosperm2, but lacked sweep signals in other important candidate genes found in modern maize populations. Because these ancient maize populations were already endogamous and more closely related to modern maize than to wild teosinte, these results show that maize domestication was a gradual process spanning thousands of years (Ramos-Madrigal et al., 2016; Vallebueno-Estrada et al., 2016).

Other examples demonstrate the importance of paleogenomic studies in domesticated taxa, including grapevine (Wales et al., 2016), barley (Mascher et al., 2016), sunflower (Wales et al., 2019), horses (Schubert et al., 2014), dogs (Frantz et al., 2016) and cats (Ottoni et al., 2017).

RNA Sequencing to Detect Differentially Expressed Genes Associated to Domestication

Besides the use of RNA-seq to obtain population-level data, comparative transcriptomics is a good way to find or support the validity of candidate genes (Hekman et al., 2015). Transcriptomic analyses between domesticated and wild taxa can reveal important changes in gene expression associated to domestication (Koenig et al., 2013; Hekman et al., 2015; Hradilová et al., 2017). Likewise, the analysis of hybrids between domesticated and wild individuals can reveal important patterns of allele-specific regulation and the role of cis/trans regulatory elements in the emergence of domestication traits (Bell et al., 2013; Lemmon et al., 2014).

The Experimental Design of Differential Expression Analyses

Transcriptomic profiles are tissue-specific and time-dependent (Hekman et al., 2015). Thus, a good experimental design can reveal important loci involved in the phenotypic differences associated to domestication syndromes, such as suppression of secondary metabolites, changes in form, size, taste, absence of defense mechanisms, seed dormancy, docile behavior, among other traits (Hekman et al., 2015). This can be done by comparing the total RNA expression of the tissue or organ of interest (Koenig et al., 2013), as well as comparing RNA expression throughout the developmental stages of such tissue or organ (Hradilová et al., 2017).

Since transcriptomic analyses are experimental by nature, their designs require biological replicates for each treatment, condition or organ to assess the variability in the data, as well as controlled environmental conditions to reduce possible biases and sources of error (Fang and Cui, 2011; Schurch et al., 2016). Although three replicates per condition are common, empirical studies discourage this practice and recommend at least six biological replicates per condition (Burden et al., 2014; Schurch et al., 2016). Additionally, it is important to avoid design errors that can bias the results of an RNA-seq experiment, such as using a different sequencing technology or library preparation method for different samples, or sequencing each treatment on a different flowcell or in different lanes within a flowcell (Fang and Cui, 2011). Other technical biases associated with adapter ligation and within-lane variation can be properly assessed when using biological replicates (Fang and Cui, 2011; see Table 1).
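
One simple way to avoid confounding treatment with sequencing lane, as warned above, is a blocked layout in which every lane receives replicates from every condition. The sketch below deals each condition's shuffled replicates round-robin across lanes; it assumes the library pool can be split across lanes (multiplexing all libraries in every lane is another common solution), and the sample names, seed and function name are hypothetical.

```python
import random


def balanced_lane_layout(samples_by_condition, n_lanes, seed=1):
    """Spread each condition's replicates across all lanes (a blocked design).

    Within each condition the replicate order is shuffled before dealing
    samples round-robin, so lane and condition are not confounded.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    lanes = {lane: [] for lane in range(1, n_lanes + 1)}
    for condition in sorted(samples_by_condition):
        replicates = list(samples_by_condition[condition])
        rng.shuffle(replicates)
        for i, sample in enumerate(replicates):
            lanes[i % n_lanes + 1].append((condition, sample))
    return lanes


design = {
    "wild": [f"W{i}" for i in range(1, 7)],  # six biological replicates
    "domesticated": [f"D{i}" for i in range(1, 7)],
}
for lane, samples in balanced_lane_layout(design, n_lanes=3).items():
    print(lane, samples)
```

With six replicates per condition and three lanes, each lane carries two wild and two domesticated libraries, so any lane effect is shared by both conditions rather than mistaken for a domestication effect.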

RNA-seq data can also be complemented with metabolomic data to infer the association between the differential expression of genes and the presence/absence of metabolites between wild and domesticated taxa (Hradilová et al., 2017).

After obtaining high-quality data from an appropriate experimental design, RNA-seq analyses usually follow a similar workflow that culminates in the detection of differentially expressed genes between a wild plant and its domesticated counterpart (Yang and Kim, 2015; see Table 1). These differentially expressed genes are likely candidates to explain, at least in part, the changes associated with domestication (Koenig et al., 2013; Hradilová et al., 2017). Nonetheless, one must be careful when interpreting the results of these studies, as some differentially expressed genes between wild and domesticated taxa may be a consequence, rather than a cause, of the domestication traits under study (Albert et al., 2012).
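
The last steps of that workflow can be sketched as follows: scale counts to counts per million (CPM), summarize the effect as a log2 fold change, and correct per-gene p-values for multiple testing with the Benjamini-Hochberg false discovery rate. The sketch assumes the p-values were already obtained from a count-based test (packages such as DESeq2 and edgeR fit negative binomial models and use more robust normalizations than plain CPM); the pseudocount and function names are assumptions.

```python
from math import log2


def cpm(counts):
    """Counts per million for one library (no composition normalization)."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def log2_fold_change(wild_values, domesticated_values, pseudocount=1.0):
    """log2 of mean domesticated over mean wild expression, with a pseudocount."""
    mean_wild = sum(wild_values) / len(wild_values)
    mean_dom = sum(domesticated_values) / len(domesticated_values)
    return log2((mean_dom + pseudocount) / (mean_wild + pseudocount))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(cpm([1, 3]))
print(log2_fold_change([3, 3], [15, 15]))
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
```

Genes are then typically called differentially expressed when both the adjusted p-value and the absolute log2 fold change pass chosen thresholds.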

Successful Examples of RNA-seq Experiments to Understand Domestication

RNA-seq analysis has been successfully employed to discover differentially expressed genes involved in the domestication of several plant species. For example, RNA-seq analyses comparing maize and teosinte found 600 differentially expressed genes and 1,100 genes with altered co-expression patterns, mainly involved in biotic stress responses, many of which had previously been identified as candidate genes in selection scans (Myers et al., 2012). Similar results have been found in tomato (Koenig et al., 2013), pea (Hradilová et al., 2017), common bean (Singh et al., 2018), and carrot (Machaj et al., 2018). This approach has also led to the discovery of differentially expressed genes between dogs and wolves associated with tameness (Li et al., 2013), as well as changes related to the immune system and aerobic capacity (Yang et al., 2018). Another study found differential isoform expression between wild and domesticated sorghum accessions, revealing that domestication can alter patterns of alternative splicing (Ranwez et al., 2017). Hybrid studies between maize and teosinte have suggested potential selection on cis-regulatory elements associated with changes in ear tissue and with previously reported candidate genes (Lemmon et al., 2014). Another hybrid study in Capsicum annuum using network analyses revealed that loss of function in cis-regulatory sequences leads to transcriptional changes in trans elements that are associated with fruit morphology (Díaz-Valenzuela et al., 2020).

Modern Epigenomics and Methodological Strategies to Explore Domestication

Epigenetics is classically defined as the heritable mechanisms that regulate gene expression without direct modifications to the DNA sequence, namely DNA methylation, RNA methylation, covalent histone modifications and chromatin assembly states (Sakurada, 2010; Zhao et al., 2017). Epigenetic variants, sometimes called epialleles, are local differences in these epigenetic marks between individuals in a population, and they can have dynamics similar to those of genetic variants (Weigel and Colot, 2012; Guo et al., 2015). Since epigenetic mechanisms underlie the ability of organisms to respond to changing environmental conditions, some epigenetic marks associated with these responses are more susceptible to change due to environmental input, whereas other marks involved in cell differentiation, embryonic development and core cellular functions might be more stable (Turner, 2009).

Most domestication studies that explain phenotypic differences between wild and domesticated taxa focus on genetic variation. However, epigenomics may explain some of the missing heritability in domestication traits (i.e., the gap between the heritability of a trait estimated by classic genetics and that explained by GWAS), the patterns of differentially expressed genes that lack clear signs of selective sweeps, or even the causal link between the genetic variation selected during domestication and the resulting phenotypes (Schmitz et al., 2013; Trerotola et al., 2015; Janowitz Koch et al., 2016; Bélteky et al., 2018).

Epigenetic variation can be inherited from one generation to the next in a process known as transgenerational epigenetic inheritance, which has been documented in plants and animals (Heard and Martienssen, 2014), although its overall importance in plant and animal evolution is still debated (see Table 1). Nevertheless, we consider that studying epigenetic patterns associated with transcriptional activity and phenotypic traits should help us understand the emergence of domestication phenotypes (Bélteky et al., 2018). If epigenetic variants such as single methylation polymorphisms (SMPs) show complete transgenerational inheritance, they can even be analyzed with the theoretical tools of population genetics to detect selective sweeps (Schmitz et al., 2013; Janowitz Koch et al., 2016).

In a similar fashion to GWAS, the use of epigenome-wide association studies (EWAS) can also reveal the association of an epigenetic variant to a trait of interest in domesticated taxa (Feeney et al., 2014). The same precautions taken in transcriptomic data should also be taken for epigenomic data, since the patterns of epigenetic marks in organisms are tissue-specific, time-dependent and sensitive to environmental input, meaning that epigenomic data should be analyzed for specific organs or tissues of interest in a controlled environment (Jensen, 2015). This is particularly important for the epigenetic marks that respond to environmental input, since domesticated taxa and their wild relatives live under different environmental conditions. Growing both taxa under controlled conditions will alter the natural state of these marks, but will also help differentiate the heritable epialleles associated to domestication traits (Turner, 2009).

Obtaining Population Data From Epigenetic Marks

The most studied epigenetic mark is DNA 5-methylcytosine, i.e., cytosine methylated at the fifth carbon, which is usually associated with transcriptional gene silencing (He et al., 2011). Cytosine methylome data can be obtained by combining high-throughput sequencing with bisulfite treatment (Meissner, 2005). Bisulfite treatment deaminates unmethylated cytosines into uracil, which is read as thymine by sequencing technologies, whereas methylated cytosines are protected from conversion and are still read as cytosine (Frommer et al., 1992). Comparing sequenced DNA treated with bisulfite against sequenced untreated DNA therefore discriminates between methylated and unmethylated cytosines in an organ, tissue or cell type of interest (Frommer et al., 1992).
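
Following the conversion logic above, the methylation level of each reference cytosine can be estimated as the fraction of bisulfite reads that still show C at that position, C / (C + T). This sketch assumes reads already aligned without gaps to the forward strand of a reference, given as (start, sequence) pairs; real pipelines (e.g., Bismark) also handle the reverse strand and cytosine context, and genuine C/T SNPs must be excluded, for instance by comparison with untreated DNA. Names and data are hypothetical.

```python
def methylation_levels(reference, reads, min_coverage=1):
    """Per-cytosine methylation level C / (C + T) from bisulfite reads.

    `reads` holds (start, sequence) pairs already aligned to the forward strand
    of `reference` without gaps (assumptions of this sketch).
    """
    methylated, unmethylated = {}, {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative
            if base == "C":  # protected from conversion: methylated
                methylated[pos] = methylated.get(pos, 0) + 1
            elif base == "T":  # converted to uracil, read as T: unmethylated
                unmethylated[pos] = unmethylated.get(pos, 0) + 1
    levels = {}
    for pos in sorted(set(methylated) | set(unmethylated)):
        m, u = methylated.get(pos, 0), unmethylated.get(pos, 0)
        if m + u >= min_coverage:
            levels[pos] = m / (m + u)
    return levels


reference = "ACGTCCGA"
reads = [(0, "ACGTTCGA"), (1, "TGTTT"), (4, "CCGA")]
print(methylation_levels(reference, reads))
```

Here cytosine 1 is covered by one methylated and one unmethylated read (level 0.5), whereas cytosines 4 and 5 have levels of one third and two thirds.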

Reduced representation bisulfite sequencing (RRBS) is a high-throughput technique with a rationale similar to RAD-seq, in which digestion with restriction enzymes enriches the sequencing of CG-rich regions of the genome (Meissner, 2005). This makes RRBS a cost-effective option to analyze cytosine methylation patterns in mammals, where cytosine methylation occurs mainly at CG sites (Meissner, 2005; He et al., 2011). Plant cytosine methylomes should instead be analyzed through MethylC-seq, which combines bisulfite treatment with whole-genome sequencing (Urich et al., 2015), since in plant genomes cytosine methylation also occurs at CHG and CHH sites, where H is A, C or T (He et al., 2011). Cytosine methylation can also be detected with methylated DNA immunoprecipitation sequencing (MeDIP-seq), in which genomic DNA is sheared into small fragments, the methylated fragments are immunoprecipitated with antibodies that recognize 5-methylcytosine, and the enriched fragments are sequenced with standard high-throughput technologies (Weber et al., 2005).
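
Because plant methylation must be reported separately for CG, CHG and CHH contexts, a common preliminary step is to classify every cytosine of the reference on both strands. The sketch below does this for an uppercase sequence assumed to contain only A, C, G and T; a G on the forward strand is treated as a cytosine on the reverse strand, whose downstream bases are read leftwards. The function name is hypothetical, and cytosines too close to the sequence end to be classified are skipped.

```python
def cytosine_contexts(seq):
    """Map (position, strand) of each cytosine to its CG, CHG or CHH context."""
    seq = seq.upper()
    n = len(seq)
    contexts = {}
    for i, base in enumerate(seq):
        if base == "C":  # cytosine on the forward strand
            if i + 1 < n and seq[i + 1] == "G":
                contexts[(i, "+")] = "CG"
            elif i + 2 < n:
                contexts[(i, "+")] = "CHG" if seq[i + 2] == "G" else "CHH"
        elif base == "G":  # a G here is a cytosine on the reverse strand
            if i - 1 >= 0 and seq[i - 1] == "C":
                contexts[(i, "-")] = "CG"
            elif i - 2 >= 0:
                contexts[(i, "-")] = "CHG" if seq[i - 2] == "C" else "CHH"
    return contexts


print(cytosine_contexts("ACGT"))      # the CG site is methylable on both strands
print(cytosine_contexts("CAGCTTGC"))  # CHG and CHH sites; the last C is unclassified
```

CG and CHG sites are symmetric (they appear on both strands), whereas CHH sites are asymmetric, which is why plant methylome pipelines report the three contexts separately.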

Besides cytosine, adenine can also be methylated in both plants and animals (N6-methyldeoxyadenosine), a modification that cannot be detected with bisulfite sequencing (Luo et al., 2015). However, genomic regions with methylated adenines can be detected using N6-methyldeoxyadenosine immunoprecipitation sequencing (6mA-IP-seq), which follows the same rationale as MeDIP-seq but requires antibodies that specifically target N6-methyldeoxyadenosine (Fu et al., 2015). PacBio and Nanopore sequencing technologies can detect DNA methylation directly from native DNA, on both cytosines and adenines, so they are currently being used as powerful, albeit expensive, tools to evaluate DNA methylation patterns in genomes (Gouil and Keniry, 2019).

Histone modifications refer to either posttranslational covalent modifications of histones (methylations, acetylations, phosphorylations, ubiquitylations, ADP-ribosylations, sumoylations, crotonylations, malonylations, succinylations) or the substitution of canonical histones by histone variants with a different amino acid composition (Bowman and Poirier, 2015). These histone modifications determine the functionality of local genomic regions by changing the state of the chromatin, either through their direct effects on the chemical interactions between DNA and histones or through the recruitment of chromatin remodeling complexes (Bowman and Poirier, 2015).

Chromatin immunoprecipitation sequencing (ChIP-seq) can be used to assess the genome-wide association between DNA regions and specific histone modifications (Schmidt et al., 2009). ChIP-seq consists of fixing DNA-protein interactions with formaldehyde, fragmenting the DNA and enriching the target histone modification with magnetic beads coupled to antibodies, so that the genomic regions carrying the histone modification can be sequenced (Schmidt et al., 2009). ChIP-seq can also be used to assess the interaction between specific genomic regions and any DNA-binding protein, such as transcription factors (Schmidt et al., 2009).
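
Once ChIP-seq reads are mapped, enriched regions are typically identified by comparing read counts in the immunoprecipitated sample against an input (non-enriched) control. The sketch below is a simplified window-based test loosely inspired by the Poisson model used by peak callers such as MACS: the expected count per window is the input count scaled to the ChIP library size, floored at the genome-wide mean, and windows with a small upper-tail Poisson probability are flagged. Window counts, the significance threshold and function names are hypothetical, and real peak callers also model fragment size and local background.

```python
from math import exp


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Fine for modest lam; exp(-lam) underflows for very large lam.
    """
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def enriched_windows(chip_counts, input_counts, alpha=1e-5):
    """Flag windows whose ChIP read count exceeds a Poisson expectation."""
    scale = sum(chip_counts) / sum(input_counts)  # library-size correction
    background = sum(chip_counts) / len(chip_counts)  # genome-wide mean per window
    hits = []
    for i, (k, c) in enumerate(zip(chip_counts, input_counts)):
        lam = max(c * scale, background)
        p = poisson_sf(k, lam)
        if p < alpha:
            hits.append((i, k, p))
    return hits


chip = [5, 4, 6, 5, 60, 5]  # hypothetical reads per window, ChIP sample
control = [5, 5, 5, 5, 5, 5]  # hypothetical reads per window, input DNA
print([i for i, _, _ in enriched_windows(chip, control)])  # prints [4]
```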

Epigenomic Studies Applied to Understand Domestication

Current epigenomic analyses of domestication have focused on DNA methylation patterns (Jensen, 2015; Ding and Chen, 2018), but some studies have also ventured into histone modification patterns (He et al., 2014). Recent efforts aim to connect genomic and epigenetic discoveries to understand the evolution of tameness in domesticated animals (Jensen, 2015). A study using RRBS to compare DNA methylation patterns between wolves and dogs revealed signals of natural selection acting on SMPs that are enriched in transposons and in genes involved in the regulation of neurotransmitters, suggesting a dog-specific silencing of genes involved in behavior (Janowitz Koch et al., 2016). Similarly, a recent study using MeDIP-seq in red junglefowl populations bred for either high or low fear of humans discovered differentially methylated genomic regions in genes previously related to tameness (Bélteky et al., 2018).

Other studies focused on plant domestication have found differentially methylated sites associated with domestication syndromes (Song et al., 2017; Shen et al., 2018). A study using MethylC-seq found 519 differentially methylated genes between domesticated and wild cotton, some of which are associated with the observed differences in flowering time and seed dormancy between the wild and domesticated taxa (Song et al., 2017). Another study using MethylC-seq found 4,248 differentially methylated regions between wild and domesticated soybean and 1,164 differentially methylated regions between domesticated and improved soybean (Shen et al., 2018). As expected, the differentially methylated regions in soybean had higher genetic diversity than the previously identified regions with evidence of selective sweeps. Interestingly, 22.5% of the differentially methylated sites could be associated with a causal genetic variant (suggesting that these genetic variants were responsible for the observed epigenetic patterns), whereas the remaining differentially methylated regions could be interpreted as genuine epialleles located within genes involved in carbohydrate metabolism (Shen et al., 2018).

Experimental Validation of Candidate Genes

Once we have evidence of candidate genes involved in the domestication syndrome, the necessary next step to understand the genetic basis of domestication is to design in vitro systems, knock-out, knock-down or knock-in experiments that validate the involvement of such genes in the observed phenotypes (Zhang et al., 2017). This can be performed either by direct alteration of the genome in the organism of interest, by using RNA interference or by designing heterologous systems in a model organism (Boettcher and McManus, 2015). As an example, a knock-out experiment with backcrosses between domesticated and wild mice elucidated the role of some genes involved in behavioral changes associated to mouse domestication (Chalfin et al., 2014).

Knock-out and knock-in experiments used to be restricted to model organisms, but nowadays candidate genes can be validated through knock-out and knock-in experiments using novel genome-editing tools (e.g., Shalem et al., 2014; Hahn et al., 2017; Ueta et al., 2017). Genome-editing tools are already available for a broad range of taxa, including dozens of crop species, but developing a working system in a non-model organism can still take several months or even years (Shan et al., 2020), so collaborating with experimental researchers is recommended. Currently, the leading genome-editing toolset is the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system together with the CRISPR-associated protein 9 (Cas9), commonly known as CRISPR/Cas9, which can be used to eliminate, introduce or replace specific segments of DNA at a targeted site in a genome (Cong et al., 2013). Another useful genome-editing tool is the Transcription Activator-Like Effector Nuclease (TALEN) technology, which has its own advantages over CRISPR/Cas9 (Zhang et al., 2017). RNA interference can also help validate the function of candidate genes, although it is limited to knock-down experiments (Boettcher and McManus, 2015). Heterologous expression in model organisms is a cost-effective alternative to validate candidate genes (e.g., Schweiger et al., 2010), although this method overlooks the in vivo interaction networks that are responsible for the emergence of phenotypes (Rodríguez-Mega et al., 2015).
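
A first step in designing a CRISPR/Cas9 experiment is finding candidate target sites in the candidate gene. For the commonly used Streptococcus pyogenes Cas9, a target is a 20-nt protospacer immediately followed by an NGG protospacer-adjacent motif (PAM) on the same strand. The sketch below lists such sites on both strands of a sequence; it is only a starting point, since guide-design tools (e.g., CHOPCHOP or CRISPOR) also score off-target matches and efficiency. The function names and the example sequence are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_targets(seq, spacer_len=20):
    """List (strand, start, protospacer, PAM) for every NGG PAM with a full spacer.

    On the '-' strand, `start` is measured along the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(spacer_len, len(s) - 2):
            if s[i + 1:i + 3] == "GG":  # PAM = s[i:i + 3] = NGG
                hits.append((strand, i - spacer_len, s[i - spacer_len:i], s[i:i + 3]))
    return hits


target = "ACGTACGTACGTACGTACGT" + "AGG"  # hypothetical 20-nt spacer followed by a PAM
print(find_spcas9_targets(target))
```

Cas9 cuts about three bases upstream of the PAM, so the chosen site should place the cut within the region whose disruption is expected to alter the candidate gene.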

Regardless of the genome-editing tool of choice (Boettcher and McManus, 2015; Zhang et al., 2017), genome editing is proving useful to validate the effect of candidate domestication genes by introducing domesticated alleles into wild relatives and vice versa, which can demonstrate that the gene is indeed involved in the appearance of the domesticated phenotype (Zhou J. et al., 2019). This can be performed in the same way as a usual knock-out or knock-in experiment, in which the edited locus must be validated through PCR and Sanger sequencing, a PCR-RFLP analysis, or a Western blot in the case of a protein knock-out (e.g., Ueta et al., 2017). The expected result of this type of study is a modified phenotype after editing a candidate locus, either a wild individual with a domesticated-like trait or a domesticated individual with a wild-like trait (Zhou J. et al., 2019).
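
The PCR-RFLP check mentioned above works when an edit destroys (or creates) a restriction site in the amplified region, so that edited and unedited alleles give different band patterns after digestion. The sketch below predicts fragment lengths for a hypothetical amplicon, using the EcoRI site (G^AATTC, top-strand cut after the first base) as an illustration; it ignores sticky-end overhangs, and the sequences and function name are assumptions.

```python
def digest(amplicon, site, cut_offset):
    """Fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the top-strand cut position inside the site
    (1 for EcoRI, G^AATTC); sticky ends are ignored in this sketch.
    """
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


wild_type = "AAAAAGAATTCAAAAA"  # hypothetical amplicon containing an EcoRI site
edited = "AAAAAGAATCAAAAA"  # hypothetical 1-bp deletion that destroys the site
print(digest(wild_type, "GAATTC", 1), digest(edited, "GAATTC", 1))  # prints [6, 10] [15]
```

A heterozygous edited individual would show both patterns, so the band profile also helps distinguish monoallelic from biallelic edits before confirming them by Sanger sequencing.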

Of course, the above studies will hardly reproduce a complete domesticated or wild phenotype, since genetic elements interact in complex regulatory networks, including other elements within the genome as well as epigenetic and environmental components (Rodríguez-Mega et al., 2015), but nonetheless will be useful to understand the role of those genes in the emergence of domesticated phenotypes.

Once candidate genes are validated, genome-editing tools can also be used to introduce desirable traits from wild relatives into their domesticated counterparts, a goal of great interest for crop improvement (Zhou J. et al., 2019); these tools are already being used to accelerate plant breeding and to fine-tune desirable traits (Wolter et al., 2019). Furthermore, recent efforts seek to domesticate crops de novo by inserting the desired domestication alleles into their wild relatives, generating crops with the desired domestication phenotypes but without the low genetic variation and accumulation of deleterious mutations that are an inevitable consequence of regular domestication processes (Fernie and Yan, 2019).

Conclusion and Perspectives

Plant and animal domestication can be studied using genomic, transcriptomic and epigenomic strategies, revealing the action of evolutionary, ecological and anthropogenic processes (Kantar et al., 2017). These tools can lead us beyond the description of the possible historical scenarios that shaped the domesticated species, since we can explore the effects of domestication on the transcriptomic activity of a species (Hekman et al., 2015), test the validity of candidate genes associated to domestication phenotypes (Zhou J. et al., 2019) and analyze epigenetic patterns associated to domestication traits (Jensen, 2015). Many domesticated taxa remain genetically unexplored, and as sequencing technologies become cheaper and more efficient, domestication genomics will soon be available for polyploids and species with huge genomes (e.g., Edger et al., 2019).

Nonetheless, the modern study of plant and animal domestication should remain multidisciplinary, since genetics only tells part of the story (Larson et al., 2014). An extended evolutionary synthesis framework should also be considered to understand domestication, as these new studies are helping us understand niche construction and the emergence of domesticated phenotypes (Piperno, 2017). Other potential lines of work remain to be addressed in domestication studies, such as changes in chromatin architecture (e.g., Concia et al., 2020), the use of comparative proteomic atlases (e.g., Jiang Y. et al., 2019) and the analysis of cell-type divergence during development using single-cell RNA-seq data (Arendt et al., 2016). These multi-omic approaches will help us build and compare developmental atlases (e.g., Walley et al., 2016) between wild and domesticated taxa to understand how morphology diverged during domestication.

Statements

Author contributions

JB-R, DP, and LE wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was funded by Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO) KE004 “Diversidad genética de las especies de Cucurbita en México e hibridación entre plantas genéticamente modificadas y especies silvestres de Cucurbita” and CONABIO PE001 “Diversidad genética de las especies de Cucurbita en México. Fase II. Genómica evolutiva y de poblaciones, recursos genéticos y domesticación”, both awarded to Rafael Lira-Saade and LE. DP was funded by Instituto de Ecología, UNAM. JB-R is a doctoral student in the Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, and received fellowship 583146 from Consejo Nacional de Ciencia y Tecnología (CONACyT).

Acknowledgments

We acknowledge the Doctorado en Ciencias Biomédicas for the academic support provided to JB-R during the development of this project. We thank Alejandra Moreno Letelier for her insights, which greatly improved the quality of the manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.00742/full#supplementary-material

References

Alachiotis, N., and Pavlidis, P. (2018). RAiSD detects positive selection based on multiple signatures of a selective sweep and SNP vectors. Commun. Biol. 1, 1–11. doi: 10.1038/s42003-018-0085-8
Alachiotis, N., Stamatakis, A., and Pavlidis, P. (2012). OmegaPlus: a scalable tool for rapid detection of selective sweeps in whole-genome datasets. Bioinformatics 28, 2274–2275. doi: 10.1093/bioinformatics/bts419
Albert, F. W., Somel, M., Carneiro, M., Aximu-Petri, A., Halbwax, M., Thalmann, O., et al. (2012). A comparison of brain gene expression levels in domesticated and wild animals. PLoS Genet. 8:e1002962. doi: 10.1371/journal.pgen.1002962
Alberto, F. J., Boyer, F., Orozco-terWengel, P., Streeter, I., Servin, B., de Villemereuil, P., et al. (2018). Convergent genomic signatures of domestication in sheep and goats. Nat. Commun. 9:813. doi: 10.1038/s41467-018-03206-y
Alexander, D. H., Novembre, J., and Lange, K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664. doi: 10.1101/gr.094052.109
Ali, O. A., O’Rourke, S. M., Amish, S. J., Meek, M. H., Luikart, G., Jeffres, C., et al. (2016). RAD capture (Rapture): flexible and efficient sequence-based genotyping. Genetics 202, 389–400. doi: 10.1534/genetics.115.183665
Allaby, R. G., Ware, R. L., and Kistler, L. (2019). A re-evaluation of the domestication bottleneck from archaeogenomic evidence. Evol. Appl. 12, 29–37. doi: 10.1111/eva.12680
Anderson, C. N. K., Ramakrishnan, U., Chan, Y. L., and Hadly, E. A. (2005). Serial SimCoal: a population genetics model for data from multiple populations and points in time. Bioinformatics 21, 1733–1734. doi: 10.1093/bioinformatics/bti154
Andrews, K. R., Good, J. M., Miller, M. R., Luikart, G., and Hohenlohe, P. A. (2016). Harnessing the power of RADseq for ecological and evolutionary genomics. Nat. Rev. Genet. 17, 81–92. doi: 10.1038/nrg.2015.28
Arendt, D., Musser, J. M., Baker, C. V., Bergman, A., Cepko, C., Erwin, D. H., et al. (2016). The origin and evolution of cell types. Nat. Rev. Genet. 17, 744–757. doi: 10.1038/nrg.2016.127
Badouin, H., Gouzy, J., Grassa, C. J., Murat, F., Staton, S. E., Cottret, L., et al. (2017). The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546, 148–152. doi: 10.1038/nature22380
Baran, Y., Pasaniuc, B., Sankararaman, S., Torgerson, D. G., Gignoux, C., Eng, C., et al. (2012). Fast and accurate inference of local ancestry in Latino populations. Bioinformatics 28, 1359–1367. doi: 10.1093/bioinformatics/bts144
Baute, G. J., Kane, N. C., Grassa, C. J., Lai, Z., and Rieseberg, L. H. (2015). Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives. New Phytol. 206, 830–838. doi: 10.1111/nph.13255
Bell, G. D., Kane, N. C., Rieseberg, L. H., and Adams, K. L. (2013). RNA-seq analysis of allele-specific expression, hybrid effects, and regulatory divergence in hybrids compared with their parents from natural populations. Genome Biol. Evol. 5, 1309–1323. doi: 10.1093/gbe/evt072
Belser, C., Istace, B., Denis, E., Dubarry, M., Baurens, F. C., Falentin, C., et al. (2018). Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps. Nat. Plants 4, 879–887. doi: 10.1038/s41477-018-0289-4
Bélteky, J., Agnvall, B., Bektic, L., Höglund, A., Jensen, P., and Guerrero-Bosagna, C. (2018). Epigenetics and early domestication: differences in hypothalamic DNA methylation between red junglefowl divergently selected for high or low fear of humans. Genet. Sel. Evol. 50:13. doi: 10.1186/s12711-018-0384-z
Bertolini, F., Servin, B., Talenti, A., Rochat, E., Kim, E. S., Oget, C., et al. (2018). Signatures of selection and environmental adaptation across the goat genome post-domestication. Genet. Sel. Evol. 50:57. doi: 10.1186/s12711-018-0421-y
Bickhart, D. M., Rosen, B. D., Koren, S., Sayre, B. L., Hastie, A. R., Chan, S., et al. (2017). Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 49, 643–650. doi: 10.1038/ng.3802
Boettcher, M., and McManus, M. T. (2015). Choosing the right tool for the job: RNAi, TALEN, or CRISPR. Mol. Cell 58, 575–585. doi: 10.1016/j.molcel.2015.04.028
Bomba, L., Nicolazzi, E. L., Milanesi, M., Negrini, R., Mancini, G., Biscarini, F., et al. (2015). Relative extended haplotype homozygosity signals across breeds reveal dairy and beef specific signatures of selection. Genet. Sel. Evol. 47:25. doi: 10.1186/s12711-015-0113-9
Bonhomme, M., Chevalet, C., Servin, B., Boitard, S., Abdallah, J., Blott, S., et al. (2010). Detecting selection in population trees: the Lewontin and Krakauer test extended. Genetics 186, 241–262. doi: 10.1534/genetics.110.117275
Bowman, G. D., and Poirier, M. G. (2015). Post-translational modifications of histones that influence nucleosome dynamics. Chem. Rev. 115, 2274–2295. doi: 10.1021/cr500350x
Brohammer, A. B., Kono, T. J. Y., and Hirsch, C. N. (2018). “The maize pan-genome,” in The Maize Genome. Compendium of Plant Genomes, eds J. Bennetzen, S. Flint-Garcia, C. Hirsch, and R. Tuberosa (Cham: Springer), 13–29. doi: 10.1007/978-3-319-97427-9_2
Brozynska, M., Furtado, A., and Henry, R. J. (2016). Genomics of crop wild relatives: expanding the gene pool for crop improvement. Plant Biotechnol. J. 14, 1070–1085. doi: 10.1111/pbi.12454
Bryant, D. M., Johnson, K., DiTommaso, T., Tickle, T., Couger, M. B., Payzin-Dogru, D., et al. (2017). A tissue-mapped axolotl de novo transcriptome enables identification of limb regeneration factors. Cell Rep. 18, 762–776. doi: 10.1016/j.celrep.2016.12.063
Burden, C. J., Qureshi, S. E., and Wilson, S. R. (2014). Error estimates for the analysis of differential expression from RNA-seq count data. PeerJ 2:e576. doi: 10.7717/peerj.576
Burgarella, C., Cubry, P., Kane, N. A., Varshney, R. K., Mariac, C., Liu, X., et al. (2018). A western Sahara centre of domestication inferred from pearl millet genomes. Nat. Ecol. Evol. 2, 1377–1380. doi: 10.1038/s41559-018-0643-y
Burggren, W. (2016). Epigenetic inheritance and its role in evolutionary biology: re-evaluation and new perspectives. Biology (Basel) 5:24. doi: 10.3390/biology5020024
Cao, K., Li, Y., Deng, C. H., Gardiner, S. E., Zhu, G., Fang, W., et al. (2019). Comparative population genomics identified genomic regions and candidate genes associated with fruit domestication traits in peach. Plant Biotechnol. J. 17, 1954–1970. doi: 10.1111/pbi.13112
Catchen, J., Hohenlohe, P., Bassham, S., Amores, A., and Cresko, W. (2013). Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140. doi: 10.1111/mec.12354
Chalfin, L., Dayan, M., Levy, D. R., Austad, S. N., Miller, R. A., Iraqi, F. A., et al. (2014). Mapping ecologically relevant social behaviours by gene knockout in wild mice. Nat. Commun. 5:4569. doi: 10.1038/ncomms5569
Charlesworth, B. (2009). Effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10, 195–205. doi: 10.1038/nrg2526
Chen, H., Patterson, N., and Reich, D. (2010). Population differentiation as a test for selective sweeps. Genome Res. 20, 393–402. doi: 10.1101/gr.100545.109
Chen, J., Ni, P., Li, X., Han, J., Jakovlić, I., Zhang, C., et al. (2018). Population size may shape the accumulation of functional mutations following domestication. BMC Evol. Biol. 18:4. doi: 10.1186/s12862-018-1120-6
Chen, N., Cai, Y., Chen, Q., Li, R., Wang, K., Huang, Y., et al. (2018). Whole-genome resequencing reveals world-wide ancestry and adaptive introgression events of domesticated cattle in East Asia. Nat. Commun. 9:2337. doi: 10.1038/s41467-018-04737-0
Clark, J. W., and Donoghue, P. C. J. (2018). Whole-genome duplication and plant macroevolution. Trends Plant Sci. 23, 933–945. doi: 10.1016/j.tplants.2018.07.006
Concia, L., Veluchamy, A., Ramirez-Prado, J. S., Martin-Ramirez, A., Huang, Y., Perez, M., et al. (2020). Wheat chromatin architecture is organized in genome territories and transcription factories. Genome Biol. 21, 1–20. doi: 10.1186/s13059-020-01998-1
Cong, L., Ran, F. A., Cox, D., Lin, S., Barretto, R., Habib, N., et al. (2013). Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823. doi: 10.1126/science.1231143
Cornuet, J.-M., Pudlo, P., Veyssier, J., Dehne-Garcia, A., Gautier, M., Leblois, R., et al. (2014). DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30, 1187–1189. doi: 10.1093/bioinformatics/btt763
Daly, K. G., Maisano Delser, P., Mullin, V. E., Scheu, A., Mattiangeli, V., Teasdale, M. D., et al. (2018). Ancient goat genomes reveal mosaic domestication in the Fertile Crescent. Science 361, 85–88. doi: 10.1126/science.aas9411
Davey, J. W., and Blaxter, M. L. (2010). RADSeq: next-generation population genetics. Brief. Funct. Genomics 9, 416–423. doi: 10.1093/bfgp/elq031
De Mita, S., Thuillet, A.-C., Gay, L., Ahmadi, N., Manel, S., Ronfort, J., et al. (2013). Detecting selection along environmental gradients: analysis of eight methods and their effectiveness for outbreeding and selfing populations. Mol. Ecol. 22, 1383–1399. doi: 10.1111/mec.12182
de Villemereuil, P., and Gaggiotti, O. E. (2015). A new FST-based method to uncover local adaptation using environmental variables. Methods Ecol. Evol. 6, 1248–1258. doi: 10.1111/2041-210X.12418
De Wit, P., Pespeni, M. H., Ladner, J. T., Barshis, D. J., Seneca, F., Jaris, H., et al. (2012). The simple fool’s guide to population genomics via RNA-Seq: an introduction to high-throughput sequencing data analysis. Mol. Ecol. Resour. 12, 1058–1067. doi: 10.1111/1755-0998.12003
Delaneau, O., Howie, B., Cox, A. J., Zagury, J.-F., and Marchini, J. (2013). Haplotype estimation using sequencing reads. Am. J. Hum. Genet. 93, 687–696. doi: 10.1016/j.ajhg.2013.09.002
Diao, L., and Chen, K. C. (2012). Local ancestry corrects for population structure in Saccharomyces cerevisiae genome-wide association studies. Genetics 192, 1503–1511. doi: 10.1534/genetics.112.144790
Dias-Alves, T., Mairal, J., and Blum, M. G. B. (2018). Loter: a software package to infer local ancestry for a wide range of species. Mol. Biol. Evol. 35, 2318–2326. doi: 10.1093/molbev/msy126
Díaz-Valenzuela, E., Sawers, R. H., and Cibrián-Jaramillo, A. (2020). Cis- and trans-regulatory variations in the domestication of the chili pepper fruit. Mol. Biol. Evol. 37, 1593–1603. doi: 10.1093/molbev/msaa027
Ding, M., and Chen, Z. J. (2018). Epigenetic perspectives on the evolution and domestication of polyploid plant and crops. Curr. Opin. Plant Biol. 42, 37–48. doi: 10.1016/j.pbi.2018.02.003
Doebley, J., and Stec, A. (1991). Genetic analysis of the morphological differences between maize and teosinte. Genetics 129, 285–295.
Doebley, J., Stec, A., and Gustus, C. (1995). teosinte branched1 and the origin of maize: evidence for epistasis and the evolution of dominance. Genetics 141, 333–346.
Dong, Y., Xie, M., Jiang, Y., Xiao, N., Du, X., Zhang, W., et al. (2013). Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat. Biotechnol. 31, 135–141. doi: 10.1038/nbt.2478
Dorant, Y., Benestan, L., Rougemont, Q., Normandeau, E., Boyle, B., Rochette, R., et al. (2019). Comparing Pool-seq, Rapture, and GBS genotyping for inferring weak population structure: the American lobster (Homarus americanus) as a case study. Ecol. Evol. 9, 6606–6623. doi: 10.1002/ece3.5240
Durand, E. Y., Patterson, N., Reich, D., and Slatkin, M. (2011). Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252. doi: 10.1093/molbev/msr048
Edelman, N. B., Frandsen, P. B., Miyagi, M., Clavijo, B., Davey, J., Dikow, R. B., et al. (2019). Genomic architecture and introgression shape a butterfly radiation. Science 366, 594–599. doi: 10.1126/science.aaw2090
Edger, P. P., Poorten, T. J., VanBuren, R., Hardigan, M. A., Colle, M., McKain, M. R., et al. (2019). Origin and evolution of the octoploid strawberry genome. Nat. Genet. 51, 541–547. doi: 10.1038/s41588-019-0356-4
Ellegren, H. (2014). Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol. 29, 51–63. doi: 10.1016/j.tree.2013.09.008
Excoffier, L., and Foll, M. (2011). fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27, 1332–1334. doi: 10.1093/bioinformatics/btr124
Excoffier, L., Hofer, T., and Foll, M. (2009). Detecting loci under selection in a hierarchically structured population. Heredity (Edinb) 103:285. doi: 10.1038/hdy.2009.74
Fang, Z., and Cui, X. (2011). Design and validation issues in RNA-seq experiments. Brief. Bioinform. 12, 280–287. doi: 10.1093/bib/bbr004
Fay, J. C., and Wu, C. I. (2000). Hitchhiking under positive Darwinian selection. Genetics 155, 1405–1413.
Feeney, A., Nilsson, E., and Skinner, M. K. (2014). Epigenetics and transgenerational inheritance in domesticated farm animals. J. Anim. Sci. Biotechnol. 5:48. doi: 10.1186/2049-1891-5-48
Fernie, A. R., and Yan, J. (2019). De novo domestication: an alternative route toward new crops for the future. Mol. Plant 12, 615–631. doi: 10.1016/j.molp.2019.03.016
Fierst, J. L. (2015). Using linkage maps to correct and scaffold de novo genome assemblies: methods, challenges, and computational tools. Front. Genet. 6:220. doi: 10.3389/fgene.2015.00220
Fitz-Gibbon, S., Hipp, A. L., Pham, K. K., Manos, P. S., and Sork, V. L. (2017). Phylogenomic inferences from reference-mapped and de novo assembled short-read sequence data using RADseq sequencing of California white oaks (Quercus section Quercus). Genome 60, 743–755. doi: 10.1139/gen-2016-0202
Foll, M., and Gaggiotti, O. (2008). A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180, 977–993. doi: 10.1534/genetics.108.092221
Frantz, L. A. F., Mullin, V. E., Pionnier-Capitan, M., Lebrasseur, O., Ollivier, M., Perri, A., et al. (2016). Genomic and archaeological evidence suggest a dual origin of domestic dogs. Science 352, 1228–1231. doi: 10.1126/science.aaf3161
Frantz, L. A. F., Schraiber, J. G., Madsen, O., Megens, H.-J., Cagan, A., Bosse, M., et al. (2015). Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat. Genet. 47, 1141–1148. doi: 10.1038/ng.3394
Frommer, M., McDonald, L. E., Millar, D. S., Collis, C. M., Watt, F., Grigg, G. W., et al. (1992). A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc. Natl. Acad. Sci. U.S.A. 89, 1827–1831. doi: 10.1073/pnas.89.5.1827
Fu, Y., Luo, G.-Z., Chen, K., Deng, X., Yu, M., Han, D., et al. (2015). N6-methyldeoxyadenosine marks active transcription start sites in Chlamydomonas. Cell 161, 879–892. doi: 10.1016/j.cell.2015.04.010
Fugère, V., and Hendry, A. P. (2018). Human influences on the strength of phenotypic selection. Proc. Natl. Acad. Sci. U.S.A. 115, 10070–10075. doi: 10.1073/pnas.1806013115
Fumagalli, M. (2013). Assessing the effect of sequencing depth and sample size in population genetics inferences. PLoS One 8:e79667. doi: 10.1371/journal.pone.0079667
Futschik, A., and Schlötterer, C. (2010). The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186, 207–218. doi: 10.1534/genetics.110.114397
Gamba, C., Hanghøj, K., Gaunitz, C., Alfarhan, A. H., Alquraishi, S. A., Al-Rasheid, K. A. S., et al. (2016). Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing. Mol. Ecol. Resour. 16, 459–469. doi: 10.1111/1755-0998.12470
Gansauge, M.-T., Gerber, T., Glocke, I., Korlević, P., Lippik, L., Nagel, S., et al. (2017). Single-stranded DNA library preparation from highly degraded DNA using T4 DNA ligase. Nucleic Acids Res. 45:gkx033. doi: 10.1093/nar/gkx033
Gao, L., Gonda, I., Sun, H., Ma, Q., Bao, K., Tieman, D. M., et al. (2019). The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051. doi: 10.1038/s41588-019-0410-2
Garud, N. R., Messer, P. W., Buzbas, E. O., and Petrov, D. A. (2015). Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS Genet. 11:e1005004. doi: 10.1371/journal.pgen.1005004
Gepts, P. (2014). The contribution of genetic and genomic approaches to plant domestication studies. Curr. Opin. Plant Biol. 18, 51–59. doi: 10.1016/j.pbi.2014.02.001
Gerbault, P., Allaby, R. G., Boivin, N., Rudzinski, A., Grimaldi, I. M., Pires, J. C., et al. (2014). Storytelling and story testing in domestication. Proc. Natl. Acad. Sci. U.S.A. 111, 6159–6164. doi: 10.1073/pnas.1400425111
Gibson, G. (2018). Population genetics and GWAS: a primer. PLoS Biol. 16:e2005485. doi: 10.1371/journal.pbio.2005485
Golicz, A. A., Batley, J., and Edwards, D. (2016a). Towards plant pangenomics. Plant Biotechnol. J. 14, 1099–1105. doi: 10.1111/pbi.12499
Golicz, A. A., Bayer, P. E., Barker, G. C., Edger, P. P., Kim, H., Martinez, P. A., et al. (2016b). The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 7, 1–8. doi: 10.1038/ncomms13390
Gouil, Q., and Keniry, A. (2019). Latest techniques to study DNA methylation. Essays Biochem. 63, 639–648. doi: 10.1042/EBC20190027
Groeneveld, L. F., Lenstra, J. A., Eding, H., Toro, M. A., Scherf, B., Pilling, D., et al. (2010). Genetic diversity in farm animals – a review. Anim. Genet. 41, 6–31. doi: 10.1111/j.1365-2052.2010.02038.x
Guerra García, A., and Piñero, D. (2017). Current approaches and methods in plant domestication studies. Bot. Sci. 95:345. doi: 10.17129/botsci.1209
Guerra-García, A., Suárez-Atilano, M., Mastretta-Yanes, A., Delgado-Salinas, A., and Piñero, D. (2017). Domestication genomics of the open-pollinated scarlet runner bean (Phaseolus coccineus L.). Front. Plant Sci. 8:1891. doi: 10.3389/fpls.2017.01891
Guerrero-Bosagna, C. (2012). Finalism in Darwinian and Lamarckian evolution: lessons from epigenetics and developmental biology. Evol. Biol. 39, 283–300. doi: 10.1007/s11692-012-9163-x
Günther, T., and Nettelblad, C. (2019). The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 15:e1008302. doi: 10.1371/journal.pgen.1008302
Guo, S., Zhang, J., Sun, H., Salse, J., Lucas, W. J., Zhang, H., et al. (2012). The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat. Genet. 45, 51–58. doi: 10.1038/ng.2470
Guo, Z., Song, G., Liu, Z., Qu, X., Chen, R., Jiang, D., et al. (2015). Global epigenomic analysis indicates that Epialleles contribute to Allele-specific expression via Allele-specific histone modifications in hybrid rice. BMC Genomics 16:232. doi: 10.1186/s12864-015-1454-z
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., and Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5:e1000695. doi: 10.1371/journal.pgen.1000695
Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., et al. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512. doi: 10.1038/nprot.2013.084
Hahn, F., Eisenhut, M., Mantegazza, O., and Weber, A. (2017). Generation of targeted knockout mutants in Arabidopsis thaliana using CRISPR/Cas9. Bio Protoc. 7, 1–20. doi: 10.21769/BioProtoc.2384
Hahn, M. W., and Hibbins, M. S. (2019). A three-sample test for introgression. Mol. Biol. Evol. 36, 2878–2882. doi: 10.1093/molbev/msz178
Han, L., and Abney, M. (2013). Using identity by descent estimation with dense genotype data to detect positive selection. Eur. J. Hum. Genet. 21, 205–211. doi: 10.1038/ejhg.2012.148
Hanchard, N. A., Rockett, K. A., Spencer, C., Coop, G., Pinder, M., Jallow, M., et al. (2006). Screening for recently selected alleles by analysis of human haplotype similarity. Am. J. Hum. Genet. 78, 153–159. doi: 10.1086/499252
He, S., Yan, S., Wang, P., Zhu, W., Wang, X., Shen, Y., et al. (2014). Comparative analysis of genome-wide chromosomal histone modification patterns in maize cultivars and their wild relatives. PLoS One 9:e97364. doi: 10.1371/journal.pone.0097364
He, X.-J., Chen, T., and Zhu, J.-K. (2011). Regulation and function of DNA methylation in plants and animals. Cell Res. 21, 442–465. doi: 10.1038/cr.2011.23
Heard, E., and Martienssen, R. A. (2014). Transgenerational epigenetic inheritance: myths and mechanisms. Cell 157, 95–109. doi: 10.1016/j.cell.2014.02.045
Hekman, J. P., Johnson, J. L., and Kukekova, A. V. (2015). Transcriptome analysis in domesticated species: challenges and strategies. Bioinform. Biol. Insights 9(Suppl. 4), 21–31. doi: 10.4137/BBI.S29334
Heled, J., and Drummond, A. J. (2008). Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 8:289. doi: 10.1186/1471-2148-8-289
Hradilová, I., Trněný, O., Válková, M., Cechová, M., Janská, A., Prokešová, L., et al. (2017). A combined comparative transcriptomic, metabolomic, and anatomical analyses of two key domestication traits: pod dehiscence and seed dormancy in pea (Pisum sp.). Front. Plant Sci. 8:542. doi: 10.3389/fpls.2017.00542
Huang, X., Kurata, N., Wei, X., Wang, Z.-X., Wang, A., Zhao, Q., et al. (2012). A map of rice genome variation reveals the origin of cultivated rice. Nature 490, 497–501. doi: 10.1038/nature11532
Hübner, S., Bercovich, N., Todesco, M., Mandel, J. R., Odenheimer, J., Ziegler, E., et al. (2019). Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat. Plants 5, 54–62. doi: 10.1038/s41477-018-0329-0
Hufford, M. B., Xu, X., van Heerwaarden, J., Pyhäjärvi, T., Chia, J.-M., Cartwright, R. A., et al. (2012). Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811. doi: 10.1038/ng.2309
Hurgobin, B., Golicz, A. A., Bayer, P. E., Chan, C. K. K., Tirnaz, S., Dolatabadian, A., et al. (2018). Homoeologous exchange is a major cause of gene presence/absence variation in the amphidiploid Brassica napus. Plant Biotechnol. J. 16, 1265–1274. doi: 10.1111/pbi.12867
Ibarra-Laclette, E., Lyons, E., Hernández-Guzmán, G., Pérez-Torres, C. A., Carretero-Paulet, L., Chang, T. H., et al. (2013). Architecture and evolution of a minute plant genome. Nature 498, 94–98. doi: 10.1038/nature12132
Ilska, J., Haskell, M. J., Blott, S. C., Sánchez-Molano, E., Polgar, Z., Lofgren, S. E., et al. (2017). Genetic characterization of dog personality traits. Genetics 206, 1101–1111. doi: 10.1534/genetics.116.192674
Inbar, S., Cohen, P., Yahav, T., and Privman, E. (2020). Comparative study of population genomic approaches for mapping colony-level traits. PLoS Comput. Biol. 16:e1007653. doi: 10.1371/journal.pcbi.1007653
Irving-Pease, E. K., Ryan, H., Jamieson, A., Dimopoulos, E. A., Larson, G., and Frantz, L. A. F. (2019). “Paleogenomics of animal domestication,” in Paleogenomics: Genome-Scale Analysis of Ancient DNA, eds C. Lindqvist and O. P. Rajora (Cham: Springer International Publishing), 225–272. doi: 10.1007/13836_2018_55
Janowitz Koch, I., Clark, M. M., Thompson, M. J., Deere-Machemer, K. A., Wang, J., Duarte, L., et al. (2016). The concerted impact of domestication and transposon insertions on methylation patterns between dogs and grey wolves. Mol. Ecol. 25, 1838–1855. doi: 10.1111/mec.13480
Janzen, G. M., Wang, L., and Hufford, M. B. (2019). The extent of adaptive wild introgression in crops. New Phytol. 221, 1279–1288. doi: 10.1111/nph.15457
Jensen, P. (2015). Adding “epi-” to behaviour genetics: implications for animal domestication. J. Exp. Biol. 218, 32–40. doi: 10.1242/jeb.106799
Jiang, L. G., Li, B., Liu, S. X., Wang, H. W., Li, C. P., Song, S. H., et al. (2019). Characterization of proteome variation during modern maize breeding. Mol. Cell. Proteomics 18, 263–276. doi: 10.1074/mcp.RA118.001021
Jiang, Y., Jiang, Y., Wang, S., Zhang, Q., and Ding, X. (2019). Optimal sequencing depth design for whole genome re-sequencing in pigs. BMC Bioinform. 20:556. doi: 10.1186/s12859-019-3164-z
Jiao, Y., Zhao, H., Ren, L., Song, W., Zeng, B., Guo, J., et al. (2012). Genome-wide genetic changes during modern breeding of maize. Nat. Genet. 44, 812–815. doi: 10.1038/ng.2312
Jombart, T., Devillard, S., and Balloux, F. (2010). Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 11:94. doi: 10.1186/1471-2156-11-94
Jorde, L. B. (2001). Population genomics: a bridge from evolutionary history to genetic medicine. Hum. Mol. Genet. 10, 2199–2207. doi: 10.1093/hmg/10.20.2199
Kantar, M. B., Nashoba, A. R., Anderson, J. E., Blackman, B. K., and Rieseberg, L. H. (2017). The genetics and genomics of plant domestication. Bioscience 67, 971–982. doi: 10.1093/biosci/bix114
Kaur, P., and Gaikwad, K. (2017). From genomes to GENE-omes: exome sequencing concept and applications in crop improvement. Front. Plant Sci. 8:2164. doi: 10.3389/fpls.2017.02164
Khan, A. W., Garg, V., Roorkiwal, M., Golicz, A. A., Edwards, D., and Varshney, R. K. (2020). Super-pangenome by integrating the wild side of a species for accelerated crop improvement. Trends Plant Sci. 25, 148–158. doi: 10.1016/j.tplants.2019.10.012
Khan, M. A., Olsen, K. M., Sovero, V., Kushad, M. M., and Korban, S. S. (2014). Fruit quality traits have played critical roles in domestication of the apple. Plant Genome 7, 1–18. doi: 10.3835/plantgenome2014.04.0018
Kim, Y., and Nielsen, R. (2004). Linkage disequilibrium as a signature of selective sweeps. Genetics 167, 1513–1524. doi: 10.1534/genetics.103.025387
Koenig, D., Jimenez-Gomez, J. M., Kimura, S., Fulop, D., Chitwood, D. H., Headland, L. R., et al. (2013). Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato. Proc. Natl. Acad. Sci. U.S.A. 110, E2655–E2662. doi: 10.1073/pnas.1309606110
Korneliussen, T. S., Albrechtsen, A., and Nielsen, R. (2014). ANGSD: analysis of next generation sequencing data. BMC Bioinform. 15:356. doi: 10.1186/s12859-014-0356-4
Korneliussen, T. S., Moltke, I., Albrechtsen, A., and Nielsen, R. (2013). Calculation of Tajima’s D and other neutrality test statistics from low depth next-generation sequencing data. BMC Bioinform. 14:289. doi: 10.1186/1471-2105-14-289
Larson, G., Piperno, D. R., Allaby, R. G., Purugganan, M. D., Andersson, L., Arroyo-Kalin, M., et al. (2014). Current perspectives and the future of domestication studies. Proc. Natl. Acad. Sci. U.S.A. 111, 6139–6146. doi: 10.1073/pnas.1323964111
Lelieveld, S. H., Spielmann, M., Mundlos, S., Veltman, J. A., and Gilissen, C. (2015). Comparison of exome and genome sequencing technologies for the complete capture of protein-coding regions. Hum. Mutat. 36, 815–822. doi: 10.1002/humu.22813
Lemmon, Z. H., Bukowski, R., Sun, Q., and Doebley, J. F. (2014). The role of Cis regulatory evolution in maize domestication. PLoS Genet. 10:e1004745. doi: 10.1371/journal.pgen.1004745
Levy, S. E., and Myers, R. M. (2016). Advancements in next-generation sequencing. Annu. Rev. Genomics Hum. Genet. 17, 95–115. doi: 10.1146/annurev-genom-083115-022413
Li, H., and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature 475, 493–496. doi: 10.1038/nature10231
Li, R., Fu, W., Su, R., Tian, X., Du, D., Zhao, Y., et al. (2019). Towards the complete goat pan-genome by recovering missing genomic segments from the reference genome. Front. Genet. 10:1169. doi: 10.3389/fgene.2019.01169
Li, Y., Von Holdt, B. M., Reynolds, A., Boyko, A. R., Wayne, R. K., Wu, D. D., et al. (2013). Artificial selection on brain-expressed genes during the domestication of dog. Mol. Biol. Evol. 30, 1867–1876. doi: 10.1093/molbev/mst088
Li, Y., Zhou, G., Ma, J., Jiang, W., Jin, L., Zhang, Z., et al. (2014). De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat. Biotechnol. 32, 1045–1052. doi: 10.1038/nbt.2979
Li, Z., Chen, J., Wang, Z., Pan, Y., Wang, Q., Xu, N., et al. (2016). Detection of selection signatures of population-specific genomic regions selected during domestication process in Jinhua pigs. Anim. Genet. 47, 672–681. doi: 10.1111/age.12475
Librado, P., Fages, A., Gaunitz, C., Leonardi, M., Wagner, S., Khan, N., et al. (2016). The evolutionary origin and genetic makeup of domestic horses. Genetics 204, 423–434. doi: 10.1534/genetics.116.194860
Linck, E., and Battey, C. J. (2019). Minor allele frequency thresholds strongly affect population structure inference with genomic data sets. Mol. Ecol. Resour. 19, 639–647. doi: 10.1111/1755-0998.12995
Liu, X., and Fu, Y.-X. (2015). Exploring population size changes using SNP frequency spectra. Nat. Genet. 47, 555–559. doi: 10.1038/ng.3254
Lotterhos, K. E., and Whitlock, M. C. (2014). Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Mol. Ecol. 23, 2178–2192. doi: 10.1111/mec.12725
Lotterhos, K. E., and Whitlock, M. C. (2015). The relative power of genome scans to detect local adaptation depends on sampling design and statistical method. Mol. Ecol. 24, 1031–1046. doi: 10.1111/mec.13100
Lowry, D. B., Hoban, S., Kelley, J. L., Lotterhos, K. E., Reed, L. K., Antolin, M. F., et al. (2017). Breaking RAD: an evaluation of the utility of restriction site-associated DNA sequencing for genome scans of adaptation. Mol. Ecol. Resour. 17, 142–152. doi: 10.1111/1755-0998.12635
Luo, G.-Z., Blanco, M. A., Greer, E. L., He, C., and Shi, Y. (2015). DNA N6-methyladenine: a new epigenetic mark in eukaryotes? Nat. Rev. Mol. Cell Biol. 16, 705–710. doi: 10.1038/nrm4076
Luu, K., Bazin, E., and Blum, M. G. B. (2017). pcadapt: an R package to perform genome scans for selection based on principal component analysis. Mol. Ecol. Resour. 17, 67–77. doi: 10.1111/1755-0998.12592
Lye, Z. N., and Purugganan, M. D. (2019). Copy number variation in domestication. Trends Plant Sci. 24, 352–365. doi: 10.1016/j.tplants.2019.01.003
Machaj, G., Bostan, H., Macko-Podgórni, A., Iorizzo, M., and Grzebelus, D. (2018). Comparative transcriptomics of root development in wild and cultivated carrots. Genes (Basel) 9:431. doi: 10.3390/genes9090431
Maples, B. K., Gravel, S., Kenny, E. E., and Bustamante, C. D. (2013). RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288. doi: 10.1016/j.ajhg.2013.06.020
Martin, S. H., Davey, J. W., and Jiggins, C. D. (2015). Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Mol. Biol. Evol. 32, 244–257. doi: 10.1093/molbev/msu269
Mascher, M., Gundlach, H., Himmelbach, A., Beier, S., Twardziok, S. O., Wicker, T., et al. (2017). A chromosome conformation capture ordered sequence of the barley genome. Nature 544, 427–433. doi: 10.1038/nature22043
Mascher, M., Schuenemann, V. J., Davidovich, U., Marom, N., Himmelbach, A., Hübner, S., et al. (2016). Genomic analysis of 6,000-year-old cultivated grain illuminates the domestication history of barley. Nat. Genet. 48, 1089–1093. doi: 10.1038/ng.3611
Mastretta-Yanes, A., Arrigo, N., Alvarez, N., Jorgensen, T. H., Piñero, D., and Emerson, B. C. (2015). Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference. Mol. Ecol. Resour. 15, 28–41. doi: 10.1111/1755-0998.12291
Mather, N., Traves, S. M., and Ho, S. Y. (2020). A practical introduction to sequentially Markovian coalescent methods for estimating demographic history from genomic data. Ecol. Evol. 10, 579–589. doi: 10.1002/ece3.5888
Mathew, B., Léon, J., and Sillanpää, M. J. (2018). A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction. Heredity (Edinb) 120, 356–368. doi: 10.1038/s41437-017-0023-4
Mazet, O., Rodríguez, W., and Chikhi, L. (2015). Demographic inference using genetic data from a single individual: separating population size variation from population structure. Theor. Popul. Biol. 104, 46–58. doi: 10.1016/j.tpb.2015.06.003
Meirmans, P. G. (2015). Seven common mistakes in population genetics and how to avoid them. Mol. Ecol. 24, 3223–3231. doi: 10.1111/mec.13243
Meissner, A. (2005). Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 33, 5868–5877. doi: 10.1093/nar/gki901
Meyer, R. S., Choi, J. Y., Sanches, M., Plessis, A., Flowers, J. M., Amas, J., et al. (2016). Domestication history and geographical adaptation inferred from a SNP map of African rice. Nat. Genet. 48, 1083–1088. doi: 10.1038/ng.3633
Meyer, R. S., and Purugganan, M. D. (2013). Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14, 840–852. doi: 10.1038/nrg3605
Money, D., Migicovsky, Z., Gardner, K., and Myles, S. (2017). LinkImputeR: user-guided genotype calling and imputation for non-model organisms. BMC Genomics 18:523. doi: 10.1186/s12864-017-3873-5
Montenegro, J. D., Golicz, A. A., Bayer, P. E., Hurgobin, B., Lee, H., Chan, C. K. K., et al. (2017). The pangenome of hexaploid bread wheat. Plant J. 90, 1007–1013. doi: 10.1111/tpj.13515
Moyers, B. T., Morrell, P. L., and McKay, J. K. (2018). Genetic costs of domestication and improvement. J. Hered. 109, 103–116. doi: 10.1093/jhered/esx069
Muir, P., Li, S., Lou, S., Wang, D., Spakowicz, D. J., Salichos, L., et al. (2016). The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biol. 17:53. doi: 10.1186/s13059-016-0917-0
Myers, C. L., Springer, N. M., Schaefer, R., Ross-Ibarra, J., Swanson-Wagner, R., Tiffin, P., et al. (2012). Reshaping of the maize transcriptome by domestication. Proc. Natl. Acad. Sci. U.S.A. 109, 11878–11883. doi: 10.1073/pnas.1201961109
Nadachowska-Brzyska, K., Burri, R., Smeds, L., and Ellegren, H. (2016). PSMC analysis of effective population sizes in molecular ecology and its application to black-and-white Ficedula flycatchers. Mol. Ecol. 25, 1058–1072. doi: 10.1111/mec.13540
Ndjiondjop, M. N., Alachiotis, N., Pavlidis, P., Goungoulou, A., Kpeki, S. B., Zhao, D., et al. (2019). Comparisons of molecular diversity indices, selective sweeps and population structure of African rice with its wild progenitor and Asian rice. Theor. Appl. Genet. 132, 1145–1158. doi: 10.1007/s00122-018-3268-2
Nei, M., and Maruyama, T. (1975). Lewontin-Krakauer test for neutral genes. Genetics 80:395.
Nielsen, R., and Beaumont, M. A. (2009). Statistical inferences in phylogeography. Mol. Ecol. 18, 1034–1047. doi: 10.1111/j.1365-294X.2008.04059.x
Nowoshilow, S., Schloissnig, S., Fei, J. F., Dahl, A., Pang, A. W., Pippel, M., et al. (2018). The axolotl genome and the evolution of key tissue formation regulators. Nature 554, 50–55. doi: 10.1038/nature25458
Ottoni, C., Van Neer, W., De Cupere, B., Daligault, J., Guimaraes, S., Peters, J., et al. (2017). The palaeogenetics of cat dispersal in the ancient world. Nat. Ecol. Evol. 1:0139. doi: 10.1038/s41559-017-0139
Pankin, A., Altmüller, J., Becker, C., and von Korff, M. (2018). Targeted resequencing reveals genomic signatures of barley domestication. New Phytol. 218, 1247–1259. doi: 10.1111/nph.15077
Paterson, A. H., Lander, E. S., Hewitt, J. D., Peterson, S., Lincoln, S. E., and Tanksley, S. D. (1988). Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms. Nature 335, 721–726. doi: 10.1038/335721a0
Patterson, N., Price, A. L., and Reich, D. (2006). Population structure and eigenanalysis. PLoS Genet. 2:e190. doi: 10.1371/journal.pgen.0020190
Pavlidis, P., and Alachiotis, N. (2017). A survey of methods and tools to detect recent and strong positive selection. J. Biol. Res. 24:7. doi: 10.1186/s40709-017-0064-0
Pickrell, J. K., and Pritchard, J. K. (2012). Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8:e1002967. doi: 10.1371/journal.pgen.1002967
Piperno, D. R. (2017). Assessing elements of an extended evolutionary synthesis for plant domestication and agricultural origin research. Proc. Natl. Acad. Sci. U.S.A. 114, 6429–6437. doi: 10.1073/pnas.1703658114
Pitt, D., Sevane, N., Nicolazzi, E. L., MacHugh, D. E., Park, S. D. E., Colli, L., et al. (2019). Domestication of cattle: two or three events? Evol. Appl. 12, 123–136. doi: 10.1111/eva.12674
Price, A. L., Tandon, A., Patterson, N., Barnes, K. C., Rafaels, N., Ruczinski, I., et al. (2009). Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 5:e1000519. doi: 10.1371/journal.pgen.1000519
Pritchard, J. K., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics 155, 945–959. doi: 10.1093/genetics/155.2.945
Przeworski, M., Coop, G., and Wall, J. D. (2005). The signature of positive selection on standing genetic variation. Evolution 59, 2312–2323. doi: 10.1111/j.0014-3820.2005.tb00941.x
Qi, J., Liu, X., Shen, D., Miao, H., Xie, B., Li, X., et al. (2013). A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity. Nat. Genet. 45, 1510–1515. doi: 10.1038/ng.2801
Qiu, Q., Wang, L., Wang, K., Yang, Y., Ma, T., Wang, Z., et al. (2015). Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions. Nat. Commun. 6:10283. doi: 10.1038/ncomms10283
Raj, A., Stephens, M., and Pritchard, J. K. (2014). fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics 197, 573–589. doi: 10.1534/genetics.114.164350
Ramos-Madrigal, J., Smith, B. D., Moreno-Mayar, J. V., Gopalakrishnan, S., Ross-Ibarra, J., Gilbert, M. T. P., et al. (2016). Genome sequence of a 5,310-year-old maize cob provides insights into the early stages of maize domestication. Curr. Biol. 26, 3195–3201. doi: 10.1016/j.cub.2016.09.036
Ranwez, V., Serra, A., Pot, D., and Chantret, N. (2017). Domestication reduces alternative splicing expression variations in sorghum. PLoS One 12:e0183454. doi: 10.1371/journal.pone.0183454
Renaut, S., and Rieseberg, L. H. (2015). The accumulation of deleterious mutations as a consequence of domestication and improvement in sunflowers and other Compositae crops. Mol. Biol. Evol. 32, 2273–2283. doi: 10.1093/molbev/msv106
Rodríguez-Mega, E., Piñeyro-Nelson, A., Gutierrez, C., García-Ponce, B., Sánchez, M. D. L. P., Zluhan-Martínez, E., et al. (2015). Role of transcriptional regulation in the evolution of plant phenotype: a dynamic systems approach. Dev. Dyn. 244, 1074–1095. doi: 10.1002/dvdy.24268
Ross-Ibarra, J., Morrell, P. L., and Gaut, B. S. (2007). Plant domestication, a unique opportunity to identify the genetic basis of adaptation. Proc. Natl. Acad. Sci. U.S.A. 104, 8641–8648. doi: 10.1073/pnas.0700643104
Rozas, J., Ferrer-Mata, A., Sánchez-DelBarrio, J. C., Guirao-Rico, S., Librado, P., Ramos-Onsins, S. E., et al. (2017). DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34, 3299–3302. doi: 10.1093/molbev/msx248
Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z. P., Richter, D. J., Schaffner, S. F., et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832–837. doi: 10.1038/nature01140
Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918. doi: 10.1038/nature06250
Sakurada, K. (2010). Environmental epigenetic modifications and reprogramming-recalcitrant genes. Stem Cell Res. 4, 157–164. doi: 10.1016/j.scr.2010.01.001
Sawyer, S., Krause, J., Guschanski, K., Savolainen, V., and Pääbo, S. (2012). Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS One 7:e34131. doi: 10.1371/journal.pone.0034131
Sax, K. (1923). The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8, 552–560.
Schiffels, S., and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925. doi: 10.1038/ng.3015
Schlötterer, C., Tobler, R., Kofler, R., and Nolte, V. (2014). Sequencing pools of individuals – mining genome-wide polymorphism data without big funding. Nat. Rev. Genet. 15, 749–763. doi: 10.1038/nrg3803
Schmidt, D., Wilson, M. D., Spyrou, C., Brown, G. D., Hadfield, J., and Odom, D. T. (2009). ChIP-seq: using high-throughput sequencing to discover protein–DNA interactions. Methods 48, 240–248. doi: 10.1016/j.ymeth.2009.03.001
Schmitz, R. J., Schultz, M. D., Urich, M. A., Nery, J. R., Pelizzola, M., Libiger, O., et al. (2013). Patterns of population epigenomic diversity. Nature 495, 193–198. doi: 10.1038/nature11968
Schmutz, J., McClean, P. E., Mamidi, S., Wu, G. A., Cannon, S. B., Grimwood, J., et al. (2014). A reference genome for common bean and genome-wide analysis of dual domestications. Nat. Genet. 46, 707–713. doi: 10.1038/ng.3008
Schnable, P. S., Ware, D., Fulton, R. S., Stein, J. C., Wei, F., Pasternak, S., et al. (2009). The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115. doi: 10.1126/science.1178534
Schreiber, M., Stein, N., and Mascher, M. (2018). Genomic approaches for studying crop evolution. Genome Biol. 19:140. doi: 10.1186/s13059-018-1528-8
Schubert, M., Jónsson, H., Chang, D., Der Sarkissian, C., Ermini, L., Ginolhac, A., et al. (2014). Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Proc. Natl. Acad. Sci. U.S.A. 111, E5661–E5669. doi: 10.1073/pnas.1416991111
Schurch, N. J., Schofield, P., Gierliński, M., Cole, C., Sherstnev, A., Singh, V., et al. (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22, 839–851. doi: 10.1261/rna.053959.115
Schweiger, W., Boddu, J., Shin, S., Poppenberger, B., Berthiller, F., Lemmens, M., et al. (2010). Validation of a candidate deoxynivalenol-inactivating UDP-glucosyltransferase from barley by heterologous expression in yeast. Mol. Plant Microbe Interact. 23, 977–986. doi: 10.1094/MPMI-23-7-0977
Seal, A., Gupta, A., Mahalaxmi, M., Aykkal, R., Singh, T. R., and Arunachalam, V. (2014). Tools, resources and databases for SNPs and indels in sequences: a review. Int. J. Bioinform. Res. Appl. 10:264. doi: 10.1504/IJBRA.2014.060762
Shafer, A. B. A., Peart, C. R., Tusso, S., Maayan, I., Brelsford, A., Wheat, C. W., et al. (2017). Bioinformatic processing of RAD-seq data dramatically impacts downstream population genetic inference. Methods Ecol. Evol. 8, 907–917. doi: 10.1111/2041-210X.12700
Shalem, O., Sanjana, N. E., Hartenian, E., Shi, X., Scott, D. A., Mikkelsen, T. S., et al. (2014). Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87. doi: 10.1126/science.1247005
Shan, S., Soltis, P. S., Soltis, D. E., and Yang, B. (2020). Considerations in adapting CRISPR/Cas9 in nongenetic model plant systems. Appl. Plant Sci. 8:e11314. doi: 10.1002/aps3.11314
Shang, Y., Ma, Y., Zhou, Y., Zhang, H., Duan, L., Chen, H., et al. (2014). Biosynthesis, regulation, and domestication of bitterness in cucumber. Science 346, 1084–1088. doi: 10.1126/science.1259215
Shen, Y., Zhang, J., Liu, Y., Liu, S., Liu, Z., Duan, Z., et al. (2018). DNA methylation footprints during soybean domestication and improvement. Genome Biol. 19:128. doi: 10.1186/s13059-018-1516-z
Shi, J., and Lai, J. (2015). Patterns of genomic changes with crop domestication and breeding. Curr. Opin. Plant Biol. 24, 47–53. doi: 10.1016/j.pbi.2015.01.008
Sims, D., Sudbery, I., Ilott, N. E., Heger, A., and Ponting, C. P. (2014). Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet. 15, 121–132. doi: 10.1038/nrg3642
Singh, J., Zhao, J., and Vallejos, C. E. (2018). Differential transcriptome patterns associated with early seedling development in a wild and a domesticated common bean (Phaseolus vulgaris L.) accession. Plant Sci. 274, 153–162. doi: 10.1016/j.plantsci.2018.05.024
Smith, J. M., and Haigh, J. (2007). The hitch-hiking effect of a favourable gene. Genet. Res. (Camb.) 89, 391–403. doi: 10.1017/S0016672308009579
Smith, O., Nicholson, W. V., Kistler, L., Mace, E., Clapham, A., Rose, P., et al. (2019). A domestication history of dynamic adaptation and genomic deterioration in Sorghum. Nat. Plants 5, 369–379. doi: 10.1038/s41477-019-0397-9
Sohn, J., and Nam, J.-W. (2016). The present and future of de novo whole-genome assembly. Brief. Bioinform. 19:bbw096. doi: 10.1093/bib/bbw096
Song, J., Li, J., Sun, J., Hu, T., Wu, A., Liu, S., et al. (2018). Genome-wide association mapping for cold tolerance in a core collection of rice (Oryza sativa L.) landraces by using high-density single nucleotide polymorphism markers from specific-locus amplified fragment sequencing. Front. Plant Sci. 9:875. doi: 10.3389/fpls.2018.00875
Song, Q., Zhang, T., Stelly, D. M., and Chen, Z. J. (2017). Epigenomic and functional analyses reveal roles of epialleles in the loss of photoperiod sensitivity during domestication of allotetraploid cottons. Genome Biol. 18:99. doi: 10.1186/s13059-017-1229-8
Soyk, S., Lemmon, Z. H., Oved, M., Fisher, J., Liberatore, K. L., Park, S. J., et al. (2017). Bypassing negative epistasis on yield in tomato imposed by a domestication gene. Cell 169, 1142–1155. doi: 10.1016/j.cell.2017.04.032
Sun, H., Wu, S., Zhang, G., Jiao, C., Guo, S., Ren, Y., et al. (2017). Karyotype stability and unbiased fractionation in the paleo-allotetraploid Cucurbita genomes. Mol. Plant 10, 1293–1306. doi: 10.1016/j.molp.2017.09.003
Swinnen, G., Goossens, A., and Pauwels, L. (2016). Lessons from domestication: targeting cis-regulatory elements for crop improvement. Trends Plant Sci. 21, 506–515. doi: 10.1016/j.tplants.2016.01.014
Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595.
Tang, H., Coram, M., Wang, P., Zhu, X., and Risch, N. (2006). Reconstructing genetic ancestry blocks in admixed individuals. Am. J. Hum. Genet. 79, 1–12. doi: 10.1086/504302
Tettelin, H., Masignani, V., Cieslewicz, M. J., Donati, C., Medini, D., Ward, N. L., et al. (2005). Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc. Natl. Acad. Sci. U.S.A. 102, 13950–13955. doi: 10.1073/pnas.0506758102
Thornton, T. A., and Bermejo, J. L. (2014). Local and global ancestry inference and applications to genetic association analysis for admixed populations. Genet. Epidemiol. 38, S5–S12. doi: 10.1002/gepi.21819
Tiffin, P., and Ross-Ibarra, J. (2014). Advances and limits of using population genetics to understand local adaptation. Trends Ecol. Evol. 29, 673–680. doi: 10.1016/j.tree.2014.10.004
Tomato Genome Consortium (2012). The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485:635. doi: 10.1038/nature11119
Trerotola, M., Relli, V., Simeone, P., and Alberti, S. (2015). Epigenetic inheritance and the missing heritability. Hum. Genomics 9:17. doi: 10.1186/s40246-015-0041-3
Trucchi, E., Gratton, P., Whittington, J. D., Cristofari, R., Le Maho, Y., Stenseth, N. C., et al. (2014). King penguin demography since the last glaciation inferred from genome-wide data. Proc. R. Soc. B Biol. Sci. 281:20140528. doi: 10.1098/rspb.2014.0528
Turner, B. M. (2009). Epigenetic responses to environmental change and their evolutionary implications. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 3403–3418. doi: 10.1098/rstb.2009.0125
Ueta, R., Abe, C., Watanabe, T., Sugano, S. S., Ishihara, R., Ezura, H., et al. (2017). Rapid breeding of parthenocarpic tomato plants using CRISPR/Cas9. Sci. Rep. 7:507. doi: 10.1038/s41598-017-00501-4
Urich, M. A., Nery, J. R., Lister, R., Schmitz, R. J., and Ecker, J. R. (2015). MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing. Nat. Protoc. 10:475. doi: 10.1038/nprot.2014.114
Vallebueno-Estrada, M., Rodríguez-Arévalo, I., Rougon-Cardoso, A., Martínez González, J., García Cook, A., Montiel, R., et al. (2016). The earliest maize from San Marcos Tehuacán is a partial domesticate with genomic evidence of inbreeding. Proc. Natl. Acad. Sci. U.S.A. 113, 14151–14156. doi: 10.1073/pnas.1609701113
VanBuren, R., Wai, C. M., Colle, M., Wang, J., Sullivan, S., Bushakra, J. M., et al. (2018). A near complete, chromosome-scale assembly of the black raspberry (Rubus occidentalis) genome. Gigascience 7, 1–9. doi: 10.1093/gigascience/giy094
Varshney, R. K., Thudi, M., Roorkiwal, M., He, W., Upadhyaya, H. D., Yang, W., et al. (2019). Resequencing of 429 chickpea accessions from 45 countries provides insights into genome diversity, domestication and agronomic traits. Nat. Genet. 51, 857–864. doi: 10.1038/s41588-019-0401-3
Vasemägi, A., Nilsson, J., McGinnity, P., Cross, T., O’Reilly, P., Glebe, B., et al. (2012). Screen for footprints of selection during domestication/captive breeding of Atlantic salmon. Comp. Funct. Genomics 2012, 1–14. doi: 10.1155/2012/628204
Velasco, D., Hough, J., Aradhya, M., and Ross-Ibarra, J. (2016). Evolutionary genomics of peach and almond domestication. G3 6, 3985–3993. doi: 10.1534/g3.116.032672
Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A., et al. (2017). 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22. doi: 10.1016/j.ajhg.2017.06.005
Vitti, J. J., Grossman, S. R., and Sabeti, P. C. (2013). Detecting natural selection in genomic data. Annu. Rev. Genet. 47, 97–120. doi: 10.1146/annurev-genet-111212-133526
Voight, B. F., Kudaravalli, S., Wen, X., and Pritchard, J. K. (2006). A map of recent positive selection in the human genome. PLoS Biol. 4:e72. doi: 10.1371/journal.pbio.0040072
Wales, N., Akman, M., Watson, R. H. B., Sánchez Barreiro, F., Smith, B. D., Gremillion, K. J., et al. (2019). Ancient DNA reveals the timing and persistence of organellar genetic bottlenecks over 3,000 years of sunflower domestication and improvement. Evol. Appl. 12, 38–53. doi: 10.1111/eva.12594
Wales, N., Ramos Madrigal, J., Cappellini, E., Carmona Baez, A., Samaniego Castruita, J. A., Romero-Navarro, J. A., et al. (2016). The limits and potential of paleogenomic techniques for reconstructing grapevine domestication. J. Archaeol. Sci. 72, 57–70. doi: 10.1016/j.jas.2016.05.014
Walley, J. W., Sartor, R. C., Shen, Z., Schmitz, R. J., Wu, K. J., Urich, M. A., et al. (2016). Integration of omic networks in a developmental atlas of maize. Science 353, 814–818. doi: 10.1126/science.aag1125
Wang, E. T., Kodama, G., Baldi, P., and Moyzis, R. K. (2006). Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc. Natl. Acad. Sci. U.S.A. 103, 135–140. doi: 10.1073/pnas.0509691102
Wang, G.-D., Xie, H.-B., Peng, M.-S., Irwin, D., and Zhang, Y.-P. (2014). Domestication genomics: evidence from animals. Annu. Rev. Anim. Biosci. 2, 65–84. doi: 10.1146/annurev-animal-022513-114129
Wang, M., Yan, J., Zhao, J., Song, W., Zhang, X., Xiao, Y., et al. (2012). Genome-wide association study (GWAS) of resistance to head smut in maize. Plant Sci. 196, 125–131. doi: 10.1016/j.plantsci.2012.08.004
Wang, W., Feng, B., Xiao, J., Xia, Z., Zhou, X., Li, P., et al. (2014). Cassava genome from a wild ancestor to cultivated varieties. Nat. Commun. 5:5110. doi: 10.1038/ncomms6110
Wang, W., Zhang, X., Zhou, X., Zhang, Y., La, Y., Zhang, Y., et al. (2019). Deep genome resequencing reveals artificial and natural selection for visual deterioration, plateau adaptability and high prolificacy in Chinese domestic sheep. Front. Genet. 10:300. doi: 10.3389/fgene.2019.00300
Warr, A., Robert, C., Hume, D., Archibald, A., Deeb, N., and Watson, M. (2015). Exome sequencing: current and future perspectives. G3 5, 1543–1550. doi: 10.1534/g3.115.018564
Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L., et al. (2005). Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 37, 853–862. doi: 10.1038/ng1598
Weigel, D., and Colot, V. (2012). Epialleles in plant evolution. Genome Biol. 13:249. doi: 10.1186/gb-2012-13-10-249
Wiener, P., and Pong-Wong, R. (2011). A regression-based approach to selection mapping. J. Hered. 102, 294–305. doi: 10.1093/jhered/esr014
Wigginton, J. E., Cutler, D. J., and Abecasis, G. R. (2005). A note on exact tests of Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 76, 887–893. doi: 10.1086/429864
Wilkinson, S., Lu, Z. H., Megens, H.-J., Archibald, A. L., Haley, C., Jackson, I. J., et al. (2013). Signatures of diversifying selection in European pig breeds. PLoS Genet. 9:e1003453. doi: 10.1371/journal.pgen.1003453
Wolter, F., Schindele, P., and Puchta, H. (2019). Plant breeding at the speed of light: the power of CRISPR/Cas to generate directed genetic diversity at multiple sites. BMC Plant Biol. 19:176. doi: 10.1186/s12870-019-1775-1
Wu, D.-D., Ding, X.-D., Wang, S., Wójcik, J. M., Zhang, Y., Tokarska, M., et al. (2018). Pervasive introgression facilitated domestication and adaptation in the Bos species complex. Nat. Ecol. Evol. 2, 1139–1145. doi: 10.1038/s41559-018-0562-y
Xie, M., Chung, C. Y.-L., Li, M., Wong, F.-L., Wang, X., Liu, A., et al. (2019). A reference-grade wild soybean genome. Nat. Commun. 10:1216. doi: 10.1038/s41467-019-09142-9
Yandell, M., and Ence, D. (2012). A beginner’s guide to eukaryotic genome annotation. Nat. Rev. Genet. 13, 329–342. doi: 10.1038/nrg3174
Yang, C. J., Samayoa, L. F., Bradbury, P. J., Olukolu, B. A., Xue, W., York, A. M., et al. (2019). The genetic architecture of teosinte catalyzed and constrained maize domestication. Proc. Natl. Acad. Sci. U.S.A. 116, 5643–5652. doi: 10.1073/pnas.1820997116
Yang, I. S., and Kim, S. (2015). Analysis of whole transcriptome sequencing data: workflow and software. Genomics Inform. 13:119. doi: 10.5808/GI.2015.13.4.119
Yang, L., Koo, D. H., Li, Y., Zhang, X., Luan, F., Havey, M. J., et al. (2012). Chromosome rearrangements during domestication of cucumber as revealed by high-density genetic mapping and draft genome assembly. Plant J. 71, 895–906. doi: 10.1111/j.1365-313X.2012.05017.x
Yang, X., Zhang, H., Shang, J., Liu, G., Xia, T., Zhao, C., et al. (2018). Comparative analysis of the blood transcriptomes between wolves and dogs. Anim. Genet. 49, 291–302. doi: 10.1111/age.12675
Zadesenets, K. S., and Rubtsov, N. B. (2018). Genome duplication in animal evolution. Russ. J. Genet. 54, 1125–1136. doi: 10.1134/S1022795418090168
Zeder, M. A. (2006). Central questions in the domestication of plants and animals. Evol. Anthropol. Issues News Rev. 15, 105–117. doi: 10.1002/evan.20101
Zeder, M. A., Emshwiller, E., Smith, B. D., and Bradley, D. G. (2006). Documenting domestication: the intersection of genetics and archaeology. Trends Genet. 22, 139–155. doi: 10.1016/j.tig.2006.01.007
Zeng, K., Fu, Y.-X., Shi, S., and Wu, C.-I. (2006). Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174, 1431–1439. doi: 10.1534/genetics.106.061432
Zeng, L., Tu, X.-L., Dai, H., Han, F.-M., Lu, B.-S., Wang, M.-S., et al. (2019). Whole genomes and transcriptomes reveal adaptation and domestication of pistachio. Genome Biol. 20:79. doi: 10.1186/s13059-019-1686-3
Zhang, C., Bailey, D. K., Awad, T., Liu, G., Xing, G., Cao, M., et al. (2006). A whole genome long-range haplotype (WGLRH) test for detecting imprints of positive selection in human populations. Bioinformatics 22, 2122–2128. doi: 10.1093/bioinformatics/btl365
Zhang, H., Zhang, J., Lang, Z., Botella, J. R., and Zhu, J.-K. (2017). Genome editing—principles and applications for functional genomics research and crop improvement. Crit. Rev. Plant Sci. 36, 291–309. doi: 10.1080/07352689.2017.1402989
Zhao, B. S., Roundtree, I. A., and He, C. (2017). Post-transcriptional gene regulation by mRNA modifications. Nat. Rev. Mol. Cell Biol. 18, 31–42. doi: 10.1038/nrm.2016.132
Zhao, Q., Feng, Q., Lu, H., Li, Y., Wang, A., Tian, Q., et al. (2018). Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 50, 278–284. doi: 10.1038/s41588-018-0041-z
Zheng, J., Wu, H., Zhu, H., Huang, C., Liu, C., Chang, Y., et al. (2019). Determining factors, regulation system, and domestication of anthocyanin biosynthesis in rice leaves. New Phytol. 223, 705–721. doi: 10.1111/nph.15807
Zhou, J., Li, D., Wang, G., Wang, F., Kunjal, M., Joldersma, D., et al. (2019). Application and future perspective of CRISPR/Cas9 genome editing in fruit crops. J. Integr. Plant Biol. 62, 269–286. doi: 10.1111/jipb.12793
Zhou, Y., Minio, A., Massonnet, M., Solares, E., Lv, Y., Beridze, T., et al. (2019). The population genetics of structural variants in grapevine domestication. Nat. Plants 5, 965–979. doi: 10.1038/s41477-019-0507-8

Keywords

population genomics, pangenomics, ancient DNA, differential expression analysis, epialleles, genome editing

Citation

Barrera-Redondo J, Piñero D and Eguiarte LE (2020) Genomic, Transcriptomic and Epigenomic Tools to Study the Domestication of Plants and Animals: A Field Guide for Beginners. Front. Genet. 11:742. doi: 10.3389/fgene.2020.00742

Received

25 April 2020

Accepted

22 June 2020

Published

15 July 2020

Volume

11 - 2020

Edited by

TingFung Chan, The Chinese University of Hong Kong, China

Reviewed by

Eric Von Wettberg, The University of Vermont, United States; Martin Mascher, Leibniz Institute for Plant Genetics and Cultural Plant Research (IPK), Germany; David Irwin, University of Toronto, Canada

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Luis E. Eguiarte, fruns@unam.mx

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


F_ST Outlier Tests to Detect Candidate Genes