High-Density Genetic Linkage Mapping of Lepidium Based on Genotyping-by-Sequencing SNPs and Segregating Contig Tag Haplotypes

Lepidium campestre has been targeted for domestication as future oilseed and catch crop. Three hundred eighty plants comprising genotypes of L. campestre, Lepidium heterophyllum, and their interspecific F2 mapping population were genotyped using genotyping by sequencing (GBS), and the generated polymorphic markers were used for the construction of high-density genetic linkage map. TASSEL-GBS, a reference genome-based pipeline, was used for this analysis using a draft L. campestre whole genome sequence. The analysis resulted in 120,438 biallelic single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) above 0.01. The construction of genetic linkage map was conducted using MSTMap based on phased SNPs segregating in 1:2:1 ratio for the F2 individuals, followed by genetic mapping of segregating contig tag haplotypes as dominant markers against the linkage map. The final linkage map consisted of eight linkage groups (LGs) containing 2,330 SNP markers and spanned 881 Kosambi cM. Contigs (10,302) were genetically mapped to the eight LGs, which were assembled into pseudomolecules that covered a total of ∼120.6 Mbp. The final size of the pseudomolecules ranged from 9.4 Mbp (LG-4) to 20.4 Mpb (LG-7). The following major correspondence between the eight Lepidium LGs (LG-1 to LG-8) and the five Arabidopsis thaliana (At) chromosomes (Atx-1–Atx-5) was revealed through comparative genomics analysis: LG-1&2_Atx-1, LG-3_Atx-2&3, LG-4_Atx-2, LG-5_Atx-2&Atx-3, LG-6_Atx-4&5, LG-7_Atx-4, and LG-8_Atx-5. This analysis revealed that at least 66% of the sequences of the LGs showed high collinearity with At chromosomes. The sequence identity between the corresponding regions of the LGs and At chromosomes ranged from 80.6% (LG-6) to 86.4% (LG-8) with overall mean of 82.9%. The map positions on Lepidium LGs of the homologs of 24 genes that regulate various traits in A. thaliana were also identified. The eight LGs revealed in this study confirm the previously reported (1) haploid chromosome number of eight in L. campestre and L. heterophyllum and (2) chromosomal fusion, translocation, and inversion events during the evolution of n = 8 karyotype in ancestral species shared by Lepidium and Arabidopsis to n = 5 karyotype in A. thaliana. This study generated highly useful genomic tools and resources for Lepidium that can be used to accelerate its domestication.


INTRODUCTION
Today's major crop species are the results of thousands of years of intentional and unintentional selection of traits that brought genetically determined changes in the ancestral wild plant species (Burger et al., 2008). Domestication of a crop species is generally a very slow and long-term process that leads to significant changes in major traits that are regarded as "domestication syndrome" traits, such as determinate growth habit, increased seed size, loss of seed dormancy, and reduced pod shattering (Harlan et al., 1973;Doebley et al., 2006;Weeden, 2007;Burger et al., 2008). However, progress in genomic research that include comparative genomics, gene identification, annotation of whole genome sequences (WGS), development of genomewide molecular markers, and genome-wide association studies (GWAS) for various crops led to deep insight into the process of plant domestication and evolution (Geleta and Ortiz, 2016). The use of genomic tools and resources in combination with conventional plant breeding methods is becoming essential in the development of new crop cultivars in a relatively shorter time than before (Pérez-de-Castro et al., 2012). Genomic tools and resources, such as a variety of molecular markers, high-density genetic linkage, quantitative trait locus (QTL), and genomewide association maps are becoming the cornerstone of plant breeding, as they facilitate marker-assisted and genomic selection (Collard and Mackill, 2008;Lorenz et al., 2012;Lopez-Cruz et al., 2015;Qu et al., 2017;Koech et al., 2019). Through the use of these tools and resources and the analyses of the genetics of "domestication syndrome" traits, a good insight into the evolutionary changes that have occurred during plant domestication have been gained and can be used to facilitate a rapid domestication of new plant species.
Lepidium L. is a large, undomesticated genus in the Brassicaceae family that comprises 231 species distributed around the world (Al-Shehbaz et al., 2006). Indigenous Lepidium species are found on all continents (Al-Shehbaz, 1986), and they often grow in habitats with less competition for resources and space, such as roadsides, railway sides, and disturbed areas. L. campestre (L.) R. Br. (field cress), an annual or biennial (Mulligan, 1961) diploid species with 2n = 2x = 16 chromosomes (Rice et al., 2015), has wide distribution in Europe including Nordic countries. Based on its various desirable characteristics including winter hardiness, promising potential for high seed yield, selfcompatibility, synchronous seed maturity, and suitability as an undersown catch crop (Al-Shehbaz, 1986;Merker and Nilsson, 1995;Eriksson, 2009;Geleta et al., 2014), it has been considered for domestication as a future oilseed and catch crop in Sweden to contribute to increased global production of vegetable oil as well as diversification of agroecosystems. Although the oil content of L. campestre was initially reported to be ∼20% (Nilsson et al., 1998), further studies of a wide collection of European and North American L. campestre accessions showed that the oil content varies from 12 to 20% (Mulatu Geleta et al., SLU, unpublished data). Its seed oil is mainly composed of linolenic acid (C18:3; 34-39%), erucic acid (C22:1; 22-25%), oleic acid (C18:1; 12-16%), and linoleic acid (C18:2; 8-11%).
Research and breeding activities of varying intensity have been ongoing during the last 25 years at the Swedish University of Agricultural Sciences (SLU), contributing to the domestication of L. campestre (Merker and Nilsson, 1995;Andersson et al., 1999;Börjesdotter, 1999;Eriksson, 2009;Merker et al., 2010;Ivarson et al., 2013;Geleta et al., 2014;Ivarson et al., 2016;Gustafsson et al., 2018). Domestication of L. campestre should result in significant improvement in various major traits, such as oil content and quality, pod shatter resistance, and seed yield (Eriksson, 2009;Geleta et al., 2014). For use as edible oil, antinutritional compounds, such as glucosinolates and erucic acid, will have to be eliminated or drastically reduced through breeding (Andersson et al., 1999;Ivarson et al., 2016).
Given that perenniality is a favorable trait in catch crops, the overall goal of the domestication of L. campestre is to develop both biennial and perennial cultivars. However, developing perennial L. campestre requires its interspecific hybridization with perennial Lepidium species, such as L. heterophyllum Benth. and Lepidium hirtum (L.) Sm. (Mulligan, 1961;Mummenhoff et al., 1995). Both species are closely related to L. campestre (Mummenhoff et al., 2001;Lee et al., 2002) and share the same ploidy level and chromosome number (2n = 2x = 16) (Rice et al., 2015). The interspecific hybridization of L. campestre with these two species was successful, and perennial breeding lines derived from these hybrids have been developed (Mulatu Geleta et al., SLU, unpublished data). In addition to the perenniality trait, hybridization between these species led to a larger variation in various desirable traits as compared to the variation within either of the parental species, providing wider opportunities for further breeding. This study was conducted to develop genomic tools and resources for Lepidium in order to understand its genome as well as accelerate its domestication.
were genotypes collected from Arlid (unknown exact location) and Stuvsta (59 • 15 25 N, 17 • 58 51 E), Sweden, respectively. C92_2_3 was a genotype sampled from IPK (Germany) accession LEP-92 originally collected from Greece. Par_2 was a genotype that belongs to a US Department of Agriculture (USDA)-Agricultural Research Service (ARS) accession LH 597856 originally collected from Spain, whereas Hast_3 is a genotype collected from Hästveda, Sweden (56 • 17 21 N, 13 • 56 09 E). Genomic data from the mapping population and the two parents were used for genetic linkage mapping and various statistical analyses. The two L. campestre and one L. heterophyllum genotypes were included to estimate various genetic diversity parameters within and among the two Lepidium species.

DNA Extraction
Seeds from target samples were planted in a greenhouse at the Department of Plant Breeding, SLU, Alnarp, Sweden. Young leaf tissue was sampled in Eppendorf tubes from individual plants and immediately frozen in liquid nitrogen, and genomic DNA was extracted as described in Gustafsson et al. (2018). The quality and quantity of the extracted DNA was assessed using 1% agarose gel electrophoresis and a NanoDrop R ND-1000 Spectrophotometer (Saveen Warner, Sweden). The extracted DNA samples were then sent to the Genomic Diversity Facility, Cornell University, USA, for genotyping by sequencing (GBS).

GBS Optimization and Analysis
A number of restriction enzymes were tested to determine the best enzyme that produces fragment size distribution suitable for the construction of GBS library. At the end, ApeKI (G * CWGC), a 4-base cutter enzyme was selected, as majority of fragments produced were < 500 bp and hence were appropriate for Illumina sequencing. The reference genome-based pipeline TASSEL-GBS (Glaubitz et al., 2014) was used for this GBS analysis, where a draft whole genome sequence of L. campestre assembled inhouse was used as a reference genome. In this GBS analysis, a total of 7,591,461 tags were generated after merging, and these tags were aligned against the in-house assembled L. campestre genome. Of these tags (3,688,674 tags), 48.6% were uniquely aligned to the reference genome, whereas 8.8 and 42.6% were multiply aligned and unaligned, respectively. SNP calling after aligning the tags to the reference genome resulted in 165,892 SNPs. VCFtools version v0.1.12a (Danecek et al., 2011) was used to calculate heterozygosity as well as the depth and missingness of the resultant SNPs. Filtering out of SNPs with minor allele frequency (MAF) below 0.01 and missing data per site above 90% resulted in 126,859 SNPs, of which 120,438 were biallelic. PLINK version v1.07 was used to generate a multidimension scaling (MDS) plot based on the 120,438 genome-wide biallelic SNPs.

Linkage Map Construction With GBS SNPs
The 126,859 GBS SNP markers were processed using VCFtools version 0.1.15 (Danecek et al., 2011) in the following order: (1) only SNPs with MAF of at least 40% were retained; (2) genotypes supported by a read depth of less than seven were set to missing; (3) SNPs with more than 10% missing data were discarded; (4) SNPs deviating from 1:2:1 segregation with p < 0.01 were discarded; (5) the SNPs were thinned so that no two SNPs were <65 bases apart (i.e., only one SNP was retained per 64base GBS tag locus); and (6) the genotypes were converted to a numerical format to facilitate further processing in R (R Core Team, 2016). The genotypes were then phased based upon the two parents, and the resultant 2,352 phased SNPs for the 375 F 2 individuals were reformatted for input into MSTMap (Wu et al., 2008) with a custom R script. The parameters used for linkage map construction with MSTMap include Kosambi distance with cutoff P-value of 1e −12 , number of map distance of 10, number of map size of 1, and missing threshold of 0.15.
The resulting linkage map of eight linkage groups was reformatted for loading into R with a custom awk command, and then, a custom R script was used to merge it with the genotypic data and count the number of crossovers per individual across the eight linkage groups. Nine outlier individuals with more than 66 crossovers were discarded from further analysis, and a new linkage map was constructed with MSTMap using the remaining 366 individuals (same parameters as above). Twentyone outlier SNPs with more than 15 double crossovers were then identified using R and excluded, and a final linkage map was constructed with MSTMap for the remaining 2,331 SNPs (again with the same parameters). The R package R/qtl (Broman et al., 2003) was then used to correct genotyping errors and impute most of the missing genotypes [using the fill.geno() function with method = "maxmarginal, " map.function = "kosambi, " and min.prob = 0.8]. The resulting genotypes were visualized with a custom R script (Supplementary Figure S1) and output in R/qtl csvr format to facilitate conversion to hapmap format via a custom awk command.

Genetic Mapping of Segregating GBS Tag Sequences
Custom Tassel3 (Bradbury et al., 2007) code was used to filter the TagsOnPhysicalMap (TOPM) and TagsByTaxa (TBT) data structures produced in the GBS SNP calling pipeline (Glaubitz et al., 2014) so that they contained only tags with unique alignment positions with no sequence divergence from the contig-level reference assembly. The TBT was also filtered to retain only the F 2 individuals present in the final linkage map and only tags that appeared to segregate in the F 2 [present in at least 30 and no more than 256 (80%) of the 366 individuals]. Each tag was then genetically mapped as a dominant marker against the linkage map. Because GBS was performed at low sequencing depth, the absence of a tag in an F 2 individual is not always informative; hence, only the progeny in which a given tag was observed were used to calculate the recombination rate between that tag and each SNP in the linkage map. For this subset of progeny, the recombination rate was calculated as: min[nHomPar_2/(nHomPar_1 + nHomPar_2), nHomPar_1/(nHomPar1 + nHomPar2)] where nHomPar_1 was the number of individuals with the tag that were homozygous for the parent_1 allele at the SNP in question, and nHomPar_2 was the number of individuals with the tag that were homozygous for the parent_2 allele. Heterozygous individuals at the SNP were excluded as non-informative. Tags were considered genetically mapped if the recombination rate was < 5%, and the sample size (nHomPar1 + nHomPar2) was at least 30 (two-tailed binomial P < 6e −8 ). The genetic mapping span (GeneticStart to GeneticEnd) for a tag was from the first to last SNP on the linkage group with the same minimum recombination rate, and the genetic mapping position (GeneticMean) was the mean of this span, where the SNP positions were numbered consecutively from 1 to the number of SNPs on the linkage group.

Genetic Mapping of Contig Tag Haplotypes and Assembly Into Pseudomolecules
In light of the low sequencing depth of GBS, the statistical power for mapping contigs was increased by combining all of the concordant genetically mapped tags for a contig into a single tag haplotype, with the presence of any given tag imputing the presence of the contig haplotype as a whole. For contigs with multiple genetically mapped tags, the consensus linkage group assignment was determined by a majority rule weighted by sample size. For those tags mapped to the same linkage group, the consensus genetic mapping position was determined as the sample size weighted average of the tag mapping positions (GeneticMean). Tags with genetic mapping positions on the same linkage group and within 50 SNPs of the consensus genetic mapping position were considered as agreeing with the consensus and were thus merged into a contig tag haplotype, by summing their tag counts across each taxon. The contig tag haplotypes were then genetically mapped in the same manner as the individual tags, except that a minimum recombination rate of 10% was used, and to prevent false positives from occasional sequencing errors, a contig tag haplotype was only considered present in a genotype if the count for that genotype was at least 5% of the mean of the non-zero counts across genotypes.
The genetically mapped contigs were ordered into pseudomolecules according to the following rules, using custom awk and bash commands: (1) contigs placed on the linkage map via segregating SNPs were kept in the same relative order, regardless of whether they were mapped by tag haplotype or not, in which (a) contigs with multiple mapped SNPs were quasi-oriented if the centimorgan position differed between the first and last SNP in the contig, with the first and last SNP defined by physical position in the contig, (b) a contig with multiple SNPs was not always contiguous in the map, as it was sometimes comingled with one or more additional contigs, and the order of comingled contigs was resolved by the position of their first SNP in the linkage map, and (c) contigs without genetic consensus (e.g., with an equal number of SNPs mapping to different linkage groups) were removed from the assembly; (2) contigs mapped only by tag haplotype were placed immediately after the contig containing their GeneticMean SNP, with the following sort order: GeneticStart, GeneticEnd, contig type (scaffold < contig), contig name; (3) ordered contigs on a linkage group/pseudomolecule were separated by 100 N's; and (4) the linkage groups/pseudomolecules were named LG-1, LG-2, LG-3, LG-4, LG-5, LG-6, LG-7, and LG-8, by modifying the names of the linkage groups assigned by the MSTMap software.

Multidimensional Scaling, Heterozygosity, and Inbreeding Coefficient
Ninety-five percent (120,438) of the 126,859 filtered SNPs were biallelic. Multidimensional scaling based on these genome-wide biallelic SNPs displayed the distribution of the 375 individuals of the mapping population, their parents, as well as the other three genotypes included in the study (Figure 1). In this analysis the two parents, Par_1 and Par_2, were positioned at the top right and bottom left corners of the plot, respectively, and the MDS clearly displayed the clustering of the three L. campestre genotypes (Par_1, Stu_7 and C92_2_3) and the two L. heterophyllum genotypes (Par_2 and Hast_3) at their respective corners (Figure 1). The two Swedish genotypes of L. campestre (Par_1 and Stu_7) were more closely related to each other than to L. campestre genotype originally collected from Greece (C92_2-3). The F 2 individuals spread widely across the two dimensions with the highest concentration around the center (Figure 1). The distribution of these individuals shows that they represented the whole F 2 population very well. Based on these data, it is possible to select individuals that are genetically more similar to L. campestre (the target species for domestication) for further breeding. For example, individuals such as Hy25_A24 and Hy25_A307 would be among the top candidates for further breeding or crossbreeding with L. campestre, if they have desirable traits such as perenniality.
Observed heterozygosity (Ho) and expected heterozygosity (He) as well as inbreeding coefficient (F) were calculated for the three L. campestre, two L. heterophyllum samples, as well as for of the F 2 individuals across thousands of SNP loci (2,331-100,760 loci) (Table 1 and Figure 2). These parameters were calculated for each individual after removing loci with missing data. In the case of all filtered SNPs (126,859), 50,439-10,076 SNP loci remained per individual after removing the loci with missing data. Similarly, removing the loci with missing data for mapped SNPs resulted in 2,326-2331 loci per individual. For the three L. campestre genotypes, only 5.1-5.7% of the loci were heterozygous, and the mean heterozygosity was 5.4% (P_Ho = 0.054; Table 1). Similarly, heterozygous loci accounted for a mean of 5.5% in L. heterophyllum. Inbreeding coefficient (F) was 0.64 on average for both species. In the case of F 2 population, the proportion of observed heterozygosity was 13.1 and 51.9% for all filtered and mapped SNPs, with corresponding inbreeding coefficient of 0.07 and −0.08, respectively (Table 1 and Figure 2).

Linkage Map Construction With GBS SNPs
The final linkage map consisted of eight linkage groups (LGs) containing 2,331 SNP markers derived from 1,044 contigs, and spanned 881 Kosambi cM in total (Figure 3, see also Supplementary Figure S1). The number of mapped SNPs per contig varied from 1 to 66, and 305 (29.2%) of the 1,044 contigs contained more than one mapped SNPs (Supplementary Tables S1, S2). Of the 305 contigs possessing more than one mapped SNPs, only one was mapped to more than one linkage groups (scaffold140, with four SNPs on LG-2 and one SNP on LG-5). The SNP on LG-5 was excluded, and hence, the total number of SNPs shown on the linkage map is 2,330 (Figure 4 and Supplementary Tables S1, S2). The SNPs on each contig were mapped within 1 cM of each other for 81.0% of the contigs, within 5 cM for 96.4%, and within 10 cM for 99.7% (Figure 5 and Supplementary Tables S2). These results indicate that the assembly of paired end reads into contigs was highly accurate. LG-1 to LG-8 in that order (Figure 4).

Genetic Mapping of Segregating GBS Tag Sequences and Contig Tag Haplotypes
In total, 34,342 segregating 64-base GBS tag sequences from 9,943 contigs were placed on the genetic map within a recombination fraction of 5% of one or more SNP markers (Supplementary Tables S3). Of these tags, 33,832 either agreed with the genetic consensus for their respective contig or were the sole representative tag, whereas 510 (1.5%) disagreed with the consensus and were excluded from further analysis, along with 12 contigs that did not display genetic consensus. The remaining 9,931 contigs with genetic consensus were all successfully mapped, as a segregating tag haplotype, to within a recombination fraction of 7.9% of one or more SNPs in the linkage map (Supplementary Tables S3). In total, 10,302 contigs were genetically mapped, with 371 mapped only by SNP, 673 mapped by both SNP and contig tag haplotype, and 9,258 mapped only by contig tag haplotype (Supplementary Tables S1). The sequences of the 10,302 mapped contigs have been deposited at DDBJ/ENA/GenBank as Whole Genome Shotgun project under the accession number WJSH00000000. These contigs were assembled into a pseudomolecule fasta file according to the ordering and orientation rules described in section Materials and Methods above. The pseudomolecules in the assembly covered 120,594,250 bases (∼120.6 Mbp) (
The contigs/scaffolds mapped to Lepidium LGs were blast searched against the At genome, and the position of corresponding sequences within At chromosome sequences were determined. Based on major sequence coverage and identity, the correspondence between the Lepidium LGs and At chromosomes were grouped into three groups ( Figure 6A). Group 1 shows the correspondence of LG-1 and LG-2 with Atx-1. Group 2 contains LG-3, LG-4, and LG-5 as well as Atx-2 and Atx-3.
The comparison of the LGs with At chromosomes revealed three regions within the LG sequences as shown in Figure 6A: (1) regions shown in colors other than gray and black showed high collinearity with At chromosome sequences, (2) regions shown in gray are either did not show significant collinearity or did not match sequences of At chromosomes, and (3) regions shown in black do not have mapped contigs and hence could not be compared with At chromosomes. In the highly collinear regions, collinearity is either in the same or opposite direction as compared to the corresponding At chromosome sequences, as shown in black upward and downward arrows, respectively, in Figure 6B. At least 66% of the sequences of the LGs showed high collinearity with At chromosome sequences. However, significant portions of LG-3, LG-6, and LG-8 were either gray or black, whereas LG-2 and LG-5 have significant gray regions ( Figure 6A). On the other hand, the whole regions of LG-1, LG-4, and LG-7 showed high collinearity with their corresponding At chromosome sequences.
The map positions of the homologs of 24 At genes that are known to regulate various traits in Arabidopsis have been identified on Lepidium LGs (Table 3 and Figure 6A). LG-1 carries the homologs of NAC012 and AGO1 genes on Atx-1. The homologs of TAG1 (Atx-2), ABI3 (Atx-3), and FAE1 (Atx-4) were located on LG-3. LG-4 carries the homologs of AGL6, SOC1, and ER genes on Atx-2. Similarly, the homologs of five genes on Atx-3 (FUSCA3, GTR1, FER, WRI1, and ADPG1) were located on LG-5, whereas the homologs of IND and KNAT1 (Atx-4) were located on LG-6, which also contained the homolog of ALC gene (Atx-5).
LG-7 carries the homologs of AP2 and VRN2 genes on Atx-4. The homologs of FUL, GTR2, ATG5, FLC, TFL1, and RPL genes that belong to Atx-5 were located on LG-8. None of the homologs of the 24 At genes were located on LG-2.

DISCUSSION
Advanced next generation sequencing technologies allow identification of thousands of polymorphic markers that have various applications including the determination of genetic diversity and development of high-density genetic linkage map in plant species. The present study revealed an average observed heterozygosity (Ho) of <6% in both L. campestre and L. heterophyllum signifying that both species are predominantly inbreeders. In the F 2 mapping population, only 13.1% of the filtered SNPs were heterozygous, on average, which is significantly lower than the 50% heterozygosity expected for the Frontiers in Plant Science | www.frontiersin.org whole F 2 population derived from the crosses of the two parental plants. Although random F 2 seeds were planted, the seedlings of some of them were extremely weak or unhealthy at very young age, and consequently, leaf tissue was not sampled from such plants for DNA extraction. Given the obtained result of low proportion of heterozygous SNPs in the mapping population, it is likely that most of the unfit seedlings had higher proportion of heterozygosity across their genome. This finding may suggest that plants with higher proportion of heterozygosity across their genome perform poorly, the case that can be generally regarded as heterozygote disadvantage. This is in line with theoretical prediction that homozygosity is fixed easily in strongly selfing plant species if rearrangements reduce fitness (Charlesworth, 1992). The 51.9% heterozygosity for the 2,331 mapped SNPs is in line with the expected 50%, and this has been obtained through discarding SNPs deviating from 1:2:1 segregation with P < 0.01 to make it suitable for the linkage mapping.
Construction of linkage map is an important step in the identification of genes and molecular markers for its application in plant breeding. In this study, we used the GBS method (Baird et al., 2008;Elshire et al., 2011) to simultaneously discover new SNP markers and genotype individual samples from two Lepidium species (L. campestre and L. heterophyllum) as well as an F 2 mapping population of interspecific hybrid between genotypes of the two species. SNP markers discovered through GBS has previously been successfully used for the construction of high-density genetic linkage mapping in various plant species including barley and wheat (Poland et al., 2012), Aethionema arabicum (Nguyen et al., 2019), Avena species (Latta et al., 2019), and grapevine (Tello et al., 2019). The eight linkage groups constructed in this study correspond to the eight haploid chromosome number previously reported for both L. campestre and L. heterophyllum (Rice et al., 2015).
The Brassicaceae family has been described as having four major evolutionary lineages (Bailey et al., 2006;Couvreur et al., 2010). The ancestral karyotype of lineages I and II is composed of eight chromosomes, which later evolved into the ancestral Camelineae karyotype of eight chromosomes and the Calepineae karyotype of seven chromosomes (Murat et al., 2015). Lepidium and Arabidopsis belong to lineage 1 and evolved from ancestral Camelineae karyotype of eight chromosomes, which makes it suitable to transfer genomic information from Arabidopsis (a model species) to Lepidium (species of agronomic interest). Hence, the sequences of A. thaliana chromosomes were used as a tool for comparative analyses of Lepidium and Arabidopsis genomes in this study. The LGs of L. campestre are named from LG-1 to LG-8 in a way that they match previously reported chromosome nomenclature  of Brassicaeae species with a haploid chromosome number of eight (n = 8), such as Arabidposis lyrata (Boivin et al., 2004;Kuittinen et al., 2004;Yogeeswaran et al., 2005;Hu et al., 2011;Murat et al., 2015).
LG-1 to LG-8 match chromosomes 1-8 of A. lyrata in that order. Studies on different Brassicaceae species have revealed chromosomal events through which a karyotype of n = 8 in an ancestral species shared by Lepidium and Arabidopsis have evolved to a karyotype of n = 5 in A. thaliana (Boivin et al., 2004;Kuittinen et al., 2004;Yogeeswaran et al., 2005;Koch and Kiefer, 2005;Lysak et al., 2006;Hu et al., 2011;Murat et al., 2015). The comparative genomic analysis between A. lyrata (n = 8) and A. thaliana (n = 5) revealed that more than 50% the A. lyrata genome is absent in A. thaliana, whereas ∼25% the A. thaliana genome is missing in A. lyrata (Hu et al., 2011), which is the result of accumulated chromosomal and point mutations since the two species separated roughly 5-6 million years ago (mya) (Koch and Kiefer, 2005). However, the overall sequence identity between their homologous sequences is >80% (Hu et al., 2011), which is comparable with the overall sequence identity of 82.9% between homologous sequences of L. campestre and A. thaliana. The degree to which genes and genomic regions are maintained on corresponding chromosomes (remain syntenic) and in corresponding orders (remain collinear) over a period of time varies among eukaryotic genomes (Coghlan et al., 2005). Correlating arrangements of genomic regions of a model species with a related species allows inference of shared ancestry of genes as well as utilization of known genetic information of the model species to study less-well-understood systems (Tang et al., 2008). About 10 major rearrangements have been reported between A. thaliana and A. lyrata, including two reciprocal translocations and three chromosomal fusions (Kuittinen et al., 2004;Yogeeswaran et al., 2005;Lysak et al., 2006) that resulted in the formation of a karyotype of five chromosomes in A. thaliana from the ancestral state of eight chromosomes that still exist in other Brassicaceae species, such as A. lyrata. The fusions and translocations reported based on the karyotypes of the two Arabidopsis species are also evident in this study through comparison of the L. campestre and A. thaliana genomes. As a result of the fusion of ancestral chromosomes, LG-1 and LG-2 matched Atx-1, LG-3, and LG-4 matched Atx-2, and LG-8 and LG-6 matched Atx-5. In all three pairs, the first LGs matched the upper part of their corresponding At chromosomes.
LG-5 corresponds to Atx-3, whereas LG-7 corresponds to Atx-4. The inversion within the lower arm of Atx-1 that corresponds to ancestral chromosome-2 (Lysak et al., 2006) was also evident in this study (inversion of corresponding region of LG-2). Following the fusion of ancestral chromosomes 3 and 4, unequal reciprocal translocation occurred between the fused chromosome (at region corresponding to chromosome 3) and the upper part of ancestral chromosome 5 (Lysak et al., 2006;Hu et al., 2011), which is in line with the present study (Figures 6A,B). However, unlike the case between A. lyrata and A. thaliana, the comparison of Lepidium and Arabidopsis genomes revealed inversion of the translocated block to the fused chromosome, suggesting the occurrence of further major chromosomal rearrangements after the Arabidopsis and Lepidium lineages were separated. Following the fusion of ancestral chromosomes 6 and 8, unequal reciprocal translocation occurred between the fused chromosome (at region corresponding to chromosome 6) and the upper part of ancestral chromosome 7 (Lysak et al., 2006;Hu et al., 2011), which is in agreement with the present study (Figures 6A,B). Among the two inversions previously reported (Lysak et al., 2006;Hu et al., 2011), the inversion within Atx-5 was evident but not the one within Atx-4. Parkin et al. (2005) identified 21 shared syntenic blocks between A. thaliana and Brassica napus genomes representing collinear regions maintained since the divergence of their lineages approximately 20 mya. The genomes of A. thaliana and A. lyrata are ∼90% syntenic that predominately showed highly conserved collinear arrangements, although multiple inversions also exist between the genomes (Hu et al., 2011). Similarly, the present study showed an average of 87.6% synteny between the sequences of Lepidium LGs and A. thaliana chromosomes, and inversion of small segments are common throughout the Lepidium LGs. Except in LG-1, LG-4, and LG-7, the other LGs have significant regions that are unalignable with A. thaliana chromosomes similar to the case between A. thaliana and A. lyrata that have unalignable regions throughout the genome. The latter case is mainly due to deletions throughout A. thaliana genome, suggesting that deletions are favored over insertions and hence smaller genome (Hu et al., 2011).
Through comparative analysis of genomic sequences, the linkage map positions of the homologs of 24 genes that are known to regulate various traits in A. thaliana have been located on Lepidium LGs (Table 3 and Figure 6A). These genes regulate traits that are targeted for improvement within the domestication project of L. campestre. The homologs of the NAC DOMAIN CONTAINING PROTEIN 12 (NAC012) and ARGONAUTE 1(AGO1) were mapped to LG-1. NAC012 contributes to the regulation of pod shattering in A. thaliana through controlling the development of secondary walls in siliques (Rajani and Sundaresan, 2001;Liljegren et al., 2004). The sequence identity between the NAC012 partial coding sequences of these two species was 92% (Gustafsson et al., 2018). AGO1 is involved in the determination of inflorescence architecture in Arabidopsis through suppressing the TERMINAL FLOWER 1 (TFL1) (Ferrándiz-Nohales et al., 2014).
The homologs of TRIACYLGLYCEROL BIOSYNTHESIS DEFECT 1 (TAG1), ABA-insensitive 3 (ABI3), and FATTY ACID ELONGATION 1 (FAE1) were mapped to LG-3. TAG1 is one of the genes involved in the biosynthesis of fatty acids, and it regulates oil production (Routaboul et al., 1999;Jako et al., 2001); FAE1 controls seed oil composition (James et al., 1995), and ABI3 regulates seed dormancy (Ooms et al., 1993) in Arabidopsis. The sequence identity of TAG1 and FAE1 between the partial coding sequences of these two species was 93 and 88%, respectively (Gustafsson et al., 2018). The homologs of A. thaliana AGAMOUS LIKE 6 (AGL6), SUPRESSOR OF OVEREXPRESSION OF CO1 (SOC1), and ERECTA (ER) were mapped to LG-4. AGL6 and SOC1 are flowering time genes that regulate flowering in Arabidopsis. The sequence identity of AGL6 and SOC1 between the partial coding sequences of these two species was 98 and 94%, respectively (Gustafsson et al., 2018). ER regulates traits such as internode length and angles of pods (Douglas et al., 2002;Venglat et al., 2002), which have direct effect on seed yield through the determination of the number of pods on each inflorescence.
The homologs of APETALA2 (AP2) and VERNALIZATION 2 (VRN2) were mapped to LG-7. AP2 is a transcription factor gene involved in the regulation of flowering and seed development (Okamuro et al., 1997) as well as controlling seed yield (Jofuku et al., 2005), whereas VRN2 regulates vernalization responses (Gendall et al., 2001) in Arabidopsis. AP2 and VRN2 showed 89 and 90% sequence identity, respectively, between A. thaliana and L. campestre in their partial coding sequences (Gustafsson et al., 2018). The homologs of FRUITFULL (FUL), GLUCOSINOLATE TRANSPORTER-2 (GTR2), AUTOPHAGY RELATED 5 (ATG5), FLOWERING LOCUS C (FLC), TERMINAL FLOWER 1 (TFL1), and REPLUMLESS (RPL) were mapped to LG-8. FUL controls the development of the wall of seedpods (valve) (Gu et al., 1998;Ferrándiz et al., 2000) and thereby regulates pod shattering. GTR2 regulates glucosinolates transport (Andersen and Halkier, 2014), whereas ATG5 is a gene involved in plant defense (Yoshimoto et al., 2009). FLC is a MADS-box gene that plays a key role in regulating plant developmental responses to temperature as well as flowering (Michaels and Amasino, 1999;Sheldon et al., 2000). TFL1 is a gene involved in the determination of inflorescence architecture in Arabidopsis (Ferrándiz-Nohales et al., 2014), whereas RPL is one of several genes responsible for the establishment of the valve margin in the seed-containing pod and thereby involved in the regulation of pod shattering (Roeder et al., 2003). According to Gustafsson et al. (2018), the partial coding sequences of FUL, GTR2, ATG5, FLC, and RPL showed 92, 88, 81, 90, and 89% sequence identity, in that order, between A. thaliana and L. campestre.
In summary, the eight LGs revealed in this study confirm the previously reported (1) haploid chromosome number of eight in L. campestre and L. heterophyllum; (2) chromosomal fusion, translocation, and inversion events during the evolution of n = 8 karyotype in ancestral species shared by Lepidium and Arabidopsis to n = 5 karyotype in A. thaliana. The construction of high-density genetic linkage map bearing thousands of polymorphic markers and the identification of homologs of various desirable genes in the present study are significant steps toward the application of marker-aided and genomic selection in L. campestre for its accelerated domestication.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the DDBJ/ENA/GenBank under the accession WJSH00000000. The version described in this paper is version WJSH01000000.

AUTHOR CONTRIBUTIONS
RO and MG secured the funding. CG and MG developed the mapping population and extracted DNA and conducted comparative genomics analysis. JG generated the GBS data and constructed genetic linkage map. MG and JG wrote the manuscript. All authors conceived and designed the study, contributed to data analysis, revised the manuscript, and read and approved the final version of the manuscript for publication.

ACKNOWLEDGMENTS
We would like to thank PlantLink for bioinformatics support. We would also like to acknowledge Professor Sten Stymne for his major role in the development of the SSF project.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00448/ full#supplementary-material FIGURE S1 | A framework genetic map comprising 2331 SNP markers on eight linkage groups: The rows are F 2 individuals and the columns (each only a few pixels wide) are markers; the vertical black lines separate the linkage groups; blue is heterozygous, green is homozygous for parent-1, yellow is homozygous for parent-2, and pink is missing data.