ORIGINAL RESEARCH article
The Impact of Genotyping-by-Sequencing Pipelines on SNP Discovery and Identification of Markers Associated with Verticillium Wilt Resistance in Autotetraploid Alfalfa (Medicago sativa L.)
- 1Plant Germplasm Introduction and Testing Research, United States Department of Agriculture-Agricultural Research Service, Prosser, WA, USA
- 2Department of Horticulture, Washington State University, Pullman, WA, USA
- 3S&W Seed Company, Arlington, WI, USA
Verticillium wilt (VW) of alfalfa is a soilborne disease causing severe yield loss in alfalfa. To identify molecular markers associated with VW resistance, we used an integrated framework of genome-wide association study (GWAS) with high-throughput genotyping by sequencing (GBS) to identify loci associated with VW resistance in an F1 full-sib alfalfa population. Phenotyping was performed using manual inoculation of the pathogen to cloned plants of each individual and disease severity was scored using a standard scale. Genotyping was done by GBS, followed by genotype calling using three bioinformatics pipelines including the TASSEL-GBS pipeline (TASSEL), the Universal Network Enabled Analysis Kit (UNEAK), and the haplotype-based FreeBayes pipeline (FreeBayes). The resulting numbers of SNPs, marker density, minor allele frequency (MAF) and heterozygosity were compared among the pipelines. The TASSEL pipeline generated more markers with the highest density and MAF, whereas the highest heterozygosity was obtained by the UNEAK pipeline. The FreeBayes pipeline generated tetraploid genotypes, with the least number of markers. SNP markers generated from each pipeline were used independently for marker-trait association. Markers significantly associated with VW resistance identified by each pipeline were compared. Similar marker loci were found on chromosomes 5, 6, and 7, whereas different loci on chromosome 1, 2, 3, and 4 were identified by different pipelines. Most significant markers were located on chromosome 6 and they were identified by all three pipelines. Of those identified, several loci were linked to known genes whose functions are involved in the plants’ resistance to pathogens. Further investigation on these loci and their linked genes would provide insight into understanding molecular mechanisms of VW resistance in alfalfa. Functional markers closely linked to the resistance loci would be useful for MAS to improve alfalfa cultivars with enhanced resistance to the disease.
Verticillium wilt (VW) is a soil-borne disease caused by the fungal pathogen Verticillium alfalfae (Renormalized by Inderbitzin et al., 2011) in alfalfa (Medicago sativa L.). It causes severe forage yield loss in US and Canada (Graham et al., 1977; Atkinson, 1981; Grau et al., 1981; Gray and Roth, 1982; Gordon et al., 1989). The disease symptoms start with leaf tip chlorosis, leaf desiccation and abscission, and then, as disease progresses, infected plants eventually wilt and die (Leath and Pennypacker, 1990). Methods such as eradication of broadleaf weed hosts and crop rotation may be used for controlling the disease (Leath and Pennypacker, 1990). However, due to other dispersal factors including insects, manure, wind, and water, these control strategies are often ineffective. For this reason, the preferred method of control of the disease is the use of resistant varieties (Peaden et al., 1985; Leath and Pennypacker, 1990). It has been suggested that cultivars require at least 60% resistant plants for reasonable protection against the disease (Grau, 1991).
Efforts toward the goal of developing alfalfa cultivars resistant to VW have been made using a traditional breeding strategy (Delwiche, 1991). However, the traditional strategy for selecting resistant cultivars is based on plant’s reaction to the disease (Delwiche, 1991), and the effect of environmental factors on disease expression leads to difficulty in accurately screening alfalfa for resistance to VW. Development of high-throughput diagnostic markers linked to the plant resistance gene(s) would enable more robust breeding strategies, as they can be used in marker-assisted selection of resistant varieties in a timely manner. Single-nucleotide polymorphism (SNP) markers have become the technology of choice for most organisms because of their high frequency, wide distribution in genomes, and ready adaptation to highly multiplex detection systems.
Quantitative traits such as biotic and abiotic stress resistance are most likely under the control of multiple genes and interact with environmental factors. Identification of resistance loci that contribute to variation in such complex traits is a primary challenge in plant breeding and population genetics. An integrated framework that merges a QTL mapping approach called a “genome-wide association study (GWAS)” with high-throughput genome sequencing methodologies called “genotyping by sequencing (GBS)” provide a statistical basis for analyzing marker-trait association using linkage disequilibrium, and help to map traits quickly, efficiently, and in a relatively inexpensive manner. GBS generates large raw datasets of sequence reads and requires bioinformatics pipelines to analyze and interpret the GBS datasets. Current GBS pipelines such as TASSEL-GBS (TASSEL) (Glaubitz et al., 2014) and the Universal Network Enabled Analysis Kit (UNEAK) (Lu et al., 2013) have been applied for SNP discovery. The TASSEL pipeline was initially developed for diploid species but has also been used successfully in polypoid species such as wheat and barley (Poland et al., 2012). UNEAK is a non-reference, network-based pipeline. Although it has been used for genotype calling in polypoid species such as switch grass (Lu et al., 2013) and alfalfa (Li et al., 2014; Annicchiarico et al., 2015, Annicchiarico et al., 2016), it has been a challenge for GBS genotype calling in autotetraploid species such as alfalfa and potato due to their outcrossing and high heterozygosity. Recently, the software FreeBayes has been developed (Garrison and Marth, 2012) and used for GBS genotype calling in autotetraploid potato (Uitdewilligen et al., 2013) and alfalfa (Zhang et al., 2015). As FreeBayes is a haplotype-based pipeline, variants including single and multiple nucleotide polymorphisms (SNPs and MNPs, respectively), allelic series of tri-SNPs and tetra-SNPs, MNPs, and indels with a variable number of nucleotides, and nulliplex (aaaa), simplex (aaab), duplex (aabb), triplex (abbb) and quadruplex (bbbb) genotypes can be identified for the tetraploid samples. In a previous report, we used FreeBayes in combination with other software such as Stacks (Catchen et al., 2011) and GATK (McKenna et al., 2010) for GBS SNP discovery and identified loci associated with VW resistance in a breeding population of alfalfa developed by Forage Genetics International (Yu et al., 2016).
Cultivated alfalfa is an allogamous autotetraploid (2n = 4x = 32) with a basic chromosome number of eight and a genome size of 800–1000 Mbp (Blondon et al., 1994). Alfalfa plants are highly heterozygous and exhibit strong inbreeding depression. Selfing can cause either self-sterility or lethal allelic combinations. F2 populations may suffer from a genetic bias induced by the death of some genotypes (Julier et al., 2003). It is a considerable challenge to genotype individuals with such a complicated genome. To address whether different GBS pipelines affect SNP discovery in alfalfa, in the present study, we carried out GBS in an F1 alfalfa population developed by S & W Seeds for selecting for VW resistance. We compared three GBS pipelines, including the TASSEL, UNEAK and FreeBayes pipelines for genotype calling using the same GBS data set. The resulting marker data set generated by each pipeline was used for marker-trait association using linkage disequilibrium. Markers associated with VW were identified and compared among different pipelines for the effect of pipelines on SNP discovery and marker identification.
Materials and Methods
Plant Materials and Phenotyping
An alfalfa population containing 188 F1 progeny was developed from a cross between the parental plants of 55V50-58 (resistant) × 55V50-118 (susceptible) at S & W Seeds. The two parents and the F1 progeny were propagated by stem cuttings. Three clones from each individual were evaluated for resistance to VW by the standard assay of the North American Alfalfa Improvement Conference1 as described previously by Zhang et al. (2014).
The disease severity scores of three replicated clones were obtained for each individual and analyzed using Levene’s and t-tests to evaluate the equality of variance and means (Table 1). Based on the assumption of equal variance, a test for normality was performed on the obtained disease severity scores of the population with the Kolmogorov–Smirnov tests (Table 1) using the SPSS software2. Least square means were estimated from 3 replicated disease severity scores using the SAS PROC mixed (SAS Institute Inc. 2011, SAS Online Doc 9.3, Cary, NC, USA) and used for association mapping.
TABLE 1. The Levene’s and t-tests, and the normal distribution of the Shapiro–Wilk and Kolmogorov–Smirnov tests.
Genotyping by Sequencing
High molecular weight DNA was extracted from leaves of the original plants used to make clones using the Qiagen DNeasy 96 Plant kit, following the manufacturer’s protocol (Qiagen, CA). Genotyping-by-sequencing was carried out as described by Elshire et al. (2011). Briefly, DNA samples from individuals of the association panel were digested by ApekI restriction enzyme. GBS libraries were prepared by ligating the digested DNA to unique nucleotide adapters (barcodes) in 96-plex, followed by PCR amplification. Sequencing was performed on the Illumina HiSeq2000 instrument using two lanes.
Three GBS pipelines, the FreeBayes pipeline, the TASSEL-GBS (TASSEL) pipeline, and the Universal Network-Enabled Analysis Kit (UNEAK) pipeline, were used for SNP/variant discovery. The FreeBayes and TASSEL pipelines used the reference genomic sequences of M. truncatula (Mt4.0, v1), while no reference sequence was used in the UNEAK pipeline. The working flows of the pipelines are outlined in Figure 1. For the FreeBayes pipeline (Figure 1, left panel), GBS raw sequence reads were quality checked using FastQC (v0.11.2), followed by demultiplexing with STACKS (v 1.23) (Catchen et al., 2011). Clean reads were then aligned to the reference genome sequence of M. truncatula using the Burrows–Wheeler Aligner (BWA) (Li and Durbin, 2009). SAMtools (version 0.1.19) (Durbin et al., 2009) and Picard (version 1.943) were then used to mark duplicate reads and estimate the average insert size of the single-end reads. The read-depth and coverage data was processed with the Genome Analysis Toolkit (GATK) (McKenna et al., 2010), and in-house Perl scripts and BEDTools (Quinlan and Hall, 2010). Sequence variants were called using FreeBayes (Version 0.9.15) (Garrison and Marth, 2012) after removing duplicate reads. Consequently, haplotype-based variants including SNPs and MNPS (single and multiple nucleotide polymorphisms, respectively), allelic series of tri-SNPs and tetra-SNPs, MNPs, and indels with a variable number of nucleotides were identified. Default parameters were used for estimating pairwise diversity. A minimal read depth of 15X at a specific variant position was used for identifying an alternative allele as a variant. The QUAL was used for estimating the phred-scaled probability and sites with a QUAL value less than 20 were removed. Using these criteria, an estimated error rate of less than 0.01 was obtained. The R-package “pegas”4 and custom Perl scripts were used for calculating the heterozygosity of variants.
FIGURE 1. Three GBS pipelines including FreeBayes (left), TASSEL-GBS (medium) and TASSEL-UNEAK (medium and right) used for genotype calling. Bioinformatics steps are presented in each pipeline. Note that the TASSEL-GBS and UNEANK share most of steps in their pipelines except additional steps are used in the UNEAK pipeline (right).
For the TASSEL-GBS pipeline, a TASSEL_Plugin interface with default parameters5 was used to call SNPs (Figure 1, intermediate). Tag counts were generated from FastQ files with the FastqToTagCountPlugin. The tag counts were merged with the MergeMultipleTagCountPlugin, the minimum number of five times presence for a tag was required for output. Tags were then aligned to the reference genome of M. truncatula using BWA. Tags were converted to FastQ with the TagCountToFastqPlugin. Counts of tags per individual (taxa) were generated with the FastqToTBTPlugin. Counts of tags per individual were merged with the MergeTagsByTaxaFilesPlugin. SNPs were called using the TagsToSNPByAlignmentPlugin. Duplicate sites were merged with the MergeDuplicateSNPsPlugin and then with the MergeDuplicateSNP_vcf_Plugin followed by the tbt2vcfPlugin. Finally, a filtered HapMap containing GBS SNPs was formatted with the GBSHapMapFiltersPlugin (Figure 1, medium panel). The minimums of site coverage of 0.8, taxa coverage of 0.1 and MAF of 0.01 were applied in the filter.
For the UNEAK pipeline, a network approach was used for discovering SNPs without a reference genome as described by Lu et al. (2013). It was implemented in the TASSEL-GBS software program with two additional steps, UTagCountToTagPairPlugin and then UExportTagPairPlugin. They were added to the TASSEL pipeline between the TagcountToFastqPlugin and the FastqToTBTPlugin steps (Figure 1, right panel). The same parameters used in TASSEL-GBS were used for UNEAK except that an error tolerance rate of 0.03 was used in the network filter and a distance of 1,000 was used for padding tag pairs.
The GBS data have been submitted to the NCBI Sequence Read Archive with the BioProject ID: PRJNA343543.
To estimate effects of genotype and its interaction with environment, we used least square means of replicated disease scores to evaluate associations. A mixed linear model (MLM) was used for analyzing marker-trait association using TASSEL (Bradbury et al., 2007). Each set of marker data generated by the TASSEL, UNEAK and FreeBayes pipelines were further filtered using 5% MAF and 25% missing values before association mapping. Each filtered marker data and the same set of phenotypic data were used for association mapping. To correct for population structure, Kinship and principal component analysis were carried out for generating K and Q matrices, respectively. The K and Q matrices were used as covariance and integrated into the MLM for controlling for population structure. A false discovery rate (FDR) of 0.05 was used as multiple-test correction for significantly associated markers (Benjamini and Hochberg, 1995).
BLAST Search for Putative Candidate Genes
Flanking sequences of significant markers (Supplementary Table S4) were used as queries for BLAST search in the DNA database of the National Centre for Biotechnology Information (NCBI6) and Phytozome against the Medicago truncatula genome sequence, Mt4.0 v17. Known genes linked to the significant loci were assigned as putative candidates based on the annotation of gene functions.
Analysis of Phenotypic Data
For phenotyping VW resistance, the foliar symptoms of individual plants were evaluated after pathogen inoculation in the greenhouse and scored for VW resistance. Resistance scores of 3 replications were analyzed for the 190 individuals (570 plants total). The P-value for equality of variances (Levene’s Test) was less than 0.001 (Table 1). The t-test for equality was significant (p < 0.05). A normal distribution with a mean of 2.75 and a standard error of 0.08 was observed (Shapiro–Wilk test, Table 1).
Genotype Calling by Different Pipelines
The FreeBayes pipeline generated 10,403 valid variants after filtering (Figure 2, Supplementary Table S1). The site depth was 1.3 with 0.28 of missing value (Table 2). The average minor allele frequency (MAF) was 0.11 (Figure 3A). The average heterozygosity was 0.19 (Figure 4A) with 0.5% SNPs having heterozygosity ≥ 0.5 (Table 2).
FIGURE 2. Circle plot of chromosome location, marker density, minor allele frequency and heterozygosity of valid SNPs by three GBS pipelines. From outside to inside: circle 1 = chromosome location, circles 2, 3, and 4 = marker density, circle 5, 6, and 7 = minor allele frequency, circle 8, 9, and 10 = heterozygosity by FreeBayes, TASSEL and UNEAK, respectively. The colors for marker density were Red < brown < yellow < green < gray < blue.
TABLE 2. Total number, density, minor allele frequency and heterozygosity of the filtered variants generated by different GBS pipelines.
FIGURE 3. The distribution of minor allele frequency of valid variants calls by different GBS pipelines: FreeBayes (A), TASSEL (B) and UNEAK (C).
FIGURE 4. The distribution of heterozygosity of valid variants after genotyping by different GBS pipelines: FreeBayes (A), TASSEL (B), and UNEAK (C).
The TASSEL pipeline generated 24,176 valid variants (Figure 2, Supplementary Table S2). The site depth was 8.8 with 0.31 missing value (Table 2). The averages of MAF and heterozygosity were 0.32 (Figure 3B) and 0.43 (Figure 4B), respectively, and 35% SNPs with heterozygosity ≥ 0.5 (Table 2).
The UNEAK pipeline generated 14,415 valid variants with site depth of 3.5 and 0.5 missing value (Supplementary Table S3, Figure 2). The average of MAF was 0.22 (Figure 3C). The average heterozygosity was 0.39 (Figure 4C), and 28% SNPs with heterozygosity ≥ 0.5 (Table 2).
Using the criteria mentioned in Section “Materials and Methods,” we have identified 13 GBS markers from the FreeBayes pipeline significantly associated with VW resistance (Table 3, Figure 5A). BLAST search of the flanking sequences of the markers against M. truncatula pseudomolecule Mt4.0 v18 revealed that they were located on three chromosomes (Table 3). Among them, eight markers were located on chromosome 6, four on chromosome 1 and one on chromosome 7. The p-values ranged from 7.0E-6 to 8.0E-10 with R2 values of 0.12 – 0.20. The most significant marker was S7_17986403 with p = 8.0E-10 and R2 of 0.20 (Table 3).
FIGURE 5. Manhattan and quantile–quantile plots resulting from GWAS using the FreeBayes (A), TASSEL-GBS (B) and UNEAK (C) pipelines in the S & W Seeds population. A false discovery rate of 0.05 was used for significant cutoff (dot lines). Significant markers (above the cutoff lines) associated with VW resistance are listed in Tables 3–5, respectively. The chromosome positions were physical positions (not genetic positions) based on the alignment of alfalfa sequence tags to the reference genome of M. truncatula sequence (Version 3.5). “U” represents unknown chromosome. The abbreviations are putative candidate genes linked to the respective significant marker loci and the arrows indicate their genetic positions in the reference genome. ABC-TP, ABC transporter; ARRF, ankyrin repeat RF-like protein; CAP160, CAP160 repeat protein; DNA_P, DNA primase; E3_UbL, E3 ubiquitin protein ligase XBOS34; LRR, LRR receptor-like kinase family protein; NAATP, neutral amino acid transporter; PRBRP, pumilio-family RNA-binding protein; TFIIF, transcription initiation factor IIF beta subunit; TMP, transmembrane protein; TMTP, transmembrane amino acid transporter family protein; Tic22, Tic22 family protein; VITP, vacuolar iron transporter-like protein.
Twenty-two markers were identified by the TASSEL pipeline (Table 4, Figure 5B). Among them, 19 were located on 4 chromosomes (2, 4, 5, and 6) and the rest were of unknown location (“U” in Table 4). Of those with a chromosomal location, 13 were on chromosome 6, 3 on chromosome 5, 2 on chromosome 4 and one on chromosome 2. The most significant markers were S6_6801965 and S189473_1178 with p-values of 2.9E-9 and 4.6E-9, respectively (Table 4) and both markers were located on chromosome 6.
Of those with known location, 10 markers were on chromosome 6, 3 on chromosome 5, 2 on chromosome 7 and one on chromosome 3 (Table 5). The most significant markers were S5_59951052 (p = 1.77E-9), S6_162766039 (p = 2.12E-8), S6_31221028 (p = 5.01E-8) and S6_96132030 (p = 9.50E-8). The former was located on chromosome 5 and the later three were on chromosome 6 (Table 5).
Assigning the Loci Associated With VW Resistance to Known Genes
To identify potential candidate genes linked to marker loci associated with VW resistance, a BLAST search was performed as described in Section “Materials and Methods.” Of significant markers identified by the FreeBayes pipeline, 10 linked to 5 known genes in the M. truncatula genome (Table 3). Among them, three markers (AC146807-58-93588, -93595 and -93604) located at the same locus on chromosome 1 linked to the same gene, E3 ubiquitin-protein ligase (E3_Ubl). Two markers (S6_10615509 and S6_10615526) at the same locus on chromosome 6 linked to the DNA primase (DNA_P), an enzyme involved in the replication of DNA. On the same chromosome, markers contig_100652_1157 and contig_59661_3280 linked to the ankyrin repeat RF-like protein (ARRF) and Tic22 family protein (Tic22), respectively. Three additional markers (Contig_73909_2401, _2417 and _2437) at the same locus linked to the CAP160 repeat protein (CAP160).
Of significant markers identified by the TASSEL pipeline (Table 4), two markers (S5_10492250 and S5_10492298) at the same locus on chromosome 5 linked to a leucine-rich repeat receptor kinase (LRR). On the same chromosome, marker S5_25672447 linked to a pumilio-family RNA-binding repeat protein (PRBRP). A number of known genes were linked to significant markers on chromosome 6. A transmembrane amino acid transporter protein (TMTP) linked to marker S6_6801965. A MAP kinase (MAP) was linked to two markers (S6_30922451 and S6_39022471) located at the same locus. Additional markers (Table 4, “6∗”) were initially with unknown chromosome position but reassigned to chromosome 6 by BLAST search using their flanking sequences against the updated version of M. truncatula genome (Mt4.0 v1), including Contig_17722_2270 linked to the transcription factor IIF beta subunit (TFIIF), Contig_117822_726 linked to AP2 transcription factor (AP2), Contig_189473_1178 linked to a transmembrane protein (TMP), Contig_158631_4246 linked to a RNA-binding protein (RNABP), Contig_110756_995 linked to the vacuolar iron transporter-like protein (VITP), and Contig_2313381_25389 linked to the chromosome condensation regulator RCC1 repeat protein (RCC1). Interestingly, two genes, ARRF and CAP160 linked to Contig_1100652_1157 and Contig_173909_2230 respectively, were also identified by the FreeByes pipeline.
Among significant markers identified by the UNEAK pipeline (Table 5), five were linked to the same genes (TMP, PRBRP, TMTP, VITP and ARRF) identified by the TASSEL, and two (Tic22 and ARRF) identified by the FreeByes pipelines. Additional markers were only identified by the UNEAK pipeline, including S5_114680024 and S5_59951052 on chromosome 5 linked to pentatricopeptide repeat (PPR) and neutral amino acid transporter proteins (NAATP), respectively, S6_19276039 on chromosome 6 linked to transmembrane amino acid transporter protein (TMAATP), and S6_96132030 and S6_39177057 linked to unusual kinase (UK) and DUF1442 family protein (DUF1442), respectively. Interestingly, a disease resistance gene, drug resistance transporter-like ABC domain protein (ABC_TP) was linked to S7_131660043 on chromosome 7. Among markers with unknown chromosomal position, two markers, S0_126618054 and S0_32496031 were linked to 45S ribosomal RNA intergenic spacer (45S) and trehalose-6-phosphate synthase (TPS), respectively.
Comparison of GBS Pipelines Used for Genotype Calling
Among the three pipelines used for genotype calling in the present study, TASSEL and UNEAK were based on single SNPs while the FreeBayes was a haplotype-based pipeline. Both TASSEL and FreeBayes used M. truncatula as a reference genome for sequence alignment, whereas the UNEAK used a network approach without a reference genome for SNP discovery. Our results are consistent with other reports showing that different SNP callers identify different SNPs from the same input files using the same parameters (O’Rawe et al., 2013; Yu and Sun, 2013; Li, 2014; Clevenger et al., 2015). Therefore, the choice of SNP calling software is crucial when developing a SNP calling pipeline. For instance, in the present study, the FreeBayes pipeline may be more appropriate for genotype calling in autotetraploid alfalfa as it can handle haplotype-based variants including single and multiple nucleotide polymorphisms, allelic series of tri-SNPs and tetra-SNPs, MNPs, and indels with a variable number of nucleotides. We used the FreeBayes pipeline for genotyping in a diverse panel of accessions of alfalfa and identified a number of haplotype loci for drought resistance traits (Zhang et al., 2015) and in a breeding population for VW resistance (Yu et al., 2016). In the present study, compared to the TASSEL and UNEAK pipelines, although a lower missing value was found in FreeBayes, fewer variants were obtained (Table 2) and only13 markers significantly associated with VW resistance were identified by the FreeBayes pipeline (Table 3). The TASSEL pipeline generated the most variants and identified 22 significant SNP markers associated with VW resistance (Table 4). The number of significant markers (21 SNPs) identified by the UNEAK was comparable (Table 5) to that of the TASSEL pipeline. This may due to the relatively higher coverages were obtained by TASSEL (11.7X) and UNEAK (5.2X) than that by the FreeBayes pipeline (1.4X) (Table 3). Nevertheless, each pipeline identified major resistance loci to VW on chromosome 6, suggesting a consistency of effectiveness among the three pipelines used in the present study in the identification of VW resistance.
Putative Candidate Genes Linked to Marker Loci for VW Resistance
The whole genome sequence of M. truncatula provides a useful database for searching for candidate genes underlying the marker loci associated with VW resistance in alfalfa as they are close relatives. Our BLAST search results showed that 29 functional genes are linked to 36 markers identified by the three pipelines in the present study. Most of significant markers were located on chromosome 6 where shared linked genes were identified by at least two pipelines used in the present study. One of them was the ankyrin repeat protein, ARRF that was identified by all pipelines (Supplementary Table S4). The ankyrin repeat protein is a specific protein associated with the plasma membrane and provides for the interaction of the cytoskeleton with integral membrane proteins. It has been reported that the Arabidopsis ankyrin repeat protein, NPR1, provides for the onset of systemic acquired resistance (SAR) to a broad spectrum of pathogens. Defective mutants fail to respond to SAR-inducing treatments, inhibiting the expression of pathogenesis-related (PR) genes and exhibiting increased susceptibility to pathogen infections (Cao et al., 1997).
Among the resistance loci identified in the present study, two markers (S5_10492250 and S5_10492298) with the same location (48 bases away) on chromosome 5 were linked to a member of the TIR-NBS-LRR gene family. The NBS-LRR gene family has been well-documented and can be divided into two subfamilies, TIR-NBS-LRR and non-TIR-NBS-LRR (DeYoung and Innes, 2006; Grau et al., 2006; McHale et al., 2006). The plant NBS-LRR gene family contains a large class of disease resistance genes, known as R genes. The TIR-NBS-LRR genes play roles in disease resistance in plants. There is evidence to suggest that the TIR-NBS-LRR gene (RCT1) from M. truncatula confers resistance to anthracnose in transgenic alfalfa plants (Yang et al., 2008). The identification of markers linked to the TIR-NBS-LRR gene in the present study suggests their contribution to VW resistance.
We identified an ABC transporter linked to the significant marker, S7_131660043 on chromosome 7 in the present analysis. The ABC transporters are known to participate in many processes including polar auxin transport, alkaloid import, tissue pigmentation, vacuolar xenobiotic sequestration, stomatal regulation, disease resistance, lipid catabolism, antibiotic resistance, assembly of redox-active cytosolic Fe/S proteins, and heavy metal tolerance (Rea, 2007). One of the mechanisms of plant-pathogen reaction is the reduction of accumulation of toxic compounds at the target site due to secretion by ABC transporters that can transport a wide variety of natural products, including plant antimicrobials (Stergiopoulos et al., 2002; Roohparvar et al., 2007). Tic22 is a chaperone that is essential for protein translocation into the apoplast (Glaser et al., 2012). VITP is involved in the transfer of iron from the cytosol to the vacuole for intracellular iron storage (Kim et al., 2006). CAP160 was identified as a cold and desiccation responsive protein. CAP160’s function may be involved in participating in the assembly of ribosomes or to help accelerate translation (Kaye et al., 1998). Interestingly, one of the significant marker loci identified by the UNEAK pipeline linked to a trehalose-6-phosphate synthase, the first enzyme of the trehalose synthesis pathway. It has been reported that trehalose accumulation in rice plants confers high tolerance levels to different abiotic stresses (Garg et al., 2002). It has been also suggested that trehalose-6-phosphate synthase regulates primary and secondary metabolism during infection by the rice blast fungus (Fernandez and Wilson, 2011).
Polygenic Inheritance of VW Resistance in Alfalfa
The genetic basis of alfalfa resistance to VW has been investigated by several research groups. Viands (1985) suggested that complex additive effects contribute to VW resistance in alfalfa cultivar ‘Vertus’. Miller and Christie (1991) reported that general combining ability (GCA) significantly influences the expression of resistance to VW in ‘Vertus’ alfalfa and suggested that additive genetic variance is the primary source of VW resistance. Our previous reports (Zhang et al., 2014; Yu et al., 2016) identified multiple loci associated with VW resistance in two alfalfa mapping populations. This assumption is supported by the present study where multiple loci are shown to contribute to VW resistance in F1 population of alfalfa segregated for VW resistance.
QTLs for VW resistance have been reported in M. truncatula. They include a major QTL on chromosome 7, and two minor QTLs on chromosomes 2 and 6 (Ben et al., 2013). In our previous study, three resistance loci associated with VW resistance were identified on chromosome 7 and one on chromosome 2 (Zhang et al., 2014). Our recent report identified multiple loci associated with VW resistance on chromosomes 5, 6, 7, and 8 (Yu et al., 2016). In the present study, most significant resistance loci have been identified at similar locations on chromosomes 5, 6, and 7, suggesting common pathways for VW resistance in M. sativa and M. truncatula. However, additional resistance loci to VW have been identified in M. sativa by both our previous (Zhang et al., 2014; Yu et al., 2016) and the present studies, whereas these loci were not found in M. truncatula. For instance, several VW resistance loci were identified on chromosomes 5 and 8 by our previous (Zhang et al., 2014; Yu et al., 2016) and the present studies, whereas no VW resistance locus has been reported on chromosomes 5 and 8 in M. truncatula.
We used three GBS pipelines for SNP discovery and identification of marker loci associated with VW resistance in the F1 alfalfa population using genotyping-by-sequencing and genome-wide association approaches. This is the first effort to compare the effect of pipelines on SNP discovery in alfalfa. Although different markers were identified by different pipelines, most significant resistance loci located on chromosome 6 have been identified by all pipelines, confirming the consistency of SNP discovery and marker identification by GBS pipelines used in the present study. Sequence alignment to the M. truncatula genome revealed multiple resistance loci, several of which are linked to known resistance genes. The putative resistance loci thus identified had similar chromosomal locations to those previously reported in tetraploid alfalfa (Zhang et al., 2014; Yu et al., 2016) and in its diploid relative, M. truncatula (Ben et al., 2013). With further validation, those markers closely linked to VW resistance can be used for MAS to accelerate the development of new alfalfa cultivars with improved resistance to VW.
Conceived and designed the experiments: L-XY. Performed the experiments: SB, X-PL. Analyzed the data: PZ and DM. Wrote the paper: L-XY and PZ.
This work was supported by USDA-ARS National Program Project No. 5354-21000-015-00D, USDA NIFA NRSP10, Washington State University and by a partial grant from the National Alfalfa and Forage Alliance.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank Dr. Charles Hawkins for his critical reading, editing and suggestions regarding the manuscript.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/article/10.3389/fpls.2017.00089/full#supplementary-material
- ^ https://www.naaic.org/
- ^ http://www.ibm.com/software/analytics/spss/
- ^ http://picard.sourceforge.net/
- ^ http://ape-package.ird.fr/pegas.html
- ^ http://www.maizegenetics.net/\#!tassel/c17q9
- ^ http://www.ncbi.nlm.nih.gov
- ^ http://phytozome.jgi.doe.gov/jbrowse/index.html?data=genomes\%2FMtruncatula\&loc
- ^ http://www.phytozome.net/
Annicchiarico, P., Nazzicari, N., Ananta, A., Carelli, M., Wei, Y., and Brummer, E. C. (2016). Assessment of cultivar distinctness in alfalfa: a comparison of genotyping-by-sequencing, simple-sequence repeat marker, and morphophysiological observations. Plant Genome 9. doi: 10.3835/plantgenome2015.10.0105
Annicchiarico, P., Nazzicari, N., Li, X., Wei, Y., Pecetti, L., and Brummer, E. C. (2015). Accuracy of genomic selection for alfalfa biomass yield in different reference populations. BMC Genomics 16:1020. doi: 10.1186/s12864-015-2212-y
Ben, C., Toueni, M., Montanari, S., Tardin, M. C., Fervel, M., Negahi, A., et al. (2013). Natural diversity in the model legume Medicago truncatula allows identifying distinct genetic mechanisms conferring partial resistance to Verticillium wilt. J. Exp. Bot. 64, 317–332. doi: 10.1093/jxb/ers337
Bradbury, P., Zhang, Z., Kroon, D., Casstevens, T., Ramdoss, Y., and Buckler, E. S. (2007). Tassel: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635. doi: 10.1093/bioinformatics/btm308
Cao, H., Glazebrook, J., Clarke, J. D., Volko, S., and Dong, X. (1997). The Arabidopsis NPR1 gene that controls systemic acquired resistance encodes a novel protein containing ankyrin repeats. Cell 88, 57–63. doi: 10.1016/S0092-8674(00)81858-9
Catchen, J. M., Amores, A., Hohenlohe, P., Cresko, M., and Postlethwait, J. H. (2011). Stacks: building and genotyping loci de novo from short-read sequences. G3 1, 171–182. doi: 10.1534/g3.111.000240
Clevenger, J., Chavarro, C., Pearl, S. A., Ozias-Akins, P., and Jackson, S. A. (2015). Single nucleotide polymorphism identification in polyploids: a review, example, and recommendations. Mol. Plant 8, 831–846. doi: 10.1016/j.molp.2015.02.002
Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S., et al. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6:e19379. doi: 10.1371/journal.pone.0019379
Fernandez, J., and Wilson, R. A. (2011). the sugar sensor, trehalose-6-phosphate synthase (tps1), regulates primary and secondary metabolism during infection by the rice blast fungus: will magnaporthe oryzae’s “sweet tooth” become its “achilles’ heel”? Mycology 2, 46–53. doi: 10.1080/21501203.2011.563431
Garg, A. K., Kim, J.-K., Owens, T. G., Ranwala, A. P., Choi, Y. D., Kochian, L. V., et al. (2002). Trehalose accumulation in rice plants confers high tolerance levels to different abiotic stresses. Proc. Natl. Acad. Sci. U.S.A. 99, 15898–15903. doi: 10.1073/pnas.252637799
Garrison, E., and Marth, G. (2012). Haplotype-Based Variant Detection from Short-Read Sequencing. Available at: http://arxiv.org/abs/1207.3907
Glaser, S., van Dooren, G., Agrawal, S., Brooks, C. F., McFadden, G. I., Striepen, B., et al. (2012). Tic22 is an essential chaperone required for protein import into the apicoplast. J. Biol. Chem. 287, 39505–39512. doi: 10.1074/jbc.M112.405100
Glaubitz, J. C., Casstevens, T. M., Lu, F., Harriman, J., Elshire, R. J., Sun, Q., et al. (2014). TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS ONE 9:e90346. doi: 10.1371/journal.pone.0090346
Inderbitzin, P., Bostock, R. M., Davis, R. M., Usami, T., Platt, H. W., and Subbarao, K. V. (2011). Phylogenetics and taxonomy of the fungal vascular wilt pathogen Verticillium, with the descriptions of five new species. PLoS ONE 6:e28341. doi: 10.1371/journal.pone.0028341
Julier, B., Flajoulot, S., Barre, P., Cardinet, G., Santoni, S., Huguet, T., et al. (2003). Construction of two genetic linkage maps in cultivated tetraploid alfalfa (Medicago sativa) using microsatellite and AFLP markers. BMC Plant Biol. 3:9. doi: 10.1186/1471-2229-3-9
Kaye, C., Neven, L., Hofig, A., Li, Q. B., Haskell, D., and Guy, C. (1998). Characterization of a gene for spinach CAP160 and expression of two spinach cold-acclimation proteins in tobacco. Plant Physiol. 116, 1367–1377. doi: 10.1104/pp.116.4.1367
Kim, S. A., Punshon, T., Lanzirotti, A., Li, L., Alonso, J. M., Ecker, J. R., et al. (2006). Localization of iron in Arabidopsis seed requires the vacuolar membrane transporter VIT1. Science 314, 1295–1298. doi: 10.1126/science.1132563
Li, H. (2014). Towards Better Understanding of Artifacts in Variant Calling from High-coverage Samples, arXiv:1404.0929. Available at. http://arxiv.org/abs/1404.0929
Li, X., Wei, Y., Acharya, A., Jiang, Q., Kang, J., and Brummer, E. C. (2014). A saturated genetic linkage map of autotetraploid alfalfa (Medicago sativa L.) developed using genotyping-by-sequencing is highly syntenous with the Medicago truncatula genome. G3 4, 1971–1979. doi: 10.1534/g3.114.012245
Lu, F., Lipka, A. E., Glaubitz, J., Elshire, R., Cherney, J. H., Casler, M. D., et al. (2013). Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genet. 9:e1003215. doi: 10.1371/journal.pgen.1003215
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., et al. (2010). The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. doi: 10.1101/gr.107524.110
O’Rawe, J., Jiang, T., Sun, G., Wu, Y., Wang, W., Hu, J., et al. (2013). Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 5:28. doi: 10.1186/gm432
Poland, J. A., Brown, P. J., Sorrells, M. E., and Jannink, J.-L. (2012). Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE 7:e32253. doi: 10.1371/journal.pone.0032253
Roohparvar, R., Huser, A., Zwiers, L.-H., and De Waard, M. A. (2007). Control of mycosphaerella graminicola on wheat seedlings by medical drugs known to modulate the activity of ATP-binding cassette transporters. Appl. Environ. Microbiol. 73, 5011–5019. doi: 10.1128/AEM.00285-07
Stergiopoulos, I., Zwiers, L.-H., and De Waard, M. A. (2002). Secretion of natural and synthetic toxic compounds from filamentous fungi by membrane transporters of the ATP-binding cassette and major facilitator superfamily. Eur. J. Plant Pathol. 108, 719–734. doi: 10.1023/A:1020604716500
Uitdewilligen, J. G., Wolters, A. M., D’hoop, B. B., Borm, T. J., Visser, R. G., and van Eck, H. J. (2013). A next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potato. PLoS ONE 8:e62355. doi: 10.1371/journal.pone.0062355
Yang, S., Gao, M., Xu, C., Gao, J., Deshpande, S., Lin, S., et al. (2008). Alfalfa benefits from Medicago truncatula: the RCT1 gene from M. truncatula confers broad-spectrum resistance to anthracnose in alfalfa. Proc. Natl. Acad. Sci. U.S.A. 105, 12164–12169. doi: 10.1073/pnas.0802518105
Yu, L.-X., Zheng, P., Zhang, T., Rodringuez, J., and Main, D. (2016). Genotyping-by-sequencing based genome-wide association studies on Verticillium wilt resistance in autotetraploid alfalfa (Medicago sativa L.). Mol. Plant Pathol doi: 10.1111/mpp.12389 [Epub ahead of print].
Zhang, T., Yu, L.-X., McCord, P., Miller, D., Bhamidimarri, S., Johnson, D., et al. (2014). Identification of molecular markers associated with verticillium wilt resistance in alfalfa (Medicago sativa l.) using high-resolution melting. PLoS ONE 9:e115953. doi: 10.1371/journal.pone.0115953
Zhang, T., Yu, L.-X., Zheng, P., Li, Y., Rivera, M., Main, D., et al. (2015). Identification of loci associated with drought resistance traits in heterozygous autotetraploid alfalfa (Medicago sativa L.) using genome-wide association studies with genotyping by sequencing. PLoS ONE 10:e0138931. doi: 10.1371/journal.pone.0138931
Keywords: GWAS, GBS bioinformatics, Verticillium wilt, alfalfa
Citation: Yu L-X, Zheng P, Bhamidimarri S, Liu X -P and Main D (2017) The Impact of Genotyping-by-Sequencing Pipelines on SNP Discovery and Identification of Markers Associated with Verticillium Wilt Resistance in Autotetraploid Alfalfa (Medicago sativa L.). Front. Plant Sci. 8:89. doi: 10.3389/fpls.2017.00089
Received: 20 September 2016; Accepted: 16 January 2017;
Published: 07 February 2017.
Edited by:Xiaowu Wang, Biotechnology Research Institute (CAAS), China
Reviewed by:Bernadette Julier, Institut National de la Recherche Agronomique, France
Shichen Wang, Texas A&M University, USA
Copyright © 2017 Yu, Zheng, Bhamidimarri, Liu and Main. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Long-Xi Yu, firstname.lastname@example.org
†These authors have contributed equally to this work.