Association Studies and Legume Synteny Reveal Haplotypes Determining Seed Size in Vigna unguiculata

Highly specific seed market classes for cowpea and other grain legumes exist because grain is most commonly cooked and consumed whole. Size, shape, color, and texture are critical features of these market classes and breeders target development of cultivars for market acceptance. Resistance to biotic and abiotic stresses that are absent from elite breeding material are often introgressed through crosses to landraces or wild relatives. When crosses are made between parents with different grain quality characteristics, recovery of progeny with acceptable or enhanced grain quality is problematic. Thus genetic markers for grain quality traits can help in pyramiding genes needed for specific market classes. Allelic variation dictating the inheritance of seed size can be tagged and used to assist the selection of large seeded lines. In this work we applied 1,536-plex SNP genotyping and knowledge of legume synteny to characterize regions of the cowpea genome associated with seed size. These marker-trait associations will enable breeders to use marker-based selection approaches to increase the frequency of progeny with large seed. For 804 individuals derived from eight bi-parental populations, QTL analysis was used to identify markers linked to 10 trait determinants. In addition, the population structure of 171 samples from the USDA core collection was identified and incorporated into a genome-wide association study which supported more than half of the trait-associated regions important in the bi-parental populations. Seven of the total 10 QTLs were supported based on synteny to seed size associated regions identified in the related legume soybean. In addition to delivering markers linked to major trait determinants in the context of modern breeding, we provide an analysis of the diversity of the USDA core collection of cowpea to identify genepools, migrants, admixture, and duplicates.

Highly specific seed market classes for cowpea and other grain legumes exist because grain is most commonly cooked and consumed whole. Size, shape, color, and texture are critical features of these market classes and breeders target development of cultivars for market acceptance. Resistance to biotic and abiotic stresses that are absent from elite breeding material are often introgressed through crosses to landraces or wild relatives. When crosses are made between parents with different grain quality characteristics, recovery of progeny with acceptable or enhanced grain quality is problematic.Thus genetic markers for grain quality traits can help in pyramiding genes needed for specific market classes. Allelic variation dictating the inheritance of seed size can be tagged and used to assist the selection of large seeded lines. In this work we applied 1,536-plex SNP genotyping and knowledge of legume synteny to characterize regions of the cowpea genome associated with seed size. These marker-trait associations will enable breeders to use marker-based selection approaches to increase the frequency of progeny with large seed. For 804 individuals derived from eight bi-parental populations, QTL analysis was used to identify markers linked to 10 trait determinants. In addition, the population structure of 171 samples from the USDA core collection was identified and incorporated into a genomewide association study which supported more than half of the trait-associated regions important in the bi-parental populations. Seven of the total 10 QTLs were supported based on synteny to seed size associated regions identified in the related legume soybean. In addition to delivering markers linked to major trait determinants in the context of modern breeding, we provide an analysis of the diversity of the USDA core collection of cowpea

INTRODUCTION
Cowpea is a warm-season legume grown throughout the tropics and several areas of the subtropics. West African countries led by Nigeria and Niger produce 70% of the world's crop on 10 million ha (FAOSTAT, 2013). The North-Eastern region of Brazil is the second largest region of production, followed by Eastern and Southern Africa, South Asia, and North America. As is the case for other grain legumes, farmers' and market acceptance of cowpea are driven by the visual appearance of the grain. In most markets, large seed size is desirable and this is reflected in price premiums for large cowpea grain. Across Africa a diversity of grain sizes and colors exist which have varying importance in local or regional contexts. In West Africa the two most important grain types are large white or brown with rough seed coat texture, while in East and Southern regions of Africa relatively smaller seeds with smooth texture and brown to red color predominate in markets. In the Western United States, Southern Europe, and the Middle-East the "blackeyed pea" cowpea predominates. This type of cowpea is characterized by a large grain and white seed coat with a pigmented "eye" around the hilum. Figure 1 displays a diversity of cowpea seed types. "Fresh-shell" varieties are also desired which are harvested before maturity for their large seed that can be easily removed from green pods. Consumer preference primarily demands large seed when grown for grain; however, small seed is preferred when seed is sold by volume for use as a fodder or cover crop.
Seed size has several agronomically important impacts. Large seeded cowpea have enhanced emergence when planted deep (up to 5 cm), tend to emerge earlier, and produce larger plants during early development (Lush and Wien, 1980). In contrast, while large seeds typically have advantages over small seeded competitors (Wulff, 1986), small seeds are desirable for early drought conditions because they are able to transpire less water relative to their ability to reach water supplies (Hendrix et al., 1992). This may be particularly important for semi-arid rainfed growing regions.
Seed size is a very stable component of grain yield with high heritability for many crop plants including wheat (Giura and Saulescu, 1996), soybean (Cober et al., 1997), cowpea (Drabo et al., 1984), and mung bean (Fery, 1980). Several genes are known to impact the inheritance of seed size in cowpea. Drabo et al. (1984) proposed that at least eight loci contribute to the quantitative inheritance of seed size and Fatokun et al. (1992) identified two major, unlinked genomic regions, one of which is orthologous to a seed size QTL in mung bean. The orthology of this locus was later confirmed by its identification and association to seed size in soybean (Maughan et al., 1996). Exploration of legume synteny for cowpea trait characterization continues to be a rewarding approach that has also been used to better describe resistance to fungal pathogens (Muchero et al., 2011;Pottorff et al., 2012a), tolerance to heat during reproductive development (Lucas et al., 2013a), and leaf morphology (Pottorff et al., 2012b).
The introgression of novel traits from diverse collections typically compromises seed size among progeny. Because of the importance of grain size in market appeal, recovery of adequate grain size is an important objective following elite × exotic crosses. Wide crosses are commonly pursued to help deliver new varieties with enhanced resistance to biotic and abiotic stress. Several cycles of backcrossing help recover elite characteristics including seed size; however, this process can be cumbersome and inefficient due to possible linkage drag and the polygenic nature of the trait. To help improve the selection of desirable lines we developed associations between genic SNP markers and seed size using experimental populations, a diversity collection, and knowledge of legume synteny.

PHENOTYPE DATA
Seed size was calculated as weight per 100 seed. The seeds we measured were harvested from plants grown under favorable conditions whether in the field or in the greenhouse. This means that plants were well watered and treated with pesticides as needed. The populations which were used are presented in Table 1 and are among those used to develop the consensus genetic map of cowpea (Lucas et al., 2011). All populations were at least at the F 8 , except the IT84S-2246 × Mouride population which was phenotyped and genotyped at the F 4 generation. All eight populations were grown in the greenhouse, while the CB27 × IT82E-18 and CB46 × IT93K-503 populations were also grown in field trials. In all greenhouse and field trials mature pods were harvested and dried for storage (<15% moisture). Seeds were subsequently cleaned from the pods, counted, and weighed to determine the weight of a random sample of 100 seeds. Seed size data provided online by Germplasm Resources Information Network (USDA-ARS and National Genetic Resources Program, 2013) was used for genome-wide association mapping.

GENOTYPE DATA
The 1,536-plex EST-derived SNP genotype data used to build the consensus genetic map of cowpea (Lucas et al., 2011) was also used to perform QTL analyses of the eight bi-parental populations. Genotype data for 171 individuals of the USDA core collection of cowpea (USDA Core) (Gillaspie et al., 1996) were also obtained using the 1,536-plex genotyping platform developed by our group (Muchero et al., 2009). SNP calls were exported for further processing from the Illumina GenomeStudio software (Illumina, 2010). Rogue individuals among the bi-parental populations which were described in Lucas et al. (2013b) were removed prior to QTL analysis. Similarly, genotype data for the USDA Core were used to identify and remove duplicate individuals using ParentChecker (Hu et al., 2012). ParentChecker was also helpful for formatting files for downstream analyses. SNPs were filtered on the basis of minor allele frequency (>0.20 for QTL and >0.10 for GWAS and analysis of population structure) to develop a set of polymorphic markers appropriate for analyses. The genotype data for the USDA core are provided in the Section "USDA Core Genotypes" in Supplementary Material.

MARKER-TRAIT ASSOCIATIONS
QTL IciMapping (Li et al., 2008) was used to perform inclusive composite interval mapping for seed size based on 100 seed weight data from eight bi-parental mapping populations. In a method similar to Lucas et al. (2013a), the genetic map Frontiers in Plant Science | Plant Genetics and Genomics used for QTL analyses was a composition of population specific map marker orders and distances, and consensus linkage groups assignments. Regions of the genome contributing major QTL were identified after considering (1) regions with LOD scores >3.0; (2) effect size >15% of phenotypic variance explained; (3) marker density; (4) span of the trait-associated region; (5) discovery in multiple populations or via GWAS; (6) haplotype consistency when QTL were discovered in multiple populations; and (7) homology with trait-associated regions in soybean. The potential effect of stacking favorable alleles for multiple QTL was also investigated by grouping lines based on their QTL composition. This was done for populations in which multiple QTLs were discovered. Individuals with no, one, or several favorable alleles underlying the seed size QTLs we report were grouped and the average seed size was determined for that group and compared to the population average. A single factor analysis of variance was performed to determine if differences in seed size were due to QTL content. The ICIM-EPI function within QTL IciMapping (Li et al., 2008) was used to search for QTL interactions.
Six-hundred and sixty-five EST-derived SNP markers with minor allele frequency >0.10 that were located among unique bins (one marker per bin) of the cowpea consensus genetic map were used to identify population structure of the subset of the USDA core. STRUCTURE (Pritchard et al., 2000) was used with BURNIN = 10,000 and NUMREPS = 50,000, with five runs of K = 1-15. The Evanno method (Evanno et al., 2005) facilitated by STRUCTURE HARVESTER (Earl and vonHoldt, 2012) was used in addition to CLUMPP (Jakobsson and Rosenberg, 2007) and DISTRUCT (Rosenberg, 2004) to reconcile genepools on the basis of geographic collection information provided online by GRIN (USDA-ARS and National Genetic Resources Program, 2013). Whole genome ancestry estimates (Q-matrix) computed from multiple STRUCTURE runs by CLUMPP were used as a covariate in the generalized linear model of association mapping provided by TASSEL 3.0 (Bradbury et al., 2007). Markers that showed −Log(P-Values) >3.0 that were also identified using the bi-parental populations were considered significantly associated.

SYNTENY ANALYSIS
Regions of the soybean genome syntenic with the cowpea seed size QTLs reported here were searched for seed size QTLs. HarvEST: cowpea ) was used to identify synteny based on BLASTX scores (<10 −10 ) between cowpea unigenes containing mapped SNPs and translated gene models from soybean (Schmutz et al., 2010). Soybean genomic locations homeologous to cowpea seed size QTL were reconciled with an abundance of soybean seed size QTL inventoried and integrated with the physical genome by SoyBase (Grant et al., 2010). Only soybean QTL that were within or tightly linked to the syntenic region (<3 million base pairs) were considered orthologous.

FIELD AND GREENHOUSE TRIALS
The two field trials using the CB27 × IT82E-18 population produced seed size data that were strongly correlated to each other (Pearson's r = 0.83) and similar to that of the greenhouse trial (r = 0.59 and 0.63 for 2010 and 2011 respectively). This was also the case for the multiple trials of the CB46 × IT93K-503 population where field trials were correlated to each other (r = 0.30) and more so to the greenhouse trial (r = 0.52 and 0.31 for ISRA and INERA trials respectively). Figure 2 provides the phenotypic distribution of seed size among all eight bi-parental populations. The smallest seeded line had a 100 seed weight of 3.26 g and was produced by the parents Dan Ila and TVu-7778. The largest seeded line was produced by CB46 and IT93K-503 and had a 100 seed weight of 34.06 g. The average seed of an individual from the eight bi-parental populations had a 100 seed weight of 15.50 g. Phenotypic distributions of seed size for each trial are provided in Section "Population Seed Sizes" in Supplementary Material. Seed sizes for the parents of the mapping populations are provided in Section "Parent Seed Sizes" in Supplementary Material which ranged from 11.60 to 26.41 g per 100 seed. Phenotypic and genotypic characteristics of the mapping populations are provided in Table 1.

ASSOCIATION STUDIES
Ten QTL for seed size, representing ∼10% of the mapped cowpea genome were identified among the eight bi-parental populations www.frontiersin.org  ( Table 2). Most had narrow spans (<5 cM), accounted for a substantial proportion of the phenotypic variance (average of 30%), and were associated with multiple SNP markers (average LOD > 8.5). LOD score traces for each QTL discovery are included in the Supplementary Material ("RIL QTL LOD"). Haplotypes associated with large and small seed were consistent among discovery populations when QTL were detected in multiple populations ("Alleles" in Supplementary Material). This is the situation for Css -1 where markers 1_0974 and 1_0078 were detected among different experiments. Allelic variation important for seed size can be found among all parents of the bi-parental populations except for Dan Ila and IT84S-2049. The additive allelic effect of Css -1 was similar (1.77 and 2.18 g) between multiple trials of the CB27 × IT82E-18 population. This is also true for the multitrial detection of Css -2 using the CB46 × IT93K-503 population (1.97 g for both experiments). "Multi-QTL Effect" in Supplementary Material displays the potential for genetic gain by combining favorable alleles for multiple QTLs. QTL content has the most significant effect on seed size for the CB27 × IT82E-18 population F (3, 149) = 28.51, p = 1.25E-14, η 2 = 0.36). This is also true for all other populations except for the CB46 × IT93K-503 population where groups based on QTL content are mainly different due to chance F (4, 86) = 1.29, p = 0.28, η 2 = 0.06. See Section"Multi-QTL Effect"in Supplementary Material for test statistics for all populations. No significant QTL interactions were found among the discovery experiments.

Frontiers in Plant Science | Plant Genetics and Genomics
Six-hundred and sixty-five SNPs which were polymorphic among 171 accessions of the USDA core were used to identify 27 duplicated accessions ("USDA Core Duplicates" in Supplementary Material). An additional 10 accessions were excluded from further analysis due to a lack of geographic collection information. This filter yielded 134 accessions appropriate for population structure and association analyses. Geographic collection information and the Evanno method ("Evanno Method" in Supplementary Material) supported four subpopulations which accounted for a substantial proportion of population structure underlying the USDA core (Figure 3). Genepool 1 was the most prolific and was comprised of a majority of the samples collected in Eastern and Southern Africa. Samples collected in Asia were categorized primarily in genepool 2, and West Africa and Turkey were identified as genepool 3 and genepool 4, respectively. The genomes of 46 samples were primarily derived from genepool 1 and up to 87 samples contained a substantial proportion originating from genepool 1 ("Merged Q-Matrix" in Supplementary Material). While only 8 samples could be attributed entirely to genepool 3, 43 samples were admixed with genepool 3. Samples collected in South America were almost always an admixture of genepools 1 and 3. Most of the migrants were collected in West Africa and Asia.
Thirty-six SNP loci used in the GWAS of the USDA core surpassed −Log(P-value) thresholds and confirmed six of the ten QTL proposed by the bi-parental populations (Figure 4). Table 2 and is more comprehensively provided in Section "GLM" in Supplementary Material.

SYNTENY
Based on the syntenic relationships described by Lucas et al. (2011), 7 out of the 10 QTL identified in the bi-parental populations were supported by knowledge of seed size in soybean (Tables 2 and 3). A total of 19 associations between markers and seed size developed in soybean (Orf et al., 1999;Csanadi et al., 2001;Specht et al., 2001;Hoeck et al., 2003;Hyten et al., 2004;Zhang et al., 2004;Panthee et al., 2005;Reinprecht et al., 2006;Chen et al., 2007;Gai et al., 2007) were in regions homoeologs to the cowpea seed size QTL reported here. More details of the synteny analysis are presented in Section "Synteny Analysis" in Supplementary Material.

DISCUSSION
Cowpea with specific seed size can be predicted on the basis of marker-trait associations. These associations provide a foundation for marker-assisted breeding and can be developed through QTL analysis and association mapping which couple phenotypes, genotypes, and a genetic map (Figure 5). DNA markers tagging allelic variation underlying seed size QTL can be used to track trait determinants among breeding cycles. This approach facilitates the simultaneous improvement of a variety for different traits of interest.
From the standpoint of breeding, the most applicable association studies assess broad pedigrees and tag associated genomic regions with dense markers. Marker-trait associations identified in one population may not segregate or contribute to the inheritance of the trait in a different population. To support new marker-trait www.frontiersin.org FIGURE 4 | Genome-wide association analysis of seed size using the USDA core collection of cowpea. Loci surpassing significance thresholds that were also associated with seed size among the bi-parental populations are boxed. associations we used multiple populations, two popular methodologies (QTL and GWAS), and knowledge of seed size in soybean. The intent of this study was to assess allelic variation important for the inheritance of seed size in cowpea primarily in the context of marker-assisted breeding and comparative genomics. The associations developed in this study would be best validated after years of using them in breeding; however, we feel this work provides an important framework for future breeding initiatives and explores the potential of genomics to help deliver new varieties of cowpea. The accuracy of these marker-trait associations could be assessed by comparing the estimated additive allelic effects reported here with realized gains after using these markers for selection. Based on our analyses there is a large potential to produce larger-seeded lines by combining favorable alleles for multiple QTLs. However, our analysis of this potential is limited because our study lacked recombinants for all possible QTL combinations. A more complete view could be provided by studying the behavior of QTLs outside of their discovery pedigree. This could be accomplished by pursuing a mating scheme which used lines from different pedigrees and with different QTL content.
Breeders interested in using marker-trait associations would benefit from knowledge of linkage between trait determinants. The locations of the QTLs reported here are mainly unlinked or distantly linked to other traits characterized using the consensus genetic map of cowpea, including heat tolerance during reproductive development (Lucas et al., 2013a), leaf morphology (Pottorff et al., 2012b), and resistance to Fusarium oxysporum f.sp. tracheiphilum race 3 (Pottorff et al., 2012a). One (Thr -1) of the three QTL known to impact resistance to feeding damage caused by foliar thrips (Thrips tabaci and Frankliniella schultzei)  overlaps with a seed size QTL (Css -3) reported in the current work. The markers within this overlapping region include 1_0164 and 1_0589 where genotypes homozygous for AA at these were associated with large seed and thrips resistance. This means it would require a rare recombination to break the linkage between resistance to foliar thrips conferred by Thr -1 and a small seed conferred by Css -3. Other overlaps can be found between seed size QTL and regions associated with Macrophomina phaseolina resistance (Muchero et al., 2011) (Css -9 with Mac -6, and Css -4 with Mac -8). Therefore using markers linked to these overlapping regions may simultaneously affect seed size and resistance to Macrophomina. In such regions a higher density of markers would be useful for marker-assisted breeding.
SoyBase (Grant et al., 2010) is an excellent resource for legume researchers. The integration of QTL studies with the physical map made it possible for us to survey commonalities among association studies performed in plants of different genera. Such knowledge may provide paths for mechanistic studies aiming to pinpoint trait determinants. From the standpoint of this study, the colocalization of seed size QTL in soybean and cowpea provides a level of validation for new marker-trait associations. Knowledge of legume synteny and trait determinants would be enhanced by developing resources similar in density to the soybean community for other legumes (i.e., common bean, cowpea, mung bean, peanut, chickpea, etc.). An agricultural project that would be coordinated among groups with expertise in different legumes could greatly enhance comparative resources and the efficiency of new initiatives.
The fact that our study uses approaches capable of clarifying the domestication history and dispersal of modern cowpea does not escape our attention; however, due to sample size we advocate a conservative interpretation of the diversity analysis using the USDA core as presented here. Rather than focusing on potential insight concerning cowpea domestication or proposing new marker-trait associations, we present the results of the genomewide association study only to provide a modest assessment of collection diversity and to help support QTL identified among the bi-parental populations. The International Institute of Tropical Agriculture maintains a diverse collection which has been previously characterized on the basis of geographic, agronomic, and botanical descriptors (Mahalakshmi et al., 2007), but no collection of cowpea has been viewed in light of dense genotype data. The financial costs required for genotyping the rest of the USDA core is inexpensive relative to the value of the insight that can be gained. From our analysis of a small subset (171 samples) of the entire USDA core (720 samples) we were able to identify many duplicated accessions (∼17%), overrepresentation of the South/East African genepool, and we were able to perform an association study which mainly agreed with the QTL studies stemming from the bi-parental recombinant-inbred populations. SNP data from the entire core collection could be used to improve the diversity collection and its impact on the cowpea community. Phenotypic data for a number of traits are available on GRIN and could be combined with genotype data, similar to this work, to facilitate the discovery of numerous marker-trait associations. The use of historical data would be a cost effective approach to improve knowledge of cowpea genetic diversity and allelic variation contributing to the inheritance of agronomically important traits. The feasibility of this approach was recently supported within the barley community (Wang et al., 2011). That work helped demonstrate the utility of historical data after careful consideration of population size and experimental design. Furthermore, a comparative analysis of the diversity among core collections (i.e., IITA, USDA, and UCR) www.frontiersin.org would be valuable for identifying instances of ascertainment bias and duplicated accessions possibly known by different names. This is a documented issue for U.S. collections (Vigna Crop Germplasm Committee, 1996), and continued application of genotype data to identify duplicates would be particularly helpful in cutting costs associated with the maintenance of collections and for designing new experiments.

ACKNOWLEDGMENTS
This work was supported by the CGIAR Generation Challenge Program through a grant from the Bill and Melinda Gates Foundation, U.S. Agency for International Development Collaborative Research Support Program Grants GDG-G-00-02-00012-00 and EDH-A-00-07-00005, by the United States Department of Agriculture Project Number: 6607-21000-010-20, and by the Genetics, Genomics, and Bioinformatics program at UC Riverside.