Genetic Mapping by Integration of 55K SNP Array and KASP Markers Reveals Candidate Genes for Important Agronomic Traits in Hexaploid Wheat

Agronomic traits such as heading date (HD), plant height (PH), thousand grain weight (TGW), and spike length (SL) are important factors affecting wheat yield. In this study, we constructed a high-density genetic linkage map using the Wheat55K SNP Array to map quantitative trait loci (QTLs) for these traits in 207 recombinant inbred lines (RILs). A total of 37 QTLs were identified, including 9 QTLs for HD, 7 QTLs for PH, 12 QTLs for TGW, and 9 QTLs for SL, which explained 3.0–48.8% of the phenotypic variation. Kompetitive Allele Specific PCR (KASP) markers were developed based on sequencing data and used for validation of the stably detected QTLs on chromosomes 3A, 4B and 6A using 400 RILs. A QTL cluster on chromosome 4B for PH and TGW was delimited to a 0.8 Mb physical interval explaining 12.2–22.8% of the phenotypic variation. Gene annotations and analyses of SNP effects suggested that a gene encoding protein Photosynthesis Affected Mutant 68, which is essential for photosystem II assembly, is a candidate gene affecting PH and TGW. In addition, the QTL for HD on chromosome 3A was narrowed down to a 2.5 Mb interval, and a gene encoding an R3H domain-containing protein was speculated to be the causal gene influencing HD. The linked KASP markers developed in this study will be useful for marker-assisted selection in wheat breeding, and the candidate genes provide new insight into genetic study for those traits in wheat.


INTRODUCTION
Wheat (Triticum aestivum L.) is one of the most important cereal crops worldwide, providing a food source for 30% of the human population (Mayer et al., 2014). Improving the yield potential of wheat is of great significance for meeting the food demand from an increasing population (Tshikunde et al., 2019). Agronomic traits such as heading date (HD), plant height (PH), thousand grain weight (TGW), and spike length (SL) are important factors affecting yield and always targeted by wheat breeders (Tshikunde et al., 2019). Recent advances in wheat genomics have accelerated the genetic dissection of important agronomic traits, and a large number of quantitative trait loci (QTLs) for these traits have been identified (Rasheed and Xia, 2019).
Heading date is crucial for adaptation to different environments and yield stability in wheat (Snape et al., 2001). Over a hundred QTLs for HD located across all wheat chromosomes have been detected (Milec et al., 2014;Kiseleva and Salina, 2018). The cloned genes affecting HD or flowering in wheat are mainly classified into three groups: vernalization (VRN), photoperiod (Ppd), and earliness per se (Eps) genes (Snape et al., 2001). Four VRN genes (VRN1, VRN2, VRN3, and VRN4) located on chromosome 5 or 7 of the A/B/D genomes, have been identified by map-based cloning (Yan et al., 2003(Yan et al., , 2004(Yan et al., , 2006Kippes et al., 2015;Xie et al., 2019). The Ppd genes for photoperiod responses in wheat are mainly located on chromosomes 2A, 2B, and 2D (Beales et al., 2007). The Eps genes were identified on chromosome 1A m in Triticum monococcum (Alvarez et al., 2016) and on long arm of chromosome 1D in hexaploid wheat (Zikhali et al., 2014).
To obtain the genetic basis for HD, PH, TGW, and SL, we conducted QTL mapping based on a RIL population in the present study. In our previous study we used Bulked Segregant Analysis (BSA) and identified VRN-B1 as the gene responsible for HD variation in the RIL population . In this study, we used the Wheat55K SNP Array to map QTLs for HD, PH, TGW, and SL in this RIL population. Moreover, we validated the major QTLs on chromosomes 3A, 4B, and 6A by developing Kompetitive Allele Specific PCR (KASP) markers based on sequencing data and predicted candidate genes for PH, TGW, and HD according to gene annotation and SNP effects analysis.

Plant Materials and Phenotype Evaluation
As previously described , a RIL population (400 lines) derived from a cross between an early heading mutant (eh1) and Lunxuan987 (LX987) was used for genetic mapping; generations F 6 to F 8 of the RIL population were included in this study. The RIL and parent lines were planted at the Zhongpuchang field station of the Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (Beijing, China) during the 2015-2016, 2016-2017, and 2017-2018 cropping seasons. For each year, the experiment was conducted once and we selected three representative plants for phenotypic collection. A total of 15 plants for each line were planted in a row of 1 m, and the field conditions were managed according to local standard practices.
For HD, when more than half of the spikes had emerged from two thirds of the plants in a line, the date for that line was recorded . At agronomic harvest maturity, three representative plants from the middle of each row showing uniform growth status were used for PH, TGW, and SL evaluation and the mean values from these three plants were used for QTL mapping. PH from each representative plant was measured from the ground to the tip of the spike excluding awns. After drying, the grain weight from each representative plant was measured and the number of grains was counted. TGW was calculated as the plant grain weight divided by the number of grains per plant multiplied by 1,000. SL from main stem of each representative plant was measured from the base of the rachis to the tip of the terminal spikelet excluding awns. The HD, PH, and TGW data were collected in 2016, 2017, and 2018, and the SL data were collected in 2016 and 2018. Analyses of variance, correlation coefficients, and broad sense heritability were performed using the ANOVA analysis tools of the QTL IciMapping v4.1 program 1 .

Genotyping
Genomic DNA of each RIL and parent line was extracted as previously described . After assessment of DNA integrity and quantity, the DNA from 207 lines that were also used for KASP assay, along with the parent DNA samples were hybridized to the Wheat55K SNP Array containing 53,063 markers. The genotyping was performed by China Golden Marker (Beijing) Biotech Co. Ltd 2 .

Genetic Map Construction and QTL Analysis
High quality genotyping data were obtained by filtering with a Dish QC threshold of >0.82 and a Call-Rate threshold of >95%. The BIN function of IciMapping 4.1 was used to remove redundant markers from poly-high-resolution (PHR) SNPs, and the SNPs with >25% missing data were filtered out. The genetic map was constructed by randomly selecting only one marker from each bin using the MAP function of IciMapping 4.1. The threshold of the logarithm of odds (LOD) score was set to 2.5, and the Kosambi map function was used to calculate the map distance from recombination frequencies. Composite interval mapping (ICIM) in IciMapping 4.1 was selected to identify QTLs for HD, PH, TGW, and SL. The mean values of phenotypic traits for each line in each cropping season were used for QTL analysis. QTL region was determined by the positions of left and right markers identified by IciMapping 4.1, and physical positions of markers on the wheat reference genome v1.0 are shown in Supplementary  Table 1. QTLs for the same traits identified in 2 or 3 years were considered to be stable. Multi-Environment Traits (MET) analysis of QTL IciMapping v4.1 was used for assessment of QTL × environment interactions (Li et al., 2015).

Development of KASP Markers and QTL Validation
According to the SNPs between eh1 and LX987 identified by RNA sequencing (RNA-seq) , KASP markers around or in the region of stable QTLs specific for different subgenomes were designed using the polyploid primer design pipeline PolyMarker 3 . After evaluation of the polymorphisms between two parent lines, the developed KASP markers were used for genotyping the entire mapping population. The successfully developed KASP markers are listed in Supplementary Table 2. A total of 400 RILs were genotyped with KASP markers on chromosomes 3A, 4B, and 6A. The reaction volume and PCR procedures for the KASP assay were as previously described , and the CFX 96 Real-Time System (Bio Rad, Hercules, CA, United States) was used for PCR and data analysis. QTL analysis was conducted using IciMapping 4.1.

Analysis of SNP Effects and Prediction of Candidate Genes
Based on RNA-seq data, which was collected from young spikes of eh1 and LX987 when eh1 was beginning to head , the SNPs between eh1 and LX987 covering the intervals of flanking markers from QTL validation were obtained for SNP effects analysis. SNP effects were analyzed by Python 4 according to the example and scripts from the website 5 . A score for missense variation is generated that reflects the predicted effect of the SNP on gene function. The more negative a score, the larger the effects on gene function. The SNPs with larger effects on gene function were speculated to be located in the candidate genes. Gene functions were predicted by searching for homologous genes in rice (Oryza sativa) and Arabidopsis thaliana using the Triticeae Multi-omics Center website 6 .

Phenotypic Variation in the RIL Population
Our previous study showed that there is variation in HD in a RIL population of 400 lines derived from a cross between the early heading mutant eh1 and LX987 . In addition to HD, we also found that PH, TGW, and SL differed between eh1 and LX987; the values of these traits were significantly lower in eh1 than in LX987 from 2016-2018 (Table 1). Therefore, phenotypic investigation of PH, TGW, and SL in the RIL population was also conducted from 2016-2018. In the RIL population, the percent variation in PH, TGW, and SL ranged from 9.1% to 14.4% from 2016-2018, and all three traits showed moderate h 2 values ranging from 0.77 to 0.82 (Table 1). Min and Max represent minimum and maximum of the corresponding traits among the RIL population. CV is coefficients of variation and h 2 is broad sense heritability.
Frontiers in Plant Science | www.frontiersin.org In addition, PH, TGW, and SL from 2016-2018 followed a normal distribution and strong transgressive segregation was observed in the RIL population (Figure 1). Analysis of variance of PH, TGW, and SL for the multiple environment trials in the RIL population indicated that these traits were affected by environmental conditions (Supplementary Table 3).
Analysis of the pairwise correlations between HD, PH, TGW, and SL suggested that TGW and SL were significantly positively correlated with PH while SL was significantly negatively correlated with HD (Table 2). However, the correlations between SL and PH, SL and HD were weak ( Table 2).

Genetic Map Construction
Among the 400 RILs, 207 lines were randomly selected for genotyping using a Wheat55K SNP Array with 53,063 tags selected from the Wheat660K SNP Array (Ren et al., 2018). Since PHR SNPs are recommended for polyploid species and have the highest reliability, only PHR SNP probes were kept. SNPs with the same genotype in both parents were removed. Finally, 6505 SNP markers were obtained for genetic map construction (Table 3). These markers were divided into 1097 unique loci with the number distributed on each chromosome ranging from 10 to 96 ( Table 3). The genetic  Three lines from each year were used, and the correlations are the average values of the three years. ns means no significant correlation detected by statistical analysis. *** and ** indicate significance at p ≤ 0.001 and p ≤ 0.01, respectively.
Four QTL clusters were identified on chromosomes 3A, 4B, 5B, and 6B (Table 5). For the QTL cluster on chromosome 3A, qHD3A.1 and qHD3A.2 were co-localized with qTGW3A and qSL3A in a region ranging from 70.28 cM to 88.01 cM. On chromosome 4B, qPH4B.1 for PH was clustered with two QTLs for TGW, with the alleles from LX987 increasing PH and TGW. For the QTL cluster on chromosome 5B, qHD5B, which was detected in all 3 years, was clustered with qSL5B.3 (Tables 4, 5); however, the positive alleles for these QTLs were derived from opposite parents ( Table 4). Three QTLs for PH on chromosome 6B were clustered with qHD6B and qSL6B, with the alleles from LX987 increasing PH and SL (Tables 4, 5).
In a recent study we reported that the VRN-B1 gene located on chromosome 5B around the qHD5B region was responsible for HD variation in the RIL population . For the validation of qHD3A, we successfully developed seven KASP markers around or in the 55K SNP array-mapped region, and delimited the QTL to a genetic interval of 1.29 cM between markers 3A128b and 3A16, spanning approximately 2.5 Mb ( Figure 3A). The LOD scores of this QTL were 5.7 and 7.5, explaining 6.0% and 8.0% of the variation of HD, in 2017 and 2018, respectively (    Table 6). For the validation of qSL6A, we successfully developed 10 KASP markers for genetic mapping. In 2016 and 2018, a major QTL for SL with LOD scores of 12.6 and 23.5 and explaining 13.1% and 22.5% of phenotypic variation, respectively, was detected between markers 6A51 and 6A419 within a genetic interval of 31.8 cM ( Figure 3C and Table 6). Due to the large   region for this QTL, we did not conduct further analysis of the candidate genes.

Gene Annotations and Effects of SNPS in the Validated QTL Regions on Chromosome 4B and 3A
Since qPH4B.1 and qTGW4B.1 were delimited to a physical interval of 0.8 Mb between markers 4B271b and 4B288b by genetic mapping with KASP markers, we analyzed gene models and annotations in this region according to the Chinese Spring (CS) reference genome v1.0 (International Wheat Genome Sequencing Consortium, 2018). In this region, seven high-confidence genes were annotated. Based on BLASTP searches for rice and Arabidopsis homologous genes 7 , these genes are predicted to encode 40S ribosomal protein S27 (TraesCS4B01G280800), Betagalactosidase (TraesCS4B01G280900), a Histidine-containing phosphotransfer protein (TraesCS4B01G281000), 60S ribosomal protein L5 (TraesCS4B01G281100), Protein PAM68 (TraesCS4B01G281200), Tribbles homolog 3 (TraesCS4B01G281300), and a Ubiquitin carboxylterminal hydrolase family protein (TraesCS4B01G281400) (Supplementary Table 5). We also analyzed sequence variation between LX987 and eh1 in this region based on RNA-seq data. A total of 18 SNPs with each parent homozygous for different alleles were identified (Supplementary Table 6). Analysis of SNP effects suggested that three SNPs were missense mutations. One SNP in TraesCS4B02G281200 located at the 189 th position caused a change in the amino acid Leu in LX987 to Trp in eh1 and was predicted to have the largest effect on gene function. Multiple alignment of amino acid sequences of protein PAM68 from grasses indicated that this region is conserved among Brachypodium distachyon, Sorghum bicolor, Zea mays and rice (Supplementary Figure 2).
For the HD QTL on chromosome 3A between markers 3A128b and 3A16, we found that 38 high-confidence genes were annotated in the mapped interval (Supplementary Table 7). In this region, nine homozygous SNPs with genotypes differing between the two parent lines were found based on RNA-seq 7 http://202.194.139.32/searchtools/ data. One SNP in TraesCS3A01G086400, which encodes an R3H domain-containing protein, that caused a change from Ser to Pro at the 267 th position had the largest effect on gene function (Supplementary Table 8).

DISCUSSION
QTL Mapping Using the WHEAT55K SNP Array SNP arrays are a powerful and effective approach for QTL mapping (Rasheed et al., 2017). The tags of the Wheat55K Array (Affymetrix R Axiom R Wheat55) were carefully selected from the Wheat660K Array, and all tags were uniformly distributed on 21 chromosomes. Therefore, the 55K Array is suitable for genotyping in QTL studies (Ren et al., 2018). The Wheat55K SNP Array has been utilized for QTL mapping of productive tiller number (Liu et al., 2018b), temporal expression of tiller number (Ren et al., 2018), and leaf rust and stripe rust resistance (Huang et al., 2019;Zhang et al., 2019) in wheat. In this study, we used the Wheat55K SNP Array to genotype 207 RILs and constructed a genetic map containing 6,505 PHR SNP markers ( Table 3). PHR SNPs are of high quality and possess better cluster resolution than other SNPs (Marrano et al., 2019), which improves the accuracy of genotyping. The genetic map spanned 3496.1 cM across the 21 chromosomes, which is similar to the total length of genetic maps for 199 wheat RILs constructed by Liu et al. (2018b) and 186 RILs constructed by Huang et al. (2019). We detected a total of 37 QTLs for HD, PH, TGW, and SL by mapping using the 55K SNP array (Table 4 and Figure 2). Among these QTLs, those on chromosomes 3A (qHD3A), 4B (qPH4B.1 and qTGW4B.1), 5B (qHD5B) , and 6A (qSL6A) that were stably detected in different years, were validated using KASP markers (Figure 3). High LOD values ranging from 5.7 to 23.5 were observed for the QTLs that were validated with KASP markers ( Table 6), indicating that the QTLs detected using the 55K SNP array data are reliable.

Comparison of the Mapped QTLS With Those Identified in Previous Studies
A total of nine QTLs for HD were mapped on chromosomes 1B, 2B, 3A, 4A, 5B, and 6B (Table 4 and Figure 2). Consistent with these findings, in our previous study we also identified QTLs for HD on chromosomes 2B, 3A, and 5B using BSA of the same RIL population . In addition, qHD1B.1 and qHD4A were mapped to genetic regions similar to those reported by Zhao et al. (2019). qHD2B.2 was mapped to a genetic position similar to that of HD QTLs reported by Hu et al. (2020) and Li et al. (2018). The analysis of the physical positions of the flanking markers in the wheat reference genome indicated that qHD2B.2 is probably the Ppd-B1 gene. The two adjacent QTLs qHD3A.1 and qHD3A.2 are located at a genetic position similar to that of an HD QTL reported by Li et al. (2018). The QTL qHD5B, which was stably detected in different years (Table 4 and Figure 2), is located around gene VRN-B1, and our previous results suggested that the VRN-B1 gene is responsible for HD variation in this RIL population . In addition, the stably detected QTL qHD6B was found at a position similar to that of HD QTLs reported by Perez-Lara et al. (2016) and Li et al. (2018).
Regarding PH, we identified two and three QTLs on chromosomes 4B and 6B, respectively (Table 4 and Figure 2). Consistent with these results, previous studies also reported several PH QTLs on chromosomes 4B and 6B Li et al., 2018;Jahani et al., 2019). qPH4B.2, qPH6B.1, qPH6B.2, and qPH6B.3 were mapped to similar genetic positions on chromosomes 4B and 6B as PH QTLs reported by Gao et al. (2015); Li et al. (2018), andHu et al. (2020). BLAST searches of flanking markers for qPH4B.2 against the wheat reference genome indicated that this region harbors the reported RhtB1b gene (Peng et al., 1999). However, using KASP markers for Rht1 (Rasheed et al., 2016) we found that both of the parent lines, eh1 and LX987, had the same RhtB1b genotype ( Supplementary  Figure 1), indicating that the detection of qPH4B.2 is not due to RhtB1b. It is possible that there is other variation in the Rht1 gene that causes differences in PH between the parent lines. qPH2A is located at a similar genetic position as a QTL reported by Li et al. (2018), and the qPH4A region overlaps with the physical region reported by Chen et al. (2020). It has been reported that the semidominant dwarfing gene Rht-NM9 is located in a region from 178.9 Mb to 187.2 Mb on 2AS (Lu et al., 2015). We performed a BLAST search using the flanking markers for qPH2A and found that this QTL is located in a 143-148 Mb interval according to the CS reference genome. Therefore, we speculate that qPH2A does not harbor the Rht-NM9 gene.
The stably detected QTL qTGW4B.1 co-localized with qTGW4B.2 (Table 4 and Figure 2). These QTLs are located within 49.4-52.19 cM. Guan et al. (2018) reported stable QTLs for TGW located within 22.3-95.8 cM on chromosome 4B. qTGW3A is located at a genetic position similar to that reported in Cui et al. (2014a). qTGW3D is located at positions similar to those reported in Cui et al. (2014b) and Gao et al. (2015). The QTLs qTGW3B.1, qTGW3B.2, qTGW3B.3, qTGW5D, and qTGW7A are located at positions similar to those reported by Li et al. (2018), and qTGW6A.1 is located close to a stable yield and TGW QTL reported by Simmonds et al. (2014). In addition, the QTLs qTGW7A and qTGW7D are located at similar genetic positions as those reported by Cui et al. (2014a) and Guan et al. (2018), respectively.
qSL4A is located at a genetic position similar to that reported by Cui et al. (2012b) and Gao et al. (2015). qSL5B.2 is located at a position similar to that in Cui et al. (2012b). In addition, qSL6B is located at a genetic position similar to that reported by Li et al. (2018) and Hu et al. (2020). To the best of our knowledge, the stably detected QTL qSL6A with a LOD value ranging from 11.1 to 18.4 is likely to be a new QTL (Table 4 and Figure 2).

Pleiotropic QTLS for HD, PH, TGW, and SL
Among the QTLs for HD, PH, TGW, and SL detected in this study, four regions controlled two or more of these traits ( Table 5). In addition to Rht1, a previous study identified a "QTL-hotspot" region for yield-related traits on chromosome 4B (Guan et al., 2018). This is consistent with the QTL cluster detected in our study ( Table 5). Consistent with the positive correlation between PH and TGW ( Table 2 and Supplementary  Table 9), the superior alleles of the co-localized QTLs qPH4B.1, qTGW4B.1, and qTGW4B.2 were derived from the same parent line (Table 4). A QTL cluster for HD, PH, and SL that mapped to the interval 101.44-119.21 cM on chromosome 6B (Table 5) is likely the same or similar to a QTL cluster for yield-related traits reported by Li et al. (2018). qHD6B co-localized with qSL6B, with favorable alleles derived from opposite parents (Tables 4, 5). This is consistent with the negative correlation between HD and SL ( Table 2).

Candidate Genes Affecting PH, TGW, and HD
Using KASP markers, we delimited the QTL regions for PH and TGW on chromosome 4B to a 0.8 Mb physical region ( Figure 3B and Table 6). A recent study identified a QTL cluster for TGW linked to Rht-B1 on chromosome 4B using near-isogenic lines (Guan et al., 2020). This region includes the physical interval identified in our study. According to gene annotation and analysis of the effects of SNPs in the mapped region (Supplementary Table 6), a mutation in TraesCS4B02G281200 encoding a PAM68 protein showed the largest effect on gene function. The PAM68 protein is essential for efficient D1 biogenesis and photosystem II assembly in Arabidopsis (Armbruster et al., 2010(Armbruster et al., , 2013. Split-ubiquitin assays suggested that the C terminus of Arabidopsis PAM68 is required for interaction with the PSII core proteins D1 and CP43 (Armbruster et al., 2010). The variation in the PAM68 protein between LX987 and eh1 is located at the C terminus, and this region is conserved in grasses (Supplementary Figure 2). This indicates that the mutation in PAM68 may affect gene function. Photosynthesis plays an important role in yield improvement (Zhu et al., 2010). It has been reported that mutation of the photosystem 1-F subunit (OsPS1-F) results in reduction of PH and grain yield in rice (Ramamoorthy et al., 2018). Taken together, our results and previous findings suggest that PAM68 is a candidate gene for the PH and TGW QTLs. We found genes TraesCS4B01G281000 and TraesCS4B01G281300 in the QTL region were not expressed in the RNA-seq data, which may due to that the RNA-seq data was collected from spikes in the HD . Therefore, we could not exclude these two genes as candidate genes from the sequences of RNA and the predicted effects of SNP. However, TraesCS4B01G281000 and TraesCS4B01G281300 encode Histidine-containing phosphotransfer protein and Tribbles homolog 3, respectively, which have not been reported for involving in PH and TGW. In this respect, the possibility for these two candidate genes is low.
We also confirmed and narrowed down the QTL region for HD on chromosome 3A to a 2.5 Mb interval ( Figure 3A and Table 6). Analysis of SNP effects suggested that a mutation in TraesCS3A01G086400 has a large effect on gene function (Supplementary Table 8), suggesting that this gene may affect HD in the RIL population. TraesCS3A01G086400 encodes an R3H domain-containing protein, which functions in binding polynucleotides, including DNA, RNA, and single-stranded DNA (Grishin, 1998). Studies on R3H-containing proteins in maize (Saleh et al., 2006) and Arabidopsis  have suggested that these proteins are involved in stress responses. Whether the R3H domain-containing protein contributes to HD variation need to be further studied.

CONCLUSION
We identified 37 QTLs for HD, PH, TGW, and SL in a RIL population using the Wheat55K SNP Array, and validated the stably detected QTLs on chromosome 3A, 4B, and 6A using KASP markers. The QTLs on chromosomes 4B and 3A were delimited to a physical interval of 0.8 Mb and 2.5 Mb, respectively. Moreover, the candidate genes affecting PH, TGW, and HD were predicted based on gene annotation and analysis of SNP effects. The linked KASP markers developed in this study will facilitate breeding for yield improvement in wheat.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
LL conceived the project and revised the manuscript. HX and YL conducted most of the experiments. HG constructed the RIL population and assisted in collection of the phenotypic data. YX and JG performed genotype analyses. LZ, SZ, and YD participated in field trials. HX wrote the first draft of the manuscript. All authors have read and approved the final manuscript.