Multi-Locus Genome-Wide Association Study of Four Yield-Related Traits in Chinese Wheat Landraces

Wheat (Triticum aestivum L.) is one of the most important crops in the world. Here, four yield-related traits, namely, spike length, spikelets number, tillers number, and thousand-kernel weight, were evaluated in 272 Chinese wheat landraces in multiple environments. Five multi-locus genome-wide association studies (FASTmrEMMA, ISIS EN-BLASSO, mrMLM, pKWmEB, and pLARmEB) were performed using 172,711 single-nucleotide polymorphisms (SNPs) to identify yield-related quantitative trait loci (QTL). A total of 27 robust QTL were identified by more than three models. Nine of these QTL were consistent with those in previous studies. The remaining 18 QTL may be novel. We identified a major QTL, QTkw.sicau-4B, with up to 18.78% of phenotypic variation explained. The developed kompetitive allele-specific polymerase chain reaction marker for QTkw.sicau-4B was validated in two recombinant inbred line populations with an average phenotypic difference of 16.07%. After combined homologous function annotation and expression analysis, TraesCS4B01G272300 was the most likely candidate gene for QTkw.sicau-4B. Our findings provide new insights into the genetic basis of yield-related traits and offer valuable QTL to breed wheat cultivars via marker-assisted selection.


INTRODUCTION
Wheat (Triticum aestivum L.) is one of the most important crops in the world. A 100% increase in crop production by 2050 will be needed to keep pace with projected human population growth (Ray et al., 2013). Thus, it is imperative to increase crop yield. Wheat yield consists of three main components, including spike number per plant, grain number per spike, and thousand-kernel weight (TKW). Spike number per plant is determined by tillers number (TN) per plant. Spikelets number (SN) per spike and spike length (SL) significantly affect grain number per spike. Understanding the genetic basis of these yield-related traits can contribute to improving wheat yield. Chinese wheat landraces have been widely used for breeding cultivated varieties of wheat (Dai et al., 2009). Wheat landraces show high genetic diversity and extensive phenotypic variation, such as early maturity, high numbers of kernel per spikelet, and good adaption to local environmental conditions (He and Huang, 2001;Hao et al., 2008). Genetic analyses of Chinese wheat landraces have revealed the basis of agronomic traits, such as yield-related traits (spikelets number per spike, tillers number, and thousand-kernel weight), plant morphological traits (flag leaf length, flag leaf width, and plant height; Liu et al., 2017;Ma et al., 2020), stress resistance (pre-harvest sprouting, drought-related traits, and phosphorusdeficiency tolerance; Zhou et al., 2017;Lin et al., 2019Lin et al., , 2020a, and disease resistance (powdery mildew and stripe rust; Xiao et al., 2013;Long et al., 2019). Analysis of gene diversity and polymorphism information content revealed the high diversity of Chinese wheat landraces Long et al., 2019). Thus, genetic analysis of yield traits using Chinese wheat landraces can provide important insights into wheat breeding.
With the development of next-generation sequencing (NGS), genome-wide association study (GWAS) has become an effective way of detecting complex quantitative characteristics and is also widely applied in Arabidopsis (Atwell et al., 2010;Bac-Molenaar et al., 2016), rice Zhu et al., 2016), maize (Lu et al., 2010(Lu et al., , 2012Yang et al., 2014), Aegilops tauschii (Liu et al., 2015a,b;Qin et al., 2015Qin et al., , 2016, and wheat Maccaferri et al., 2015;Sukumaran et al., 2015;Lin et al., 2017). Moreover, the previous studies have discovered genes via GWAS directly. In rice, Yano identified a gene comprehensively controlling rice architecture using GWAS (Yano et al., 2019). Kim reported a novel resistance gene, Xa43(t), for bacterial blight (Kim and Reinke, 2019). In particular, GWAS has gradually been applied to wheat landraces. Using GWAS, a total of 149 significant markers for 21 agronomic traits were detected in 723 wheat landraces . A total of 51 loci significantly associated with adult-plant resistance to stripe rust were discovered in wheat landrace through GWAS (Long et al., 2019). Recently, a major locus of coleoptile length on chromosome 6B was revealed by GWAS in 707 Chinese wheat landraces (Ma et al., 2020).
Because of the mass of data involved in the process of GWAS, several multi-locus models have been designed to increase efficiency . Compared with the single-locus model, the multi-locus models can help improve the detection power of GWAS (Xu et al., 2018). Therefore, multi-locus models have recently been popularized in plant GWAS, such as in the photosynthetic response to low phosphorus stress in soybean (Lü et al., 2018), fatty acid content in rapeseed (Guan et al., 2019), forage quality-related traits in sorghum , salt tolerance of direct seeding in rice (Cui et al., 2018), callus regenerative traits, starch pasting properties, and stalk lodging resistance-related traits in maize (Ma et al., 2018;Xu et al., 2018;Zhang et al., 2018), agronomic traits in barley (Hu et al., 2018), and free amino acid levels in wheat (Peng et al., 2018). All these studies have successfully discovered novel quantitative trait loci (QTL).
In the present study, a total of 272 Chinese wheat landraces were evaluated in multiple environments to improve our understanding of the genetic basis of four yield-related traits. Five multi-locus GWAS models were performed to identify robust QTL using 172,711 SNPs. Major QTL were validated in two recombinant inbred line (RIL) populations. Furthermore, we presumed candidate genes for the major QTL. This study provides new QTL of yield-related traits that may help wheat breading in the near future.

Plant Material, Phenotype Evaluation, and Data Analysis
A total of 272 wheat landraces, obtained from 10 major wheatgrowing zones in China, were utilized in this study (Supplementary Table S1). All landraces were planted in six environments: Ya'an (103°370 E, 22°420 N) in 2012 (E1), Wenjiang (103°410 E, 30°360 N) in 2013, 2014, and 2015 (E2, E3, and E4), and Chongzhou (103°390 E, 30°330 N) in 2014 and 2015 (E5 and E6). Each row of material was 30 cm apart and 1.5 m long, and contained 15 seeds. Field management referred to criteria commonly practiced in wheat production. Four yield-related traits were evaluated in at least four environments: SL -the average length of the main spikes from five plants; SN -the average total number of spikelets from five main spikes of the plant; TN -the average number of tillers in five plants; TKW -the average weight of five samples of 1,000 kernels randomly selected from a given genotype. Two bi-parental populations [Huimai × Datianquxiaomai (HD) and Huimai × Heshangmai (HH)] were used to validate the results. The parents of bi-parental populations were selected from 272 wheat landraces. The TKW of these two RIL populations was evaluated in Chongzhou in 2019, and Huimai resulted significantly more productive than Datianquxiaomai and Heshangmai (higher TKW).
To eliminate environmental effects, the best linear unbiased prediction (BLUP) value for each trait was calculated across environments and used for statistical analysis. The BLUP was calculated using the methodology of Piepho et al. (2008) as previously described . The broad-sense heritability (H 2 ) value was calculated using SAS v8.1 (SAS Institute Inc., Cary, NC, United States) and is defined as: where Vg, Vge, and Ve are the estimates of genetic variance, the genotype × environment interaction, and environmental variance, respectively (Smith et al., 1998). Correlation analyses were performed using SPSS 20 (IBM SPSS Statistics; IBM Corp., Armonk, NY, United States).

Genotyping
Genomic DNA was extracted from a single plant for each of the accessions using the cetyltrimethylammonium bromide method (Murray and Thompson, 1980). DNA samples with

GWAS for Yield Traits
The population structure was performed by Structure2.3.4 based on the Bayesian clustering algorithm (Pritchard et al., 2000). Ten runs of STRUCTURE were performed with a K between 1 and 10 using the admixture model with 100,000 replicates for burn-in and 100,000 replicates for MCMC. The R package "mrMLM" in R Project for Statistical Computing was used to examine the association between markers and yield-related traits. Marker-trait associations were performed using five multilocus models, including FASTmrEMMA , ISIS EM-BLASSO , mrMLM , pKWmEB (Ren et al., 2018), and pLARmEB . All five models were adjusted by both population structure and family relationship. A logarithm of odds (LOD) value ≥3 was used as the screening criterion (Guan et al., 2019). According to previous studies, the linkage disequilibrium decay of wheat ranges from 3.5 to 23 Mb (Kidane et al., 2019;Li et al., 2019;Luján Basile et al., 2019). Significant SNPs were therefore selected with a physical distance ≤10 Mb and referred to as a conservative QTL.

Validation of QTL Using Two RIL Populations
To further validate the significant QTL identified for TKW, the peak SNP with this locus was converted into a kompetitive allele-specific polymerase chain reaction (KASP) marker based on the probe sequence. KASP primers were designed and analyzed as described in previous studies (Lin et al., 2020b(Lin et al., , 2021 and produced by Sangon Biotech (Shanghai, China; Supplementary Table S2). The KASP marker detected different alleles at this locus in the two bi-parental populations. Eighty-two lines were selected for each of the alleles from both populations. These lines were used to evaluate differences in TKW between the two allele groups using a Student's t-test in SPSS 20 (IBM SPSS Statistics; IBM Corp., Armonk, NY, United States).

Candidate Gene Prediction
Based on IWGSC RefSeq v1.0, predicted genes within ~10 Mb of the physical location of the QTL were selected. We undertook two different methods to predict the possible existence of candidate genes. The first method was expression analysis. Based on data from WheatExp, 2 genes expressed highly at stages Z71 and Z75 were the most important, due to the key stages in kernel development (Zadoks et al., 1974). Fragments per kilobase of exon model (FPKM) represented gene expression level. As in Lin et al. (2020b), genes expressing less than two per million mapped reads at any stage were removed. FPKM is fragments read per thousand bases per million mappings and represented gene expression level.
The second method to predict the possible existence of candidate genes was annotation. All genes were also used to perform homologous annotations in rice and Arabidopsis using the KEGG Orthology Based Annotation System 3.0 (KOBAS 3.0; Xie et al., 2011). 3 Functional annotations were performed via UniProt 4 and EnsemblPlants. 5

Phenotypic Variation in Chinese Wheat Landraces
The four traits among the 272 wheat landraces varied considerably ( Table 1). Based on the BLUP values, the SL ranged from 6.33 to 14.63 (cm). SN ranged from 19.43 to 27.36 (count). TN ranged from 8.11 to 18.10 (count), and TKW ranged from 17.90 to 40.47 (g). The heritability of these four traits ranged from 0.64 to 0.93 (Table 1). TN and TKW showed medium heritability, whereas SL and SN showed high heritability. Correlation analysis showed that all correlations were significant (p < 0.05), except the correlation between SL and SN ( Table 2), indicating that the changes in these two traits are independent.

Five Multi-Locus Models of Yield-Related Traits
A total of 172,711 polymorphic SNPs were obtained [after filtering for missing values ≤10% and minor allele frequency (MAF) ≥0.05] from the Wheat 660 K SNP arrays. This subset was used to perform GWAS. Based on LOD values ≥3, a total of 308 significant SNP markers were identified using the five multi-locus models. Detailed significant SNP markers information and the number of significant SNP markers detected by different models are shown in Supplementary Table S3 and Figure 1, respectively. In FASTmrEMMA, the least number of significant SNP markers was detected, with only 35. A total of 10, 8, 6, and 11 significant SNP markers were detected for SL, SN, TN, and TKW, respectively, with phenotypic variation explained (PVE) up to 6.55%. In the model of ISIS EM-BLASSO, 73 significant SNP markers were revealed. A total of 20, 23, 15, and 15 were detected for SL, SN, TN, and TKW, respectively, with PVE up to 9.27%. The mrMLM model could detect 67 significant markers. A total of 18, 20, 14, and 15 were detected for SL, SN, TN, and TKW, respectively, with PVE up to 13.09%. In the model of pKWmEB, the most number of significant SNP markers was detected (81), and a total of 23, 25, 19, and 14 were detected for SL, SN, TN, and TKW, respectively, with PVE up to 18.78%. The pLARmEB model detected 52 significant SNP markers, and 12, 15, 11, and 14 were detected for SL, SN, TN, and TKW, respectively, with PVE up to 11.09%. Significant markers existed in the above three models, and no more than 10 Mb was considered as a robust QTL.

Robust QTL Selected by Five Multi-Locus Models
Twenty-seven robust QTL were identified by more than three different multi-locus models and were considered as robust QTL (

Validation of Genetic Effect and Candidate Genes of QTkw.sicau-4B
To validate the genetic effect of the peak SNP for QTkw. sicau-4B, a KASP marker (KASP-AX-108886949) was developed and the differences between TKW of landraces carrying alternatively A/G allele were calculated, both in natural population and in two RIL populations. Among the 272 landraces tested, the average TKW of landraces carrying genotype allele A was heavier than those carrying allele G (p < 0.01; Table 4; Figure 2A). The difference was 22.87%. In the HD population, genotypes were divided into two groups: 42 with allele A and 40 with allele G. The TKW of genotypes with allele A ranged from 28.75 to 44.04 g, whereas those with allele G ranged from 22.28 to 39.27 g. The average TKW with allele A was heavier than that with allele G (p < 0.01; Table 4; Figure 2B). In the RIL population HH, there were 39 with allele A and 43 with allele G. The TKW of genotypes with allele A ranged from 22.52 to 44.56 (g), whereas those with allele G ranged from 19.03 to 37.47 g. The average TKW with allele A was heavier than with allele G (p < 0.01; Table 4; Figure 2C). The difference in TKW between genotypes ranged from 15.33 to 16.81%, with an average value of 16.07% among the two RIL populations. Thus, the developed KASP marker may be useful for breeding cultivars carrying a high-TKW allele.
We observed superior allele A frequency in 10 major wheatgrowing zones (Figure 3). The frequencies show that the superior allele A showed an 8% frequency in Chinese wheat landraces, and more than half the materials from the Northern Spring Wheat Zone and the Northwestern Spring Wheat Zone carried allele A. The Northern Winter Wheat Zone, Yellow and Huai River Valleys Facultative Wheat Zone, Middle and Low Yangtze Valleys Autumn-Sown Spring Wheat Zone, Southwestern Autumn-Sown Spring Wheat Zone, and Qinghai-Tibetan Plateau Spring-Winter also had some materials with allele A. Materials from the Southern Autumn-Sown Spring Wheat Zone, Northeastern Spring Wheat Zone, and Xinjiang Winter-Spring Wheat Zone did not carry allele A.
A total of 46 high-confidence genes which were identified in IWGSC RefSeq v1.0 flanked QTkw.sicau-4B by no more 5 Mb. The search for candidate genes was undertaken following two pipelines. The expression analysis of the candidates was investigated in the two most interesting stages for kernel development: Z71 and Z75. Nine genes were expressed at both stages, namely, TraesCS4B01G272300, TraesCS4B01G272500, TraesCS4B01G273400, TraesCS4B01G273500, TraesCS4B01G274200, TraesCS4B01G274500, TraesCS4B01G275400, TraesCS4B01G275500, and TraesCS4B01 G276300. Based on these sequences, we found homologous genes using KOBAS 3.0 (see footnote 3). Three genes were identified as TKW-regulating candidate genes (Table 5). TraesCS4B01G272300,

DISCUSSION
Yield has always been a focus of global wheat research. With the development of NGS technology, GWAS has been applied extensively to detect complex traits Maccaferri et al., 2015;Sukumaran et al., 2015). Previous studies found that the accuracy and efficiency of models, including generalized linear, mixed linear, and multi-locus models, have constantly improved (Zhang et al., 2005;Price et al., 2006;Li et al., 2018).
In the present study, we observed not consistent results between different multi-locus models. FASTmrEMMA was more conservative, and pKWmEB was the most effective model of the five models (Supplementary Table S3; Figure 1); this finding is in agreement with the previous studies Lü et al., 2018). QTL detected by multiple, multi-locus models are considered as robust QTL useful for more precise studies. Among the 308 significant SNP markers, 27 reliable and robust QTL were identified in four yield-related traits in this study. Using BLAST against the IWGSC RefSeq v1.0, we tried to determine the physical location of QTL flanking markers. Among these 27 robust QTL, nine loci  were overlapped with, or located close to, previously reported QTL.
For TN, Bilgrami et al. (2020) reported a QTL MQTL6D-4 in physical position 456.46-469.25 Mb with a mean R 2 of 14.08%, which is close to QTn.sicau-  in the present study.
For TKW, using a doubled haploid population Zhang et al. (2014) identified a QTL for TKW on chromosome 3A (Qtkw3A-1) at 625.79-690.79 Mb, which is close to QTkw.sicau-3A, and mapped at 686.12 Mb in this study. In general, nine QTL were identified in the same position as those in previous studies; the remaining 18 are potentially novel QTL.
Among robust and novel QTL, we focused on QTkw. sicau-4B that showed the highest PVE value. To validate this   Table S2). According to high PVE value and low frequency, we believe this QTL could help wheat breeding. Three genes identified by functional annotation showed relationship with TKW. TraesCS4B01G275400 is orthologous to H2B (Histone H2B monoubiquitination), which regulates abscisic acid signaling, and is a target of UBP26, which acts as a transcriptional repressor involved in kernel development (Ma et al., 2019a). TraesCS4B01G276200 is orthologous to OsCDR1. Weights between OsCDR1 transgenic lines and wild-type plants differed (Prasad et al., 2009). TraesCS4B01G272300 is orthologous to rice OsMCA1, which mutants showed obviously affected TKW (Liu et al., 2015c). TraesCS4B01G272300 gene, highly expressed in the grain development stage (Supplementary Figure S1), appears to be the most promising candidate gene for QTkw. sicau-4B. In our further study, expressions of this candidate  gene will be validated by qRT-PCR in different grain development stage. Transgenic tests will also be applied to validate its function.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.