AUTHOR=Qin Jun, Wang Fengmin, Zhao Qingsong, Shi Ainong, Zhao Tiantian, Song Qijian, Ravelombola Waltram, An Hongzhou, Yan Long, Yang Chunyan, Zhang Mengchen TITLE=Identification of Candidate Genes and Genomic Selection for Seed Protein in Soybean Breeding Pipeline JOURNAL=Frontiers in Plant Science VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2022.882732 DOI=10.3389/fpls.2022.882732 ISSN=1664-462X ABSTRACT=Soybean is a primary source of protein meal for human consumption and for poultry and livestock feed. Exploring molecular approaches to increase genetic gain for seed protein has been one of the main challenges for soybean breeders and geneticists. In this study, a genome-wide association study (GWAS) and linkage mapping of quantitative trait loci (QTL) for protein were conducted on 284 soybean accessions and 180 recombinant inbred lines (RILs), respectively. The 464 individuals were evaluated for protein content over four years and genotyped by sequencing. In total, 22 SNPs significantly associated with protein content were identified using the MLM and GLM methods in TASSEL, and 5 QTL related to protein content were detected using the Bayesian IM, SMIM, SMLE and SMR models in QGene and IciMapping. Major QTL were repeatedly detected and mapped on chromosomes (Chr.) 6 and 20 in both populations. The new genomic region on Chr06_18844283-19315351 included 7 candidate genes. Haplotype analysis showed that Hap. XAA, located at Chr6_19172961, was associated with high protein. Genomic selection (GS) was performed for protein content using Bayesian LASSO (BL) and rrBLUP, based on the whole set of SNPs, the GWAS-derived SNPs, different SNP sets, and different training population sets. The results showed that Bayesian LASSO performed similarly to the rrBLUP model.
The GS accuracy depended on the SNP set and the training population size: selection efficiency for protein was higher with GWAS-derived SNPs than with random markers, sets of 2,000 or more SNPs gave similar prediction accuracy, and seven-fold cross-validation performed best. The SNP markers identified in this study are essential for establishing an efficient marker-assisted selection (MAS) and GS pipeline to improve soybean protein content.
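
To make the cross-validated GS accuracy concrete, the following is a minimal sketch, not the authors' pipeline. It uses plain ridge regression (the estimator underlying rrBLUP) with a fixed penalty instead of an estimated variance ratio, and it runs on simulated genotypes and phenotypes. The function names `ridge_fit` and `cv_accuracy`, the 0/1/2 marker coding, the marker count, the penalty `lam`, and all simulated values are assumptions for illustration only. Prediction accuracy is taken here as the Pearson correlation between predicted and observed values in each held-out fold.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept (rrBLUP-style marker effects)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Dual (n x n) form: cheaper than the p x p system when markers outnumber lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y - y_mean)
    beta = Xc.T @ alpha
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def cv_accuracy(X, y, k=7, lam=100.0, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        beta, intercept = ridge_fit(X[train_mask], y[train_mask], lam)
        pred = X[test_idx] @ beta + intercept
        accuracies.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))

# Simulated data (hypothetical): 284 lines, 500 markers coded 0/1/2,
# 20 causal markers plus Gaussian noise.
rng = np.random.default_rng(42)
n_lines, n_markers, n_causal = 284, 500, 20
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
effects = np.zeros(n_markers)
causal = rng.choice(n_markers, size=n_causal, replace=False)
effects[causal] = rng.normal(0.0, 1.0, size=n_causal)
y = X @ effects + rng.normal(0.0, 3.0, size=n_lines)

acc = cv_accuracy(X, y, k=7, lam=100.0)
print(f"7-fold mean prediction accuracy: {acc:.3f}")
```

Changing `k`, or passing a subset of columns of `X`, gives a rough analogue of the fold-number and SNP-set comparisons described in the abstract. In practice the penalty would be estimated from the data, as rrBLUP does through variance components, and not fixed in advance.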