# USE OF BARLEY AND WHEAT REFERENCE SEQUENCES: DOWNSTREAM APPLICATIONS IN BREEDING, GENE ISOLATION, GWAS AND EVOLUTION

EDITED BY : Dragan Perovic, Hikmet Budak, Kazuhiro Sato and Pierre Sourdille PUBLISHED IN : Frontiers in Plant Science

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-963-2 DOI 10.3389/978-2-88963-963-2

### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

## USE OF BARLEY AND WHEAT REFERENCE SEQUENCES: DOWNSTREAM APPLICATIONS IN BREEDING, GENE ISOLATION, GWAS AND EVOLUTION

Topic Editors: Dragan Perovic, Julius Kühn-Institut, Germany Hikmet Budak, Montana Bioagriculture, Inc. United States Kazuhiro Sato, Okayama University, Japan Pierre Sourdille, INRAE Clermont-Auvergne-Rhône-Alpes, France

Citation: Perovic, D., Budak, H., Sato, K., Sourdille, P., eds. (2020). Use of Barley and Wheat Reference Sequences: Downstream Applications in Breeding, Gene Isolation, GWAS and Evolution. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-963-2

# Table of Contents


*Identifying Candidate Genes for Enhancing Grain Zn Concentration in Wheat*

Dalia Z. Alomari, Kai Eggert, Nicolaus von Wirén, Ahmad M. Alqudah, Andreas Polley, Jörg Plieske, Martin W. Ganal, Klaus Pillen and Marion S. Röder


Azahara Carmen Martín, Philippa Borrill, Janet Higgins, Abdulkader Alabdullah, Ricardo H. Ramírez-González, David Swarbreck, Cristobal Uauy, Peter Shaw and Graham Moore

*High Resolution Genetic and Physical Mapping of a Major Powdery Mildew Resistance Locus in Barley*

Parastoo Hoseinzadeh, Ruonan Zhou, Martin Mascher, Axel Himmelbach, Rients E. Niks, Patrick Schweizer and Nils Stein

*Dissection of Pleiotropic QTL Regions Controlling Wheat Spike Characteristics Under Different Nitrogen Treatments Using Traditional and Conditional QTL Mapping*

Xiaoli Fan, Fa Cui, Jun Ji, Wei Zhang, Xueqiang Zhao, JiaJia Liu, Deyuan Meng, Yiping Tong, Tao Wang and Junming Li


Arantxa Monteagudo, Ana M. Casas, Carlos P. Cantalapiedra, Bruno Contreras-Moreira, María Pilar Gracia and Ernesto Igartua

*Uncovering Genomic Regions Associated With 36 Agro-Morphological Traits in Indian Spring Wheat Using GWAS*

Sonia Sheoran, Sarika Jaiswal, Deepender Kumar, Nishu Raghav, Ruchika Sharma, Sushma Pawar, Surinder Paul, M. A. Iquebal, Akanksha Jaiswar, Pradeep Sharma, Rajender Singh, C. P. Singh, Arun Gupta, Neeraj Kumar, U. B. Angadi, Anil Rai, G. P. Singh, Dinesh Kumar and Ratan Tiwari

*High-Density Mapping of Triple Rust Resistance in Barley Using DArT-Seq Markers*

Peter M. Dracatos, Rouja Haghdoust, Ravi P. Singh, Julio Huerta Espino, Charles W. Barnes, Kerrie Forrest, Matthew Hayden, Rients E. Niks, Robert F. Park and Davinder Singh

*Development of Genome-Wide SNP Markers for Barley via Reference-Based RNA-Seq Analysis*

Tsuyoshi Tanaka, Goro Ishikawa, Eri Ogiso-Tanaka, Takashi Yanagisawa and Kazuhiro Sato

*High Resolution Mapping of* RphMBR1012 *Conferring Resistance to* Puccinia hordei *in Barley (*Hordeum vulgare *L.)*

Leila Fazlikhani, Jens Keilwagen, Doris Kopahnke, Holger Deising, Frank Ordon and Dragan Perovic

*Molecular Characterization of 87 Functional Genes in Wheat Diversity Panel and Their Association With Phenotypes Under Well-Watered and Water-Limited Conditions*

Maria Khalid, Fakiha Afzal, Alvina Gul, Rabia Amir, Abid Subhani, Zubair Ahmed, Zahid Mahmood, Xianchun Xia, Awais Rasheed and Zhonghu He


Shubin Wang, Steven Xu, Shiaoman Chao, Qun Sun, Shuwei Liu and Guangmin Xia

*Detecting Large Chromosomal Modifications Using Short Read Data From Genotyping-by-Sequencing*

Jens Keilwagen, Heike Lehnert, Thomas Berner, Sebastian Beier, Uwe Scholz, Axel Himmelbach, Nils Stein, Ekaterina D. Badaeva, Daniel Lang, Benjamin Kilian, Bernd Hackauf and Dragan Perovic


# Editorial: Use of Barley and Wheat Reference Sequences: Downstream Applications in Breeding, Gene Isolation, GWAS, and Evolution

Dragan Perovic<sup>1</sup>\*, Hikmet Budak<sup>2</sup>, Kazuhiro Sato<sup>3</sup> and Pierre Sourdille<sup>4</sup>

<sup>1</sup> Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Julius Kuehn Institute, Quedlinburg, Germany, <sup>2</sup> Montana BioAgriculture, Inc., Bozeman, MT, United States, <sup>3</sup> Institute of Plant Science and Resources, Okayama University, Kurashiki, Japan, <sup>4</sup> INRAE, UMR 1095 INRAE—UCA Genetics, Diversity & Ecophysiology of Cereals, Clermont-Ferrand, France

Keywords: barley, breeding, gene isolation, genome reference sequence, wheat

Editorial on the Research Topic

### Use of Barley and Wheat Reference Sequences: Downstream Applications in Breeding, Gene Isolation, GWAS, and Evolution

Edited and reviewed by:

Rodomiro Ortiz, Swedish University of Agricultural Sciences, Sweden

\*Correspondence: Dragan Perovic dragan.perovic@julius-kuehn.de

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 05 June 2020 Accepted: 22 June 2020 Published: 07 July 2020

#### Citation:

Perovic D, Budak H, Sato K and Sourdille P (2020) Editorial: Use of Barley and Wheat Reference Sequences: Downstream Applications in Breeding, Gene Isolation, GWAS, and Evolution. Front. Plant Sci. 11:1017. doi: 10.3389/fpls.2020.01017

Barley and wheat are the most important temperate cereal crops of the Triticeae. Although they rank fourth (barley, 141 million tons [MT]) and second (wheat, 749 MT) in world production according to FAO (2018), their large genomes prevented them from being fully sequenced until recently. Advances in high-throughput sequencing technologies, together with the efforts of the scientific communities, finally made it possible to accomplish this longstanding aim (Mascher et al., 2017; IWGSC, 2018). The availability of these high-standard reference genomes has ushered in a new era of barley and wheat genomics, and their decoded blueprints open new avenues for exploring genome sequences in both applied and basic research. Barley and wheat geneticists and breeders are now in the position that users of the first sequenced model plants, rice and Arabidopsis, were in nearly twenty years ago. The current Frontiers in Plant Science research collection of 18 articles sheds light on how knowledge of whole barley and wheat genome sequences promotes applied breeding [Genome Wide Association Study (GWAS), genomic selection (GS)], basic biological research [mapping of major genes and quantitative trait loci (QTLs)], accelerated isolation of novel genes, novel methods in sequence analysis, and rapid detection of natural variation. Furthermore, the presented articles reveal the efficient use of genetics and breeding methods in harnessing genetic resources of barley and wheat to promote the rapid improvement of cultivars.

Ladejobi et al. demonstrated that using this resource for alignment of Genotyping-by-Sequencing (GBS) reads and variant Single Nucleotide Polymorphism (SNP) calling enabled the generation of thousands of high-quality SNP data points. When applied to association mapping and genomic prediction, GBS data anchored to wheat IWGSC RefSeq v1.0 generally improved prediction accuracy. In particular, this study demonstrated the utility of GBS reads for efficiently predicting traits with numerous loci each having a small effect, proving its suitability for GS.

Furthermore, Alomari et al. showed the power of the GWAS approach for identifying putative candidate genes for Zn accumulation in the grain using 369 European wheat varieties genotyped with high-density arrays of SNP markers (90k iSelect Infinium and 35k Affymetrix arrays) in combination with the wheat IWGSC RefSeq v1.0. This study discovered two physically anchored chromosomal segments, located on chromosomes 3B and 5A, as genetic factors controlling Zn accumulation in the grain. These genomic regions include newly identified putative candidate genes, which are related to Zn uptake and transport or represent bZIP and mitogen-activated protein kinase genes.

Wang et al. applied GWAS to a worldwide collection of 493 durum wheat accessions to address the genetic basis of 17 agronomically important traits and a drought wilting score. Based on sequence alignment of the markers to the reference genome of bread wheat, they identified 14 putative candidate genes related to enzymes, hormone response, and transcription factors. The GWAS in durum wheat and a previous QTL analysis in bread wheat identified a consensus QTL locus.4B.1 conferring drought tolerance, which was further scanned for the presence of potential candidate genes.

Sheoran et al. used GWAS in a diverse panel of 404 spring wheat genotypes in India. Using the Breeders' 35K SNP Axiom array covering 4364.79 cM of the wheat genome, a total of 146 SNPs were found to be associated with 23 of the 36 studied traits, explaining 3.7–47.0% of the phenotypic variance. Gene annotation mined ∼38 putative candidate genes, which were confirmed using tissue- and stage-specific gene expression derived from RNA-Seq data. They observed strongly colocalized loci for four traits on chromosome 1B and annotated five putative candidate genes.

Fan et al. identified QTLs for six spike-related traits under two different nitrogen (N) supplies, based on a high-density genetic linkage map constructed using PCR markers, DArTs, and the Affymetrix Wheat 660K SNP array. A total of 157 traditional QTLs and 54 conditional loci were detected by inclusive composite interval mapping, among which three QTLs induced exclusively by low-N stress were found to maintain the desired spikelet fertility and kernel numbers even under N deficiency through pyramiding of elite alleles.

Liu et al. conducted selection signal detection and GWAS for spike-related traits in common wheat. Based on the genotyping results (90K SNP array), 192 common wheat samples from southwest China were analyzed. A total of 146 selective windows and 184 significant SNPs were detected. Using the wheat RefSeq v1.0, these SNP clusters and the previously reported QTLs that overlap or flank them were integrated into a physical map. Based on the haplotype analysis, KASP markers were developed.

Martin et al. presented an extensive analysis of RNA-seq data in the presence and absence of the Ph1 locus in order to find out how this locus likely modifies the meiotic process and plays a role in polyploidy adaptation. Plant material at early prophase from six different genotypes (wheat, wheat–rye haploid hybrids and newly synthesized octoploid triticale) unexpectedly revealed that neither synapsis, whole genome duplication nor the absence of the Ph1 locus was associated with major changes in gene expression levels during early meiotic prophase. Overall, the results of this study suggested that wheat transcription at this meiotic stage is highly resilient to such alterations, even in the presence of major chromatin structural changes.

Six articles describe characterization and use of germplasm, wild relatives' introgressions, a-gene stocks.

Monteagudo et al. showed that Spanish barley landraces contribute to the improvement of elite cultivars as donors of novel alleles/genes of agronomically important traits such as flowering time, yield, and drought-related traits. In specific cases, they could become cultivars directly or at least could be used as parents in plant breeding programs due to their reduced genetic load.

Dinglasan et al. characterized and mapped resistance to the net form of net blotch [Pyrenophora teres f. teres (Ptt)] in the international barley differential cv. Canadian Lake Shore (CLS) using a doubled haploid (DH) population. The authors identified a major QTL (qPttCLS) on chromosome 3H conferring resistance to Ptt, while aligning DArTseq markers to the barley physical-map position allowed identification of annotated genes.

Dracatos et al. used GBS to map resistance to rust diseases in barley. They produced a high-density linkage map comprising 8,610 markers (SNPs and in silico markers) spanning 5957.6 cM to map resistance to leaf rust, stem rust, and stripe rust.

In wheat, Mia et al. reported fast-track development and evaluation of Near Isogenic Lines (NILs) from C306 × Dharwar Dry targeting a wheat 4BS QTL hotspot in C306, which confers drought tolerance, using heterogeneous inbred family (HIF) analysis coupled with an immature embryo culture-based fast generation technique. Quantitative RT-PCR analysis targeting the MYB 82 transcription factor (TaMYB82), located within this genomic region, also revealed differential expression in +NILs and −NILs under stress.

Synthetic wheats were also analyzed. Naz et al. evaluated two advanced backcross populations, B22 and Z86, which were derived by crossing the winter wheat cultivars Batis and Zentos with the synthetic hexaploid wheat accessions Syn022L and Syn086L, respectively. QTL analysis identified seven and 13 favorable exotic QTL alleles associated with enhanced or at least stable grain yield in populations B22 and Z86, respectively. These favorable introgressions were located on all chromosomes from 1D to 7D.

Khalid et al. analyzed a diversity panel consisting of advanced lines derived from synthetic hexaploid wheats for allelic variation at 87 functional genes or loci of breeding importance using 124 high-throughput KASP markers. The major developmental genes such as Vrn-A1, Rht-D1, and Ppd-B1 had confounding effects on several agronomic traits including plant height, grain size and weight, and grain yield in both well-watered (WW) and water-limited (WL) conditions.

Positional cloning of genes is among the activities that benefit most from an anchored and annotated genome sequence. The articles by Fazlikhani et al. and Hoseinzadeh et al. showed that resistance gene isolation in barley can be accelerated at every step, from gene mapping to the identification and functional validation of candidate genes. In particular, the barley reference sequence delivered detailed information about the physical size of the target intervals, while the respective gene annotation (Mascher et al., 2017) revealed the genes located in the target intervals and putative candidate genes, and facilitated the efficient development of molecular markers for marker-assisted selection. Allele-specific resequencing of putative candidate genes and construction of the corresponding haplotypes are nowadays much faster and easier.

Three articles presented methods for the use of reference genome sequences for characterization of genetic resources, development of molecular markers, and identification of noncoding RNA structures.

Keilwagen et al. demonstrated that deep-coverage analysis of GBS data combined with mapping of reads to reference sequences enables the detection of Hordeum vulgare/Hordeum bulbosum introgression lines as well as the identification of large chromosomal rearrangements in barley and wheat collections. In addition, the method is useful for identifying genomic regions under selection and could be applied to control for duplicates in gene bank collections.

The availability of reference genome sequence facilitates the generation of markers by elucidating the genomic positions of new markers as well as of their neighboring sequences. Tanaka et al. showed that RNA-Seq-based de novo polymorphism detection system generates genome-wide markers, even in the closely related barley genotypes used in breeding programs.

In wheat and barley, knowledge about lncRNAs remains very limited. Budak et al. showed that the high-quality reference genomes of wheat and barley significantly help to reduce false annotation of lncRNAs, and that obtaining well-assembled transcriptome data will greatly advance lncRNA identification procedures. Therefore, high-quality genome sequences are promising resources for the identification of lncRNAs or any other class of molecules. As our understanding of lncRNAs expands, interactions among ncRNA classes, as well as interactions with the coding sequences, will likely define novel functional networks that may be modulated for crop improvement.

In short, this topic integrated different genomic approaches with biological and agronomic information of importance in two of the most important cereal crops, barley and wheat, in order to provide new tools and methodologies that allow a great leap forward in plant breeding.

### FUTURE PERSPECTIVES

The availability of genome sequences has revolutionized barley and wheat genetics and accelerated the identification and use of rare allelic variants in classical breeding schemes, such as marker-assisted backcrossing (MABC), marker-assisted selection (MAS) and pyramiding of genetic factors responsible for important traits. Furthermore, the availability of newly developed genomic tools and resources is leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype. Once more accessions have been sequenced as a result of PanGenome projects, it will be possible to target important gene variants directly and introduce them into cultivars in order to exploit the rich germplasm for breeding purposes. In this regard, new breeding technologies such as site-directed mutagenesis by RNA-guided endonucleases like Cas9 offer new possibilities. For example, alleles identified by new genomic approaches can be mimicked in breeding lines to circumvent time-consuming crosses.

### AUTHOR CONTRIBUTIONS

DP prepared the first draft of this editorial. All authors contributed to the article and approved the submitted version.

### ACKNOWLEDGMENTS

The research of DP was supported by the German Federal Ministry of Education and Research under the grant number 031B0199B and by the German Federal Ministry of Nutrition and Agriculture under the grant 2818410B18, and the research of HB was supported by the Agrogen, LLC, USA.

Conflict of Interest: Author HB was employed by company Montana BioAg. Inc.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Perovic, Budak, Sato and Sourdille. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Identifying Candidate Genes for Enhancing Grain Zn Concentration in Wheat

Dalia Z. Alomari<sup>1</sup>\*, Kai Eggert<sup>1†</sup>, Nicolaus von Wirén<sup>1</sup>, Ahmad M. Alqudah<sup>1</sup>, Andreas Polley<sup>2</sup>, Jörg Plieske<sup>2</sup>, Martin W. Ganal<sup>2</sup>, Klaus Pillen<sup>3</sup> and Marion S. Röder<sup>1</sup>

<sup>1</sup> Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany, <sup>2</sup> SGS TraitGenetics GmbH, Gatersleben, Germany, <sup>3</sup> Institute of Agricultural and Nutritional Sciences, Martin-Luther-University Halle-Wittenberg, Halle, Germany

Wheat (Triticum aestivum L.) is one of the major staple food crops worldwide. Despite efforts in improving wheat quality, micronutrient levels are still below the optimal range for human nutrition. In particular, zinc (Zn) deficiency is a widespread problem in human nutrition in countries relying mainly on a cereal diet; hence improving Zn accumulation in grains is an imperative need. This study was designed to understand the genetic architecture of Zn concentration in wheat grains. We performed a genome-wide association study (GWAS) for grain Zn concentrations in 369 European wheat genotypes, using field data from 3 years. The complete wheat panel was genotyped by high-density arrays of single nucleotide polymorphic (SNP) markers (90k iSELECT Infinium and 35k Affymetrix arrays) resulting in 15,523 polymorphic markers. Additionally, a subpanel of 183 genotypes was analyzed with a novel 135k Affymetrix marker array resulting in 28,710 polymorphic SNPs for high-resolution mapping of the potential genomic regions. The mean grain Zn concentration of the genotypes ranged from 25.05 to 52.67 µg g<sup>−1</sup> dry weight across years with a moderate heritability value. Notably, 40 marker-trait associations (MTAs) were detected in the complete panel of varieties on chromosomes 2A, 3A, 3B, 4A, 4D, 5A, 5B, 5D, 6D, 7A, 7B, and 7D. The number of MTAs increased to 161 in the subpanel, and the most significant and consistent associations, which had major effects, were located on chromosomes 3B (723,504,241–723,611,488 bp) and 5A (462,763,758–466,582,184 bp). These genomic regions include newly identified putative candidate genes, which are related to Zn uptake and transport or represent bZIP and mitogen-activated protein kinase genes. These findings provide the basis for understanding the genetic background of Zn concentration in wheat grains that in turn may help breeders to select high Zn-containing genotypes to improve human health and grain quality.

Keywords: Zinc, Triticum aestivum, wheat quality, micronutrient, GWAS

## INTRODUCTION

Wheat is among the primary staple crops in the world and its production reached almost 750 million tons per year (FAOSTAT, 2016<sup>1</sup>), while 68% of the yield is used for human nutrition (FAOSTAT, 2012). Wheat provides substantial amounts of mineral elements, which are beneficial for human health. Several reports emphasize that over 2 billion people are suffering from hidden hunger (Welch and Graham, 2004), i.e., Zinc (Zn) and Iron (Fe) deficiency, mainly in middle- or low-income countries where staple crops are the major food source (Sands et al., 2009); recently, the problem was also reported in developed countries (Pandey et al., 2016).

<sup>1</sup>http://faostat.fao.org

#### Edited by:

Pierre Sourdille, INRA Centre Auvergne Rhône-Alpes, France

#### Reviewed by:

Zhaohui Wang, Northwest A&F University, China Benoit Darrier, The University of Adelaide, Australia Sabina Vitalievna Chebotar, Odessa University, Ukraine

\*Correspondence:

Dalia Z. Alomari alomari@ipk-gatersleben.de; alamridalia@gmail.com †Deceased

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 14 June 2018 Accepted: 20 August 2018 Published: 10 September 2018

#### Citation:

Alomari DZ, Eggert K, von Wirén N, Alqudah AM, Polley A, Plieske J, Ganal MW, Pillen K and Röder MS (2018) Identifying Candidate Genes for Enhancing Grain Zn Concentration in Wheat. Front. Plant Sci. 9:1313. doi: 10.3389/fpls.2018.01313

Zn plays significant roles in different metabolic processes and is an essential cofactor for many enzymes and regulatory proteins. The symptoms of insufficient dietary Zn intake for humans can be observed as growth and development retardation, excessive weight loss, diarrhea, and depression (Ozturk et al., 2006; Kambe et al., 2014; Krishnappa et al., 2017). Consequently, improving the nutritional quality of wheat grains by enhancing Zn concentrations is a long-term goal for breeding novel wheat cultivars with a positive effect on grain yield, nutritional quality of the plant, as well as human health (Cakmak, 2008; Genc et al., 2008; Crespo-Herrera et al., 2016).

Since Zn accumulation in grains is a genetically complex trait, a genome-wide association study (GWAS) is a powerful tool to detect the genetic factors underlying the natural variation in such complex traits (Hamblin et al., 2011). Several studies identified quantitative trait loci (QTL) for micronutrients, such as Fe and Zn, or macronutrients like Ca in wheat (Morgounov et al., 2007; Tiwari et al., 2009; Crespo-Herrera et al., 2016; Alomari et al., 2017). Peleg et al. (2009) found six QTLs on chromosomes 2A, 2B, 3A, 4B, 5A, 6A, 6B, 7A, and 7B for Zn in a durum wheat × emmer wheat recombinant inbred line (RIL) population. Four QTLs for grain Zn concentration were identified by Genc et al. (2008) on chromosomes 3D, 4B, 6B, and 7A in a doubled haploid wheat population. Another study reported seven QTLs located on chromosomes 1A, 2D, 3A, 4A, 4D, 5A, and 7A for Zn content in wheat grains, of which four QTLs are shared with Zn concentration (Shi et al., 2008). Shi et al. (2013) found that chromosomes 4D and 5A are probably vital in controlling mineral status in wheat grains.

Previous studies on Zn concentration mainly used bi-parental populations, for instance RILs (Xu et al., 2012; Pu et al., 2014; Srinivasa et al., 2014), but only a few studies have used GWAS with high-density single nucleotide polymorphic (SNP) arrays to investigate the genomic regions underlying the accumulation of micronutrients including Zn in the grains of major cereals like wheat (Guttieri et al., 2015). Therefore, understanding the genetic background of Zn accumulation in wheat grains by GWAS provides the basis for devising plant breeding strategies and for improving the grain Zn status by introducing the putative candidate genes, based on the newly available wheat reference (IWGSC RefSeq v1.0) and using advanced bioinformatics tools.

The main goals of this study were (i) to investigate the natural phenotypic variation in grain Zn concentrations of 369 wheat varieties across 3 years of field experiments, (ii) to study the genetic architecture of grain Zn concentration by GWAS analysis with three different high-density SNP arrays including 44,233 SNPs providing a high-resolution genetic map, and (iii) to identify the genomic regions and potential candidate genes for consistently significant QTLs.

## MATERIALS AND METHODS

### Plant Material and Field Trials

In this study, we used 369 European elite wheat varieties including 355 genotypes of winter wheat and 14 spring wheat genotypes, originating from Germany, France, Poland, Denmark, Austria, Czech Republic, United Kingdom, Sweden, Switzerland, Hungary, Italy, Belgium, and Netherlands, as described by Kollers et al. (2013). Field trials were conducted at IPK, Gatersleben, Germany over 3 years (2014/2015 for 358 genotypes, 2015/2016 for 365 genotypes, and 2016/2017 for 360 genotypes). A few genotypes were missing in each individual year due to poor performance and loss in the field. Each plot measured 2 m × 2 m with six rows spaced 0.20 m apart. Plants were grown in clayey loam soil with phosphorus ranging between 7.1 and 9.0 µg g<sup>−1</sup> and pH ≈ 7 across years. Standard agronomic wheat management practices were applied without using fertilizers to avoid the effect of additional fertilizers on the actual Zn concentrations.

### Wheat Grain Samples Preparation and Milling

The complete panel of genotypes was analyzed for each individual year. For each genotype, thousand kernel weight (TKW) was measured using a digital seed analyzer/counter Marvin (GTA Sensorik GmbH, Neubrandenburg, Germany). Grains were milled using a Retsch mill (MM300, Germany) and the complete panel of the milled samples was dried by incubating overnight at 40 °C.

### Measuring Grain Zinc Concentration

Fifty milligrams of dried and milled wheat grain flour were digested with 2 ml nitric acid (HNO<sub>3</sub> 69%, Bernd Kraft GmbH, Germany). The digestion process was performed using a high-performance microwave reactor (UltraClave IV, MLS, Germany). All digested samples were filled up to 15 ml final volume with de-ionized distilled (Milli-Q®) water (Milli-Q Reference System, Merck, Germany). Element standards were prepared from Bernd Kraft multi-element standard solution (Germany). Zinc was used as an external standard and yttrium (Y) (ICP Standard Certipur®, Merck, Germany) as an internal standard for matrix correction. Zinc concentrations were measured by inductively coupled plasma optical emission spectrometry (ICP-OES, iCAP 6000, Thermo Fisher Scientific, Germany) combined with a CETAC ASXPRESS™ PLUS rapid sample introduction system and a CETAC autosampler (CETAC Technologies, Omaha, NE, United States).
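The text does not spell out how an instrument reading is converted into a grain concentration. The following is a minimal sketch of that arithmetic, assuming the ICP-OES reading is reported in mg L<sup>−1</sup> of the digest and ignoring blank subtraction and the yttrium-based matrix correction. The function name and the example reading are hypothetical; only the 50 mg sample mass and the 15 ml final volume come from the protocol above.

```python
def zn_ug_per_g(reading_mg_per_l: float,
                final_volume_ml: float = 15.0,
                sample_mass_mg: float = 50.0) -> float:
    """Convert a digest reading (mg/L) into grain Zn concentration (ug/g dry weight).

    Assumption: no blank or internal-standard correction is applied here.
    """
    if sample_mass_mg <= 0 or final_volume_ml <= 0:
        raise ValueError("sample mass and final volume must be positive")
    # 1 mg/L equals 1 ug/mL, so reading * volume gives total ug Zn in the digest.
    zn_total_ug = reading_mg_per_l * final_volume_ml
    # Convert the sample mass from mg to g before dividing.
    return zn_total_ug / (sample_mass_mg / 1000.0)


# Hypothetical reading of 0.1 mg/L in the 15 ml digest of a 50 mg sample.
print(round(zn_ug_per_g(0.1), 2))  # 30.0
```

With these protocol values the dilution factor is 300 ml g<sup>−1</sup>, so a reading of 0.1 mg L<sup>−1</sup> corresponds to 30 µg g<sup>−1</sup>, which falls within the range of mean concentrations reported in the abstract.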

### Statistical Analysis

The broad-sense heritability (H<sup>2</sup>) was calculated using the equation:

$$H^2 = \frac{\sigma_G^2}{\sigma_G^2 + \sigma_e^2 / n_E}\tag{1}$$

where $\sigma_G^2$ is the variance of the genotype, $\sigma_e^2$ represents the variance of the residual, and $n_E$ is the number of environments.
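As a minimal sketch of how Eq. 1 is evaluated, the snippet below computes the broad-sense heritability from a genotypic variance, a residual variance, and the number of environments. The function name and the variance values are hypothetical and serve only as an illustration; they are not the variance components estimated in this study.

```python
def broad_sense_heritability(var_genotype: float, var_residual: float, n_env: int) -> float:
    """Evaluate H^2 = sigma_G^2 / (sigma_G^2 + sigma_e^2 / n_E) as in Eq. 1."""
    if n_env <= 0:
        raise ValueError("n_env must be a positive number of environments")
    return var_genotype / (var_genotype + var_residual / n_env)


# Hypothetical variance components (illustrative only), three environments (years).
h2 = broad_sense_heritability(var_genotype=2.0, var_residual=6.0, n_env=3)
print(round(h2, 2))
```

Because the residual variance is divided by the number of environments, adding trial years increases H<sup>2</sup> for the same variance components.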

Analyses of variance (ANOVA) and Pearson's correlation coefficients were calculated for grain Zn concentration across 3 years with SigmaPlot (version 13).

Best linear unbiased estimates (BLUEs) were calculated from a mixed linear model (MLM) fitted with the residual maximum likelihood (REML) algorithm to analyze the phenotypic data and estimate the mean of each genotype across years (Yu et al., 2006). To this end, genotype was treated as a fixed effect and year, denoted as the environment term, as a random effect. These calculations were performed using GenStat v16 software (VSN International, Hemel Hempstead, Hertfordshire, United Kingdom).

### SNP Genotyping and GWAS Analysis

The complete wheat panel consisting of 369 varieties was genotyped by TraitGenetics GmbH, Gatersleben, Germany<sup>2</sup> using two marker arrays: a 90k iSELECT Infinium array (Wang et al., 2014) and a 35k Affymetrix-SNP array (Axiom® Wheat Breeder's Genotyping Array<sup>3</sup>; Allen et al., 2017). Additionally, a novel 135k Affymetrix array designed by TraitGenetics was used to genotype a subpanel of 183 genotypes from the complete panel (Zanke et al., 2017). For the reference map, the ITMI-DH population (Sorrells et al., 2011; Poland et al., 2012) was used to anchor the SNP markers of the 90k and 35k arrays. The 135k array markers were genetically mapped on four different F2 populations and then physically anchored on the reference sequence RefSeq v1.0 of hexaploid wheat<sup>4</sup> from the International Wheat Genome Sequencing Consortium (IWGSC). For quality control, SNPs with a minor allele frequency (MAF) ≤ 3% (equaling 11 varieties out of 369) or with missing values or heterozygosity ≥ 3% were rejected. This resulted in 7,761 mapped polymorphic SNP markers from the 90k iSELECT array, 7,762 from the 35k Affymetrix-SNP array, and 28,710 from the 135k Affymetrix array, which were used for association analysis. The investigated genotype panel and its population structure were described in a previous study by Kollers et al. (2013).
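The quality-control rule described above can be sketched as a small filter. This is an illustration under stated assumptions, not the pipeline used in the study: genotype calls are assumed to be coded as `"AA"`, `"BB"`, `"AB"` (heterozygous) or `None` (missing), the missing and heterozygosity rates are assumed to be computed over all varieties, and the function name `passes_qc` is hypothetical.

```python
from collections import Counter


def passes_qc(calls, maf_min=0.03, max_missing=0.03, max_het=0.03):
    """Return True if a SNP survives the MAF, missing-rate and heterozygosity filters.

    calls: one genotype call per variety, coded 'AA', 'BB', 'AB' or None (assumed coding).
    """
    n = len(calls)
    counts = Counter(calls)
    called = n - counts[None]
    if called == 0:
        return False
    missing_rate = counts[None] / n
    het_rate = counts["AB"] / n
    # Frequency of allele A among called genotypes; MAF is the rarer allele's frequency.
    freq_a = (2 * counts["AA"] + counts["AB"]) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    # Reject MAF <= 3% and missing or heterozygous rates >= 3%.
    return maf > maf_min and missing_rate < max_missing and het_rate < max_het


# A SNP whose minor allele is carried by 11 of 369 varieties falls below the MAF cutoff.
print(passes_qc(["AA"] * 358 + ["BB"] * 11))  # 11/369 is about 2.98%
```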

Association mapping based on an MLM was conducted primarily using the Genome Association and Prediction Integrated Tool (GAPIT; Lipka et al., 2012) in R, a language and environment for statistical computing. The analysis combined the phenotypic data with the SNP markers from the high-density arrays. We incorporated PCA to correct for population structure and stratification. To detect significant marker-trait associations (MTAs), we set a threshold of −log<sub>10</sub>(P) ≥ 3. Quantile-quantile plots were drawn based on the observed and expected −log<sub>10</sub>(P) values. Explained phenotypic variance (R<sup>2</sup>) and marker effects (positive/negative) were extracted from the GWAS results.
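The significance threshold and the quantile-quantile comparison can be illustrated with a few lines of code. The sketch below is not part of GAPIT: the function names are hypothetical, and the expected values use the common convention that the i-th smallest of n null P-values is expected at i/(n + 1), which is an assumption rather than the exact rule used to draw the plots in this study.

```python
import math


def neg_log10(p_value: float) -> float:
    """Transform a P-value to the -log10 scale used in Manhattan and QQ plots."""
    return -math.log10(p_value)


def is_significant(p_value: float, threshold: float = 3.0) -> bool:
    """An MTA is declared when -log10(P) reaches the threshold (3, i.e., P <= 0.001)."""
    return neg_log10(p_value) >= threshold


def expected_neg_log10(n_markers: int) -> list:
    """Expected -log10(P) values under the null, largest first (assumes i/(n+1) quantiles)."""
    return [neg_log10(i / (n_markers + 1)) for i in range(1, n_markers + 1)]


print(is_significant(1e-5), is_significant(0.01))
print([round(v, 2) for v in expected_neg_log10(9)[:3]])
```

Plotting the sorted observed −log<sub>10</sub>(P) values against these expected values gives the QQ plot; points close to the diagonal indicate that population structure is adequately controlled.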

### Connecting Significant SNPs With the Physical Sequence of Wheat

The flanking sequences of the SNP markers significantly associated with grain Zn concentration were obtained from the wheat 90k database (Wang et al., 2014), the 35k database<sup>5</sup>, and the 135k Affymetrix array (unpublished data, TraitGenetics). These flanking sequences were subjected to a megablast search on Galaxy, an IPK-internal web-based platform<sup>6</sup>, to fetch the whole sequence of the genomic region of interest based on IWGSC RefSeq v1.0. The extracted sequences were submitted to the annotation pipeline MEGANTE<sup>7</sup> in order to identify potential candidate genes and their gene ontologies.

### RESULTS

### Natural Phenotypic Variation of Grain Zn Concentrations in Two Wheat Panels

Zn measurements were obtained from grain samples of 369 European wheat varieties, which were grown under field conditions in three consecutive years (2015, 2016, and 2017). Zn concentrations of each individual wheat genotype for the complete panel of 369 genotypes and for the subpanel of 183 genotypes are presented in **Supplementary Table S1**. The Zn concentrations in the individual years appeared to be normally distributed (**Supplementary Figure S1**). A wide range of variation in Zn concentration was observed for the complete panel (**Figure 1A**) and the subpanel (**Figure 1B**) in all 3 years, and most of the variation within the complete panel was also captured in the subpanel (**Figure 2A** and **Table 1**). The BLUEs across the 3 years ranged from 25.05 to 52.67 µg g<sup>−1</sup> DW with a mean of 34.92 µg g<sup>−1</sup> DW. Based on the BLUEs, the genotype "Haven" had the highest Zn concentration in the complete panel, at 52.67 µg g<sup>−1</sup> DW (**Figure 2B**). Significant positive Pearson's correlations ranging from r = 0.18 to 0.39 (P < 0.001) among the years (**Figure 2C**) indicated a relatively stable measurement of the phenotypes. A significant positive Pearson's correlation was found between Zn and TKW in all 3 years (**Supplementary Figure S2**). The broad-sense heritability for Zn concentration across the years was H<sup>2</sup> = 0.54. The ANOVA for Zn concentration indicated significant effects of genotype and environment, i.e., years (**Supplementary Table S2**).

### Association Mapping of Grain Zn Concentrations in Two Diverse Wheat Panels

Genome-wide association mapping was performed for the complete panel and the subpanel of wheat genotypes with Zn concentration data for each individual year in addition to BLUEs, using the implemented MLM with principal component analysis (PCA) as a correction factor for population structure. The complete panel of wheat genotypes was analyzed with a combination of markers from the 90K iSELECT Infinium array and the 35K Affymetrix array, resulting in 15,523 polymorphic SNP markers anchored in a genetic reference map. The subpanel was analyzed by merging the 90K iSELECT array with the 35K and 135k Affymetrix arrays, resulting in a total of 44,233 polymorphic SNP markers ordered by their physical locations, in order to increase marker density, achieve good mapping resolution, and further enhance the power of GWAS within the germplasm panel. Significant MTAs were detected above the threshold of −log<sub>10</sub>(P) ≥ 3, as shown in the Manhattan plots for both panels (**Figures 3A**, **4A**). The QQ plots revealed that the distributions of observed association P-values were close to the distributions of expected P-values (**Figures 3B**, **4B**). A total of 40 MTAs were detected in the complete panel on chromosomes 2A, 3A, 3B, 4A, 4D, 5A, 5B, 5D, 6D, 7A, 7B, and 7D with R<sup>2</sup> values ranging from 2.5 to 5.2%. A total of 21 MTAs had positive effects related to the minor allele and 19 MTAs had negative effects (**Supplementary Table S3**). While most MTAs were only detected in 1 year, an MTA on chromosome 3B was detected in all 3 years at similar map positions of 64.5 to 66.8 cM. The most significant MTA was detected on chromosome 5A at 114.5 cM, with a −log<sub>10</sub>(P) value of 4.87 and an R<sup>2</sup> value of 5.2%. In the subpanel, the number of MTAs increased to 161, including 31 unmapped markers; the mapped MTAs were located on chromosomes 1A, 1B, 2A, 2B, 3A, 3B, 3D, 4A, 4D, 5A, 5B, 6A, 6B, 7A, and 7B, with R<sup>2</sup> values ranging from 5.5 to 13.7% (**Supplementary Table S4**). A genomic region on chromosome 3B between physical positions 716,993,339 and 736,712,355 (IWGSC RefSeq v1.0) was defined by 26 MTAs in the years 2016 and 2017 and for BLUEs, with the highest R<sup>2</sup> of 11.3% at AX-95129199. A continuous range of 27 significant MTAs was detected on chromosome 5A between physical positions 353,989,023 and 698,510,016, including all 3 years and BLUEs. The most significant marker, AX-158550766, located at position 464,479,275, explained 12.3% of the phenotypic variation. A total of six markers on chromosome 3B (64.5–66.8 cM) and two markers on chromosome 5A (98.1–114.5 cM) were shared between the complete panel of varieties and the subpanel.

<sup>2</sup>http://www.traitgenetics.com

<sup>3</sup>http://www.cerealsdb.uk.net/

<sup>4</sup>https://urgi.versailles.inra.fr/WheatMine/begin.do

<sup>5</sup>http://www.cerealsdb.uk.net

<sup>6</sup>http://www.galaxyproject.org/

<sup>7</sup>https://megante.dna.affrc.go.jp/

### Defining Physical Regions of Candidate Genes Underlying Zn Accumulation in Wheat Grains

The highly significant SNP markers located on chromosomes 3B and 5A (**Figure 5**) were selected for BLAST analysis using the web-based platform Galaxy<sup>6</sup>. The physical regions defined by these SNPs, located between 723,504,241 and 723,611,488 bp on chromosome 3B and between 462,763,758 and 466,582,184 bp on chromosome 5A (**Figure 5**), were queried against IWGSC RefSeq v1.0. The sequences fetched from Galaxy were submitted to MEGANTE<sup>7</sup>, a web-based system for integrated plant genome annotation. On chromosomes 3B and 5A, we found a number of genes encoding proteins with known functions and others reported as hypothetical proteins (**Supplementary Table S5**). Putative candidate genes based on their function included a transcription factor (TF) belonging to the basic leucine zipper (bZIP) family, the TF bHLH76, a homeobox-leucine zipper protein HOX4, a SWAP (suppressor-of-white-apricot)/surp domain-containing protein, and several genes related to the mitogen-activated protein kinase (MAPK) gene family (**Table 2**). Thus, we conclude that these two genomic regions on chromosomes 3B and 5A harbor a number of putative candidate genes, which may have a significant role in the process of grain Zn accumulation.

TABLE 1 | Grain Zn concentration mean, median, minimum, and maximum values within the complete and subpanel of wheat genotypes for each individual year.

the top five genotypes with the highest Zn concentration value based on BLUE values. (C) Pearson correlation between years.
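To make the size of the two candidate intervals concrete, the following sketch computes their lengths from the coordinates given above and checks whether a marker position falls inside an interval. The dictionary and function names are hypothetical; only the coordinates and the position of AX-158550766 are taken from the text.

```python
# Candidate intervals (bp, IWGSC RefSeq v1.0) as reported in the text.
REGIONS = {
    "3B": (723_504_241, 723_611_488),
    "5A": (462_763_758, 466_582_184),
}


def region_length_bp(chrom: str) -> int:
    """Length of the interval in bp, counting both end positions."""
    start, end = REGIONS[chrom]
    return end - start + 1


def in_region(chrom: str, position: int) -> bool:
    """True if a physical position lies within the candidate interval of that chromosome."""
    start, end = REGIONS[chrom]
    return start <= position <= end


print(region_length_bp("3B"))  # about 0.11 Mb
print(region_length_bp("5A"))  # about 3.8 Mb
# AX-158550766 (5A, 464,479,275 bp) lies inside the 5A interval.
print(in_region("5A", 464_479_275))
```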

### DISCUSSION


### Wide Variation for Zn Accumulation in Wheat Grains

The poor bioavailability of essential nutrients in cereal grains has led breeders to use plant breeding, a seed-based approach, to develop cultivars with improved and adequate levels of nutrients (Tiwari et al., 2016). Plant breeding, or genetic biofortification, was found to be competitive with costly and non-sustainable approaches such as agronomic biofortification, which is based on fertilizer use, or approaches based on food fortification and daily supplementation (Garcia-Oliveira et al., 2018). Therefore, genetic biofortification is considered one of the vital approaches that can help to overcome malnutrition, either by classical plant breeding or by approaches involving GMOs (genetically modified organisms) (Borrill et al., 2014; Singh et al., 2017). Many reports state that the target range for biofortified grains and for developing cultivars with high Zn concentration is 40–50 µg g<sup>−1</sup> (Howarth et al., 2011; Cakmak and Kutman, 2017). The phenotypic variation found in our germplasm ranged from 25.05 to 52.67 µg g<sup>−1</sup>, which is compatible with the target range and provides the opportunity to use the genotypes with the highest grain Zn concentrations in breeding programs. Similar Zn concentration ranges were reported by Graham et al. (1999) and Guttieri et al. (2015), who found Zn concentrations of 25–53 µg g<sup>−1</sup> in 132 bread wheat genotypes and 13.1–45.2 µg g<sup>−1</sup> in hexaploid wheat, respectively. Additionally, in durum wheat, Zn concentrations ranged from 24.8 to 48.8 µg g<sup>−1</sup>, which is comparable with our observations (Magallanes-Lopez et al., 2017).

Grain Zn concentrations across years were weakly to moderately correlated (r = 0.18–0.39; P < 0.001), which may be attributed to environmental effects across years and their interaction with genotypes, as was also reported for grains of other crops (Singh et al., 2017). The calculated heritability (H<sup>2</sup> = 0.54) for our trait of interest represents a moderate contribution of the genotype to the overall variation in grain Zn concentration, which was also affected by the environment in experiments conducted across 3 years. Similarly, Tiwari et al. (2016) and Khokhar et al. (2018) found a moderate effect of genotype on wheat grain Zn concentrations. We observed a significant positive correlation between Zn and TKW, which implies that both traits can be improved simultaneously; this observation has also been made in other studies with wheat (Morgounov et al., 2007; Peleg et al., 2009; Krishnappa et al., 2017). Our findings provide a list of improved cultivars with high Zn concentration that can be utilized in future breeding programs for boosting grain quality.

### Zn Grain Concentration as a Complex Trait

Genetic dissection of grain Zn concentration in our diversity panel showed that this trait is under the control of many genetic loci. The MTAs that were consistently significant across the years 2015, 2016, and 2017 and for the BLUE values are conferred by loci on chromosomes 3B (723,504,241–723,611,488 bp) and 5A (462,763,758–466,582,184 bp) in the complete panel as well as in the subpanel of wheat genotypes (**Figure 5**). Previously, a QTL for Zn concentration was reported in a similar location on chromosome 3B by Crespo-Herrera et al. (2017) in a population of hexaploid wheat RILs. Another study detected a QTL for grain Zn concentration on chromosome 5A in a RIL population derived from a cross between durum wheat and wild emmer wheat (Peleg et al., 2009). A previous study in rice reported that many QTLs for grain Zn have been mapped in eight different mapping populations, where the most consistent QTL for grain Zn content across environments was located on chromosome 12 (Swamy et al., 2016), which is syntenic to chromosome 5A in wheat (Salse et al., 2009). Therefore, our results indicate potential genomic regions controlling Zn in wheat that can be used in further genetic investigations.

### Identification of Candidate Genes

The two genomic regions on chromosomes 3B and 5A harbor many hypothetical and functionally annotated genes or proteins, including TFs and transporter proteins (**Supplementary Table S5**). For instance, we found five genes on chromosome 3B related to the MAPK family (**Table 2**), a gene family that is well documented in biotic and abiotic stress signaling (Xu and Zhang, 2015). Recently, several publications reported that different MAPK genes play major roles in sugar, nitrogen, phosphate, iron, potassium, or Zn signaling pathways (Lastdrager et al., 2014; Briat et al., 2015; Chardin et al., 2017), which makes them promising candidates for being involved in grain Zn accumulation. The gene annotations of the MEGANTE pipeline showed that one of the MAPK-related genes encoded a vacuolar protein sorting-associated protein. Interestingly, a recent report identified a vacuolar protein sorting-associated protein as one of the candidate genes mediating elevated Zn concentrations in chickpea seeds (Upadhyaya et al., 2016). In the same study, a SWAP/surp domain-containing protein was reported to be linked with seed Zn concentration in chickpea, and a SWAP was found in the present study as a putative candidate gene on chromosome 3B (**Table 2**).

On chromosome 5A, a homeobox-leucine zipper protein HOX4, annotated as TaHDZIP1, was found to be associated with grain Zn concentrations in the panel used. Another regulatory element detected in this genomic region is the putative TF bHLH76. bHLH has been reported as one of the binding factors of the cis-element G-box, which was found in the promoter regions of all TaMTPs (metal tolerance proteins); these are involved in trace metal homeostasis and have a potential role in cereal grain biofortification with essential micronutrients including Zn (Menguer et al., 2017; Vatansever et al., 2017). Additionally, we found that chromosome 5A harbored a TF belonging to the bZIP (basic-region leucine-zipper) family, which has a crucial role in nutrient and Zn homeostasis (Ishimaru et al., 2011; Evens et al., 2017; Cifuentes-Esquivel et al., 2018). In Arabidopsis thaliana, the TFs bZIP19 and bZIP23 were shown to regulate the adaptation to Zn deficiency in roots (Assunção et al., 2010; Inaba et al., 2015). A total of 187 TabZIP genes have been identified in wheat (Li et al., 2015), and a specific group of TabZIP genes conferred functional complementation of Zn deficiency-hypersensitive mutants such as bzip19 bzip23 (Evens et al., 2017; Henríquez-Valencia et al., 2018). So far, most functional studies of bZIPs were related to roots or leaves, while little information about bZIP-dependent regulatory mechanisms is available for grains. Therefore, novel bZIP genes could play a critical role in improving Zn accumulation in grains. However, this requires further genetic and functional validation. Finally, a FAR1 protein was detected on chromosome 5A (**Supplementary Table S5**); its molecular function based on gene ontology analysis is related to zinc ion binding (**Table 2**), which also makes it a potential candidate gene.

### CONCLUSION

The present analysis showed the power of the GWAS approach for identifying putative candidate genes for grain Zn accumulation in wheat. This study discovered genetic factors controlling grain Zn accumulation that may establish the basis for further breeding and genetic work in cereals. Two physically anchored chromosomal segments on 3B and 5A harbor many putative candidate genes, such as MAPK and bZIP genes, which are proposed as candidates conferring enhanced grain Zn concentrations. Further validation and functional characterization are required to elucidate the role of these genes in Zn homeostasis in wheat.

### AUTHOR CONTRIBUTIONS

DA performed the data analysis including genome-wide association scan, candidate genes identification, and statistical analysis. KE and NvW participated in Zn concentration measurements. AA helped in manuscript modification and statistical analysis. KP and MR designed the experiment. MR conceived the idea and participated in the interpretation of results. DA and MR wrote the manuscript. All authors read and approved the final manuscript.

### FUNDING

The project was funded by the internal financial support of IPK Gatersleben.

### ACKNOWLEDGMENTS

We thank Ellen Weiß and Yudelsy Antonia Tandron Moya for excellent technical assistance. The genotyping data were created in the frame of the projects VALID and SELECT (project numbers 0314947 and 0315949) funded by the Plant Biotechnology Program of the German Federal Ministry of Education and Research (BMBF). We are grateful for the IWGSC for pre-publication access to IWGSC RefSeq v1.0 for data analysis during the development of this manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01313/full#supplementary-material






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Alomari, Eggert, von Wirén, Alqudah, Polley, Plieske, Ganal, Pillen and Röder. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Genome-Wide Association Study of Wheat Spike Related Traits in China

Jing Liu, Zhibin Xu, Xiaoli Fan, Qiang Zhou, Jun Cao, Fang Wang, Guangsi Ji, Li Yang, Bo Feng\* and Tao Wang\*

Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China

Rapid detection of allelic variation and identification of advantageous haplotypes responsible for spike related traits play a crucial role in wheat yield improvement. The released genome sequence of hexaploid wheat (Chinese Spring) provides an extraordinary opportunity for rapid detection of natural variation and promotes breeding application. Here, selection signal detection and a genome-wide association study (GWAS) were conducted for spike related traits. A total of 192 common wheat samples from southwest China were analyzed based on genotyping with the 90K SNP chip. One hundred and forty-six selective windows and one hundred and eighty-four significant SNPs (51 for spike length, 28 for kernels per spike, 39 for spikelet number, 30 for thousand kernel weight, and 36 for spike number per plant) were detected. Furthermore, tightly linked and environmentally stable window clusters and SNP clusters were obtained. As a result, four SNP clusters associated with spike length were detected on chromosomes 2A, 2B, 2D, and 6A. Two SNP clusters associated with kernels per spike were detected on 2A and 2B. One pleiotropic SNP cluster associated with spikelet number and kernels per spike was detected on 7B. According to the genome sequence, these SNP clusters and the previously reported QTLs that overlap or flank them were integrated into a physical map. The candidate genes responsible for spike length, kernels per spike, and spikelet number were predicted. Based on the genotypes of cultivars in south China, two advantageous haplotypes associated with spike length and one advantageous haplotype associated with kernels per spike/spikelet number were detected that have not been effectively transferred into cultivars. According to these haplotypes, KASP markers were developed and tested across landraces and cultivars selected from south and north China. Consequently, the KASP assay, consistent with the GWAS results, provides reliable haplotypes for MAS in wheat yield improvement.

Keywords: wheat, spike length, kernels per spike, spikelet number, artificial selection, GWAS, haplotype, KASP

### INTRODUCTION

Bread wheat (Triticum aestivum L.) is the most widely grown food crop and provides the main energy requirements for about one third of the global population (Guo et al., 2018). As the world population grows continuously, yield improvement is an ongoing task for wheat breeding. Three key components, spike number, kernels per spike (KPS), and thousand kernel weight (TKW), collectively determine wheat yield. Furthermore, spike length (SL) and spikelet number (SN), which affect KPS, and spike number per plant (SNPP) also play an important role in improving wheat yield (Guo et al., 2017; Liu K. et al., 2018). Therefore, discovering the crucial SNPs/quantitative trait loci (QTLs) associated with spike related traits is an urgent task for further wheat breeding programs.

#### Edited by:

Dragan Perovic, Julius Kühn-Institut, Germany

#### Reviewed by:

Ahmad M. Alqudah, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany

Behnaz Soleimani, Institut für Resistenzforschung und Stresstoleranz (RS), Julius Kühn-Institute, Germany

\*Correspondence:

Bo Feng fengbo@cib.ac.cn

Tao Wang wangtao@cib.ac.cn

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 01 August 2018 Accepted: 11 October 2018 Published: 31 October 2018

#### Citation:

Liu J, Xu Z, Fan X, Zhou Q, Cao J, Wang F, Ji G, Yang L, Feng B and Wang T (2018) A Genome-Wide Association Study of Wheat Spike Related Traits in China. Front. Plant Sci. 9:1584. doi: 10.3389/fpls.2018.01584

Conventional wheat breeding (artificial selection) is mainly based on phenotypic selection, which is one of the most important steps for genetic improvement. However, it is blind, empirical, inefficient, and time-consuming. Fortunately, the breeders' efforts during crop improvement have left signatures in the wheat genome, and these selection signals can be detected using different methods (Gao et al., 2017). Identification of selection signals has been reported in many plants, such as soybean (Zhou et al., 2015; Wang J. et al., 2016), rice (Xu et al., 2012), peach (Cao et al., 2016), tomato (Lin et al., 2014), and wheat (Cao et al., 2016). Because selection signals alone cannot be associated with a phenotype, GWAS can be conducted to annotate the signatures in depth (Zhou et al., 2015). Based on linkage disequilibrium (LD), GWAS is an effective method to explore complex quantitative trait loci and allelic variation for a particular trait. SNPs with higher association scores are more likely to be close to the candidate genes, which makes it possible to detect candidate genes using GWAS (Brachi et al., 2011). GWAS has been widely used in crops to predict candidate genes related to phenotypes (Si et al., 2016; Wang X. et al., 2016; Liu J. et al., 2018).

In contrast to rice and Arabidopsis, whose genome sequences were published early (Kaul et al., 2000; Goff et al., 2002), geneticists and breeders have been waiting for the wheat genome sequence for 14 years. The annotated reference wheat genome, IWGSC RefSeq v1.0, released on the IWGSC website (http://www.wheatgenome.org/), opened a new avenue for exploring genome sequences, isolating novel genes, and rapidly detecting natural variation. In terms of forward genetics, IWGSC RefSeq v1.0, with its precise scaffold ordering and annotation, integrated assembly, and complete gene models, could resolve many long-standing problems (Appels et al., 2018). Taking GWAS as an example, candidate genes can be easily obtained around the leading SNP based on the LD decay. According to the annotation of the wheat reference genome, candidate genes can be further screened. Then, the function of targeted genes can be confirmed by gene editing, and the advantageous genotype can be used directly for selection breeding (Appels et al., 2018).

Genes and QTLs associated with spike related traits are spread over all 21 chromosomes of wheat (Liu K. et al., 2018). The Q gene located on chromosome 5A confers a free-threshing spike and pleiotropically influences many other domestication-related traits such as spike length, plant height, and spike emergence time (Simons et al., 2006). The TaSnRK2 gene, encoding sucrose non-fermenting 1-related protein kinase, was detected on chromosomes 4A/4B/4D. It plays crucial roles in response to various environmental stimuli and shows significant correlation with spike length and thousand kernel weight (Miao et al., 2017; Zhang et al., 2017). An AGO1d mutant in tetraploid durum wheat produced shorter spikes and fewer kernels per spike than wild-type controls (Feng et al., 2017). The compactum (C) gene is located on the long arm of chromosome 2D near the centromere and affects spike compactness, grain size, grain shape, and grain number per spike (Johnson et al., 2008). TaCKX6-D1 was found to be significantly associated with TKW and KPS by controlling cytokinin levels (Zhang L. et al., 2012). Also, TaSAP1 was confirmed to be associated with TKW and KPS through its involvement in stress responses (Chang et al., 2013). Ppd-1 on 2D was identified as an inhibitor of paired spikelet formation by regulating the expression of the FT gene, consequently decreasing the number of spikelets (Boden et al., 2015). TaMOC1-7A and TaTEF-7A have also been found to be stably associated with spikelet number per spike (Zheng et al., 2014; Zhang et al., 2015). The tiller inhibition gene (tin), mapped on the short arm of chromosome 1A, and the productive tiller number gene (PTN) were notably associated with spike number per plant (Spielmeyer and Richards, 2004; Naruoka et al., 2011). The classical grain weight related genes TaGW2-A1, TaTGW6-A1, TaCwi, TaGS5-A1, TaGS-D1, TaSus1, and TaSus2 were located on 6A, 3A, 2A/4A/5D, 3A, 7D, and homoeologous groups 7 and 2, respectively (Jiang et al., 2011, 2015; Su et al., 2011; Hou et al., 2014; Rasheed et al., 2014; Zhang et al., 2014; Wang et al., 2015; Hanif et al., 2016; Zhai et al., 2018). In addition, many QTLs associated with spike-related traits have been reported in previous studies (Huang et al., 2006; Naruoka et al., 2011; Cui et al., 2014, 2017; Azadi et al., 2015; Fan et al., 2015; Gao et al., 2015; Li et al., 2016; Luo et al., 2016; Zhai et al., 2016; Guo et al., 2017; Liu et al., 2017; Lozada et al., 2017; Mwadzingeni et al., 2017; Ogbonnaya et al., 2017; Schulthess et al., 2017; Shi et al., 2017; Sun et al., 2017; Xu et al., 2017; Zhou et al., 2017). However, few genes or QTLs associated with spike related traits have been used for wheat breeding.

In this study, 192 wheat lines were genotyped using the 90K Illumina iSelect SNP Array (Wang et al., 2014). Based on multi-environment trial data, GWAS was conducted to identify favorable SNP clusters for yield-related traits, namely spike length, spikelet number, kernels per spike, thousand kernel weight, and spike number per plant. SNPs overlapping these haplotypes were used to develop KASP (Kompetitive Allele Specific PCR) markers, which were subsequently validated on a large sample panel. These KASP markers could ultimately be used in genomic selection for wheat yield improvement.

### MATERIALS AND METHODS

### Plant Material and Phenotype Analysis

The natural population used for GWAS included 25 synthetic hexaploid wheat lines, 80 landraces, and 87 cultivars (**Supplementary Table 1**). They were planted in four environments: 2014–2015 in Shuangliu (E4); 2015–2016 in Shuangliu (E3); 2015–2016 in Shifang with high nitrogen treatment (E2); and 2015–2016 in Shifang with low nitrogen treatment (E1). In the high nitrogen field, 60 kg N ha<sup>−1</sup> was applied after sowing. In the low nitrogen field, no nitrogen was applied during the whole growing period. The natural population used for the KASP assay, comprising 135 landraces and 141 cultivars, was planted in the 2017–2018 growing season in Shuangliu (**Supplementary Table 2**). Wheat lines were planted in a randomized complete block design with three replications per location. Every block had two rows, each 1.2 m long and 0.2 m apart. Twelve seeds were hand-planted in each row. Crop management followed local agricultural practice. After sowing, approximately 40 kg N ha<sup>−1</sup> was applied, except in the high and low nitrogen fields. Fungicide was applied at the seedling stage and the heading period to control diseases and pests, but no irrigation was used.

**Abbreviations:** ANOVA, Analysis of Variance; CTAB, Cetyl Trimethyl Ammonium Bromide; CMLM, Compressed Mixed Linear Model; GWAS, Genome-Wide Association Study; HWE, Hardy-Weinberg Equilibrium; KASP, Kompetitive Allele Specific PCR; KPS, Kernels per Spike; LD, Linkage Disequilibrium; MAF, Minor Allele Frequency; MAS, Marker-Assisted Selection; QTL, Quantitative Trait Locus; SL, Spike Length; SN, Spikelet Number; SNP, Single Nucleotide Polymorphism; SNPP, Spike number per plant; TKW, Thousand kernel weight.

Main spikes of six randomly selected plants were used for phenotype analysis. Spike length was measured from the base of the rachis to the topmost spikelet, excluding the awns. Spikelet number was counted from the basal sterile spikelet to the top fertile spikelet. Kernels per spike were determined by hand-threshing the mature spike. Thousand kernel weight was measured with SC-E software (Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China) by weighing more than 200 random kernels with two technical repeats. Spike number per plant was counted as the number of spikes with more than five kernels on one plant. Statistical data were analyzed using SPSS 19.0 software (https://www.ibm.com/analytics/cn/zh/technology/spss/). Outliers were deleted before analysis. Analysis of variance (ANOVA) was performed to test the differences caused by genotype and environment for each spike trait. Pearson's correlation analyses were conducted to determine the relationships among the spike related traits.

### SNP Calling and LD Estimation

Genomic DNA was extracted from fresh leaves of wheat seedlings using the modified cetyl trimethyl ammonium bromide (CTAB) method (Murray and Thompson, 1980). At least 5 µg of genomic DNA from each line was used for genotyping with the wheat 90K Illumina iSelect SNP Array at Compass Biotechnology Co., Ltd. After quality control (filter criteria: sample call rate > 0.8, MAF > 0.05, SNP call rate > 0.9, HWE < 0.000001), 13,154 polymorphic SNPs were selected for follow-up analysis.

Based on the pruned data, linkage disequilibrium (LD) was calculated using PLINK (Version 1.90) and the LD decay graphs were plotted using an R script. Pairwise r<sup>2</sup> (squared allele frequency correlation) values were calculated for SNPs within 200 Mb of each other, separately for cultivars, landraces and all samples. The distance at which LD decayed to half of its maximum value was estimated.
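As a rough illustration of the half-decay estimate described above, the sketch below bins pairwise r<sup>2</sup> values by distance, averages each bin, and reports the first bin whose mean falls to half of the maximum binned mean. The bin width, the function name `half_decay_distance` and the toy data are assumptions made for illustration; the R script used in the study is not given in the text and may use a different procedure (for example, a fitted decay curve).

```python
from collections import defaultdict


def half_decay_distance(pairs, bin_kb=100):
    """Return the distance (kb) at which mean r^2 first drops to half its maximum.

    pairs:  iterable of (distance_kb, r2) tuples for SNP pairs.
    bin_kb: width of the distance bins (a hypothetical choice).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist_kb, r2 in pairs:
        b = int(dist_kb // bin_kb)
        sums[b] += r2
        counts[b] += 1
    # Mean r^2 per bin, ordered by increasing distance.
    means = [(b, sums[b] / counts[b]) for b in sorted(sums)]
    if not means:
        return None
    half = max(m for _, m in means) / 2.0
    for b, m in means:
        if m <= half:
            # Report the midpoint of the first bin at or below half-maximum.
            return (b + 0.5) * bin_kb
    return None  # LD never decays to half of its maximum within the data


# Toy data (made up): r^2 falls as distance grows.
toy_pairs = [(50, 0.8), (60, 0.6), (150, 0.5), (250, 0.4), (350, 0.3), (450, 0.2)]
print(half_decay_distance(toy_pairs))  # 350.0
```

Running the same function separately on the cultivar and landrace pairs would give the two distances that are compared in the Results.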

### Detection of Artificial Selection Signals

The genetic differentiation (FST) and reduction of nucleotide diversity (ROD = 1 − πcultivar/πlandrace) were calculated for non-overlapping 100-kb windows across the genome using VCFtools v0.1.14 (https://github.com/vcftools/vcftools). Windows above the 99th percentile of the empirical distribution of FST or of ROD were selected as candidate regions. Genes located in windows detected by both methods, or in window clusters (windows less than 1 Mb apart, based on the LD decay distance), were chosen as candidate genes. The annotation information for these candidate genes was extracted from the IWGSC website (http://www.wheatgenome.org/). In addition, the Arabidopsis (https://www.arabidopsis.org/) and rice (http://rapdb.dna.affrc.go.jp/) functional gene databases were used for annotation.
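The window selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that per-window FST and π values have already been exported (for example from VCFtools), and the helper names `rod` and `top_percent` as well as all numbers are hypothetical.

```python
def rod(pi_cultivar, pi_landrace):
    """Reduction of diversity: ROD = 1 - pi_cultivar / pi_landrace."""
    if pi_landrace == 0:
        return None  # undefined when the landrace window has no diversity
    return 1.0 - pi_cultivar / pi_landrace


def top_percent(scores, percent=1):
    """Return the window ids whose score lies in the top `percent` of windows."""
    valid = {w: s for w, s in scores.items() if s is not None}
    n_keep = max(1, len(valid) * percent // 100)
    ranked = sorted(valid, key=valid.get, reverse=True)
    return set(ranked[:n_keep])


# Toy data for 200 windows (all values are made up).
windows = range(200)
fst = {w: w / 200 for w in windows}  # FST rises with the window index
pi_landrace = {w: 1.0 for w in windows}
pi_cultivar = {w: 0.1 if w in (50, 199) else 0.9 for w in windows}

rod_scores = {w: rod(pi_cultivar[w], pi_landrace[w]) for w in windows}
fst_top = top_percent(fst)         # candidate windows by FST
rod_top = top_percent(rod_scores)  # candidate windows by ROD
both = fst_top & rod_top           # windows detected by both methods
print(sorted(fst_top), sorted(rod_top), sorted(both))
```

In this toy run only window 199 is flagged by both statistics, which corresponds to the "detected by both methods" category used for candidate genes.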

### Genome-Wide Association Study

GWAS for spike related traits was performed in 192 wheat lines using the compressed mixed linear model (CMLM) implemented in the GAPIT package, with population structure and kinship included as covariates to minimize false positives (Lipka et al., 2012). A threshold P-value of 0.001 (–log10P = 3) was used to declare significant SNPs. To identify candidate clusters, stable SNPs underlying association signals, grouped on the basis of LD decay and detected in more than two environments, were selected.
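The filtering step described above can be illustrated with a short sketch that keeps SNPs passing the P-value threshold and then retains those significant in several environments. It assumes the GAPIT results have been exported as (SNP, environment, P-value) records; the function name, SNP identifiers and environment labels are hypothetical. The minimum number of environments is left as a parameter because the Methods state "more than two environments" (which would be `min_envs=3`), whereas the Results report loci detected in two or more.

```python
from collections import defaultdict


def stable_snps(results, p_threshold=1e-3, min_envs=2):
    """Return {snp_id: sorted environments} for SNPs significant in >= min_envs environments.

    results: iterable of (snp_id, environment, p_value) tuples.
    """
    envs_by_snp = defaultdict(set)
    for snp_id, env, p_value in results:
        if p_value < p_threshold:  # same as -log10(P) > 3 for the default threshold
            envs_by_snp[snp_id].add(env)
    return {
        snp: sorted(envs)
        for snp, envs in envs_by_snp.items()
        if len(envs) >= min_envs
    }


# Hypothetical SNP identifiers, environment labels and P-values.
toy_results = [
    ("snp_a", "E1", 2e-4),
    ("snp_a", "E2", 5e-4),
    ("snp_a", "E3", 0.02),
    ("snp_b", "E1", 8e-4),
    ("snp_b", "E2", 0.3),
    ("snp_c", "E4", 1e-5),
]
print(stable_snps(toy_results))  # {'snp_a': ['E1', 'E2']}
```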

### An Integrated Map Based on the Wheat Genome Sequence

The stable SNP clusters were mapped onto the wheat genome. Together with previously reported QTLs, regions of interest (spike length, spikelet number, kernels per spike) were positioned on the newly released reference genome sequence of Chinese Spring by BLAST searches of their flanking or peak marker sequences against IWGSC RefSeq v1.0 (https://urgi.versailles.inra.fr/blast\_iwgsc/blast.php). MapChart Ver. 2.3 (Voorrips, 2002) was used for map drawing (https://www.wageningenur.nl/en/show/Mapchart.htm).

### KASP Assay

Based on LD decay, KASP markers were developed from the SNPs overlapping the candidate haplotypes. A total of 276 wheat lines were genotyped, including 146 landraces (60 southern lines and 86 northern lines) and 130 cultivars (67 southern lines and 73 northern lines), collected from diverse wheat zones in China (**Supplementary Table 2**). Among the southern wheat lines, 39 landraces and 49 cultivars had been used for GWAS. Four pairs of KASP markers were developed to detect three haplotypes (**Table 1**). The KASP assay was carried out according to the manufacturer's recommendations (LGC Genomics, Beverly, MA, USA) and Patterson et al. (2017). Amplification started with 15 min at 94°C, followed by 10 touchdown cycles of 20 s at 94°C and 60 s at 65–57°C, and then 26–35 cycles of 20 s at 94°C and 1 min at 60°C. End-point genotyping was done using CFX Manager 3.1 software. The specificity and sensitivity of all tested markers are listed in **Supplementary Table 3**.

### RESULTS

### Phenotypic Assessment

One hundred and ninety-two bread wheat lines, including synthetic hexaploids, landraces and cultivars, were tested in the present study. The phenotypic performance of the investigated traits for the wheat natural population in four environments is shown in **Supplementary Figure 1** and **Supplementary Table 4**. The coefficients of variation for these traits in each environment ranged from 10.63 to 41.07%, indicating broad phenotypic variation and a large potential for improvement. ANOVA detected significant differences among environments for these traits (**Supplementary Tables 4**, **5**). The variation of SL, SN, KPS and SNPP was strongly affected by environment, which explained 19.35, 6.96, 7.18, and 32.18% of the phenotypic variation, respectively (**Supplementary Table 5**). Significant Pearson's correlation coefficients were found among these traits in the entire set of wheat lines (All) and in the different groups (SH, L, and C) (P < 0.05, **Supplementary Table 6**). A significant negative correlation was observed between TKW and SNPP in the entire set, whereas TKW and SL were significantly positively correlated. Interestingly, significant negative correlations were observed between TKW and SN and between TKW and KPS in the entire set, but significant positive correlations were found in cultivars. Selective pressure may have played an important role in this change of correlation from landraces to cultivars (Zhang D. et al., 2012). A significant positive correlation was detected between SN and KPS in all groups. However, significant positive correlations between SL and SN and between SL and KPS were observed only in cultivars.
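For reference, the coefficient of variation reported above is the standard deviation expressed as a percentage of the mean. A minimal sketch follows, assuming the sample standard deviation was used (the text does not say which estimator was applied in SPSS); the function name and the spike length values are made up for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the coefficient of variation (%) = sample SD * 100 / mean."""
    return stdev(values) * 100 / mean(values)


# Hypothetical spike lengths (cm) for three lines in one environment.
spike_lengths = [8.0, 10.0, 12.0]
print(round(coefficient_of_variation(spike_lengths), 2))  # 20.0
```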

### Artificial Selection Signals During Wheat Improvement

Wheat lines used in this study include synthetic hexaploids, landraces and cultivars. No obvious population structure was detected among these samples, in agreement with a previous study (Liu K. et al., 2018). Linkage disequilibrium (LD) analysis was performed separately for cultivars, landraces and all samples. The distance at which LD dropped to half of its maximum value was longer in cultivars (1,053 kb) than in landraces (785 kb) (**Supplementary Figure 2**). The more extensive LD in cultivars is consistent with the effect of artificial selection in this population.

To identify potential artificial selection signals at the genomic level, genetic differentiation (FST) and polymorphism levels (ROD) between cultivars and landraces were calculated (**Figure 1**). FST and ROD identified 75 and 71 non-overlapping windows, respectively, as potential selective sweep regions (**Supplementary Table 7**). These windows accounted for only 0.8% of the whole wheat genome and did not cover all 21 chromosomes. Ten windows were detected by both methods. In addition, 19 window clusters were obtained by grouping windows less than 1 Mb apart (based on the LD decay distance); these clusters might be genetically linked to loci affecting important agronomic traits.

In our analysis, the 146 putative selective sweeps were compared with previously reported spike related QTLs or markers based on the LD decay. As a result, 81 selective sweeps were located within known spike related QTLs or markers (**Figure 1**, **Supplementary Table 8**). Specifically, 26 TKW related QTLs and one gene overlapped with 37 selective sweeps. In addition, 35, 23, and 22 selective windows overlapped with 18 SL related QTLs, 15 KPS related QTLs and 9 SN related QTLs, respectively. Only 8 sweeps overlapped with 5 SNPP related QTLs. Notably, 2 window clusters, on chromosomes 2A and 6B, were also detected by both methods. The window cluster on 2A overlapped with the previously reported Rht gene TaUBP24 (Liu J. et al., 2018), while the window cluster on 6B was located in the confidence interval of previously reported QTLs for thousand kernel weight and spike length (Wang et al., 2011; Mir et al., 2012).

### GWAS

The five spike related traits in four environments were used to perform GWAS. QQ-plots and Manhattan plots of the GWAS results are shown in **Figures 2A–G** and **Supplementary Figures 3**–**7**. Fifty-one, twenty-eight, thirty-nine, thirty, and thirty-six significant SNPs were detected for SL, SN, KPS, TKW, and SNPP across all environments, respectively (**Supplementary Tables 9**–**13**). Among these significant SNPs, 18 loci were detected in two or more environments. In addition, 31 SNP clusters (two or more SNPs within one LD decay distance) were found, and five SNPs in one cluster were multi-trait loci. Clusters containing four or more SNPs, or containing multi-environment loci, were studied further as candidates. Thus, four SNP clusters associated with spike length, located on chromosomes 2A, 2B, 2D, and 6A, and three SNP clusters associated with kernels per spike (including one multi-trait locus also associated with spikelet number), located on chromosomes 2A, 2B, and 7B, were analyzed (**Table 1**).
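One way to read the cluster definition above (two or more SNPs within one LD decay distance) is to chain neighbouring significant SNPs on the same chromosome whenever the gap between them is no larger than the LD decay distance. The sketch below implements that reading; it is an assumption about the grouping rule, not the authors' documented procedure, and the function name, the 1 Mb gap and the positions are hypothetical.

```python
def cluster_snps(snps, max_gap_bp):
    """Group SNPs on the same chromosome whose neighbouring positions are
    within max_gap_bp of each other; keep groups with at least two SNPs.

    snps: iterable of (chromosome, position_bp) tuples.
    """
    clusters = []
    current = []
    for chrom, pos in sorted(snps):
        same_chrom = bool(current) and chrom == current[-1][0]
        if same_chrom and pos - current[-1][1] <= max_gap_bp:
            current.append((chrom, pos))
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [(chrom, pos)]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical significant SNP positions (bp).
toy_snps = [
    ("2A", 100_000), ("2A", 600_000), ("2A", 5_000_000),
    ("2B", 700_000),
    ("7B", 1_000), ("7B", 900_000), ("7B", 1_800_000),
]
for cluster in cluster_snps(toy_snps, max_gap_bp=1_000_000):
    print(cluster[0][0], [pos for _, pos in cluster])
```

With these toy positions the sketch reports one two-SNP cluster on 2A and one three-SNP cluster on 7B; the isolated SNPs on 2A and 2B are dropped.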

According to the types of these SNP clusters, haplotypes associated with spike related traits were detected, and the frequency of each haplotype in the germplasm was analyzed (**Figure 3**, **Supplementary Figure 8**). Two haplotypes associated with SL were identified for each of the clusters on chromosomes 2A, 2B, and 2D. Hsl-2A-2 and Hsl-2B-2, the advantage haplotypes, showed significantly longer spikes than Hsl-2A-1 and Hsl-2B-1. However, the spike lengths of the two haplotypes Hsl-2D-1 and Hsl-2D-2 did not differ significantly. Hsl-2A-1 contained 42 landraces and 86 cultivars, while the advantage haplotype Hsl-2A-2 included 38 landraces and only one cultivar. Similarly, 46 landraces and 79 cultivars belonged to Hsl-2B-1, while the advantage haplotype Hsl-2B-2 contained 34 landraces and 8 cultivars. Four haplotypes were found for the spike length related cluster on 6A (Hsl-6A-1 to Hsl-6A-4). Hsl-6A-4 showed significantly longer spikes than Hsl-6A-1/2/3. The advantage haplotype (Hsl-6A-4) included all the cultivars (87) and more than half of the landraces (57), whereas 15, 3, and 5 landraces belonged to Hsl-6A-1, Hsl-6A-2, and Hsl-6A-3, respectively. Three haplotypes associated with KPS were detected for each of the clusters on 2A and 2B. Hkps-2A-3 showed significantly more kernels per spike than Hkps-2A-1. However, the three haplotypes of Hkps-2B did not differ significantly. Hkps-2A-1 contained 9 landraces and 18 cultivars. Only 2 landraces and 3 cultivars belonged to Hkps-2A-2. The advantage haplotype, Hkps-2A-3, contained 69 landraces and 66 cultivars. The multi-trait cluster on 7B comprised two haplotypes associated with KPS/SN (Hkps/sn-7B-1 and Hkps/sn-7B-2). Forty-seven landraces and 83 cultivars carried haplotype Hkps/sn-7B-1. No cultivars, but 33 landraces, carried the advantage haplotype Hkps/sn-7B-2.

FIGURE 2 | GWAS results for spike length and kernels per spike. (A–D) Q-Q plots and Manhattan plots of SNPs associated with spike length and kernels per spike in two environments. (E–G) Manhattan plots of the SNP clusters on chromosomes 2B, 6A, and 7B. The SNP clusters on 7B represent cKPS/SN-7B. The dashed horizontal line depicts the significance threshold. (H–J) The integrated physical map of SNP clusters and reported QTLs. The short arms of the chromosomes are located at the top. The physical positions of the marker loci are listed on the left side of the corresponding chromosomes. The names of the marker loci and QTLs are listed on the right side of the corresponding chromosomes. Red bar: SNP clusters; green bar: selection regions; black bar: reported QTLs.

### SNP Clusters and Their Overlapped QTLs Were Integrated

The significantly associated SNPs detected in this research were compared with previously reported QTLs, markers or genes based on their physical positions. Five clusters and one SNP related to SL were located within known QTL regions. Two multi-environment loci associated with SN overlapped with reported QTLs. Three clusters and one SNP related to KPS were mapped to reported QTL regions. Two clusters and five SNPs related to TKW overlapped with previously reported QTLs. One cluster, one multi-environment locus and one SNP related to SNPP were located within reported QTL regions (**Supplementary Tables 9**–**13**). In addition, the TaGW2 homoeologues on 6D and 6A physically covered one KPS related SNP and one TKW related SNP, respectively. Meanwhile, the Vrn-A1 gene was physically co-located with one SNPP related SNP (**Supplementary Tables 11**–**13**).

The SNP clusters mentioned above (located on chromosomes 2A, 2B, 2D, 6A, and 7B) and previously reported spike related QTLs were integrated on the physical maps (**Figures 2H,J**, **Supplementary Figure 9**, **Supplementary Table 14**). The SNP cluster qSL-2B (associated with spike length on 2B) was covered by the reported QTL qSL-2B.7 (**Figure 2H**). qKPS/SN-7B (the SNP cluster associated with KPS/SN on 7B) was covered by the reported QTLs qKPS-7B.5 and qKPS-7B.6 (**Figure 2J**). qSL-2A and qKPS-2A detected in this research overlapped with the reported QTLs qSL-2A.2 and qKPS-2A.8, respectively. qKPS-2B found in this study was covered by the reported QTL qKPS-2B.12. The SNP cluster qSL-2D overlapped with the reported QTL qSL-2D.9 (**Supplementary Figure 9**). Meanwhile, the SNP cluster qSL-6A was close to the reported QTL qSL-6A.1, and qTKW-2D reported herein was close to the reported QTLs qTKW-2D.23 and qTKW-2D.25 (**Supplementary Figure 9**, **Figure 2I**). In addition, the artificial selection regions (on 2A, 2B, 2D, 6A, and 7B) covered by the reported QTLs were also integrated on the maps. Notably, the selection region Selection-2A.7 was covered by the cluster qSL-2A reported herein and by the previously reported QTL qSL-2A.2. Similarly, Selection-2D.2 and Selection-2D.3 overlapped with the reported QTLs qSL-2D.9 and qTKW-2D.25. They were also physically close to the SNP clusters qSL-2D and qTKW-2D.

### Candidate Genes for Artificial Selection and GWAS

Genes in the selected window clusters, or in windows detected by both methods, were considered candidate genes for artificial selection during wheat improvement. To annotate these genes, the wheat, rice and Arabidopsis functional gene databases were used. Some of these genes were involved in selection-related agronomic traits such as stress response and development (seed size, seed number, seed maturation, seed dormancy, and flowering time) (**Supplementary Figure 10A**, **Supplementary Table 15**).

Based on LD decay, the 1 Mb regions flanking the above-mentioned SNP clusters detected by GWAS were defined as candidate regions. Genes located in the candidate regions were identified as candidate genes and were annotated using the same methods as above. These genes were mainly involved in functions such as metabolism, transcription, stress response and development (**Supplementary Figure 10B**, **Supplementary Table 16**).
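Collecting the genes in a candidate region amounts to an interval-overlap query: extend the cluster by the flanking distance on each side and keep every gene that overlaps the extended interval. The sketch below assumes gene coordinates are available as simple tuples (for example, parsed from the IWGSC annotation); the function name, gene identifiers and coordinates are hypothetical.

```python
def genes_in_candidate_region(genes, chrom, start, end, flank_bp=1_000_000):
    """Return ids of genes overlapping [start - flank_bp, end + flank_bp] on chrom.

    genes: iterable of (gene_id, chromosome, gene_start, gene_end) tuples.
    """
    lo = max(0, start - flank_bp)
    hi = end + flank_bp
    return [
        gene_id
        for gene_id, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and g_start <= hi and g_end >= lo
    ]


# Hypothetical gene models and a hypothetical SNP cluster on 7B.
toy_genes = [
    ("gene_1", "7B", 4_200_000, 4_205_000),   # inside the left flank
    ("gene_2", "7B", 5_100_000, 5_103_000),   # inside the cluster itself
    ("gene_3", "7B", 6_450_000, 6_460_000),   # just beyond the right flank
    ("gene_4", "2A", 5_100_000, 5_103_000),   # same position, other chromosome
]
print(genes_in_candidate_region(toy_genes, "7B", 5_000_000, 5_370_000))
# ['gene_1', 'gene_2']
```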

### KASP Assay

SNPs from these haplotypes were used to develop KASP markers (**Supplementary Table 3**). A total of 276 wheat lines from south and north China were genotyped with these KASP markers (**Supplementary Figure 11**). The genotypes from the KASP assay were identical to those from the chip assay. The frequencies of these haplotypes in the germplasm and the significance of differences between them were analyzed (**Figure 4**). Compared with Hsl-2B-1 (9.41 ± 1.83 cm), which contained 89 landraces and 139 cultivars, Hsl-2B-2 contained 45 landraces and showed significantly longer spikes (11.034 ± 1.906 cm). The average kernels per spike (53.554 ± 11.77) and spikelet number (22.747 ± 2.431) of Hkps/sn-7B-2, the advantage haplotype, were significantly higher than those of Hkps/sn-7B-1 (kernels per spike: 50.054 ± 10.12; spikelet number: 21.82 ± 2.218). Hkps/sn-7B-2 included 56 landraces and 16 cultivars, while 76 landraces and 124 cultivars carried Hkps/sn-7B-1. In addition, for Hsl-2B-1, the spike length of south China lines was significantly longer than that of north China lines. A similar result was found for Hkps-7B-1, but the spikelet number of south China lines was significantly higher than that of north China lines for both haplotypes of Hsn-7B.

### DISCUSSION

Conventional breeding, based mostly on phenotypic selection, has been practiced for several centuries. Despite the great gains achieved through artificial selection, the blindness, empiricism, inefficiency and long duration of conventional breeding have limited further increases in wheat yield. The wheat genome sequence, released on the IWGSC website, could promote the rapid improvement of cultivars through efficient use of genetic resources and genomic breeding. Molecular breeding is moving from the SSR to the SNP level (Liu et al., 2017). Given the modularity of biological processes, the rational design of modules with predictable functions might become a key step in molecular breeding (Gavin et al., 2006; Silver et al., 2014). For example, 14 engineered CLV3 promoter alleles generated by CRISPR/Cas9 editing produced a continuum of locule number variation in tomato (Somssich et al., 2016; Rodríguez-Leal et al., 2017).

FIGURE 4 | Frequency of haplotypes detected by KASP markers in wheat germplasm. (A–C) Frequency of haplotypes related to SL and SN/KPS on chromosomes 2B and 7B, respectively. Extreme and mean values of the traits are displayed in the box plots. Statistical significance was determined by LSD test: \*P < 0.05, \*\*P < 0.01. N denotes the number of genotypes belonging to each haplotype. SL, spike length; SN, spikelet number; KPS, kernels per spike; L, landrace; C, cultivar; S, south China samples; N, north China samples; T, total samples.

### Wheat Yield Improvement Is the Main Aim of Artificial Selection

During the past several centuries, crops have undergone domestication and selection to increase yield. As a result, many advantage genes/haplotypes were selected and inherited. In this research, artificial selection signal analysis was conducted in a wheat natural population containing landraces and cultivars. Many window clusters were detected, consistent with the tendency of selective sweep candidates to occur in clusters (Gao et al., 2017). Many candidate genes involved in improvement-related agronomic traits, such as stress response, seed size, seed number, seed maturation, seed dormancy and flowering time, were found in the selective sweep regions. The results suggest that selection for increased wheat yield, as the most powerful evolutionary force, has created superior genotypes through phenotypic selection and has fixed traits such as larger seed size, longer spikes, shorter plant height, and more kernels per spike in cultivars (Yan et al., 2018).

To test the usefulness of the selection analysis against worldwide collections, the candidate sweep regions were compared with previously reported QTLs, markers and genes related to the investigated traits. In total, 81 windows overlapped with loci known to affect improvement-related agronomic traits in different wheat populations from all over the world (**Figure 1**, **Supplementary Table 8**). Notably, two window clusters on 2A and 6B overlapped with the reported yield related gene TaUBP24 (Liu K. et al., 2018) and with QTLs for thousand kernel weight and spike length (Wang et al., 2011; Mir et al., 2012), respectively. This provides strong evidence for the general applicability of this method.

### SNP Clusters Related to Spike Traits and Overlapped QTLs Were Integrated

Many QTLs associated with spike related traits have been detected in genetic populations. These QTLs are spread across all 21 chromosomes (Liu K. et al., 2018). However, because of differences in genetic background and a lack of genomic information, few of these QTLs have been used in wheat improvement. In this study, SNP clusters of interest related to spike traits on chromosomes 2A, 2B, 2D, 6A, and 7B were detected by GWAS in a panel with a broad genetic background. Using the genome sequence information, these SNP clusters were positioned on the wheat genome. In these regions, several previously reported QTLs associated with the same phenotypes were found.

KPS is a key component of wheat yield, and spikelet number is the main factor affecting kernels per spike (Liu K. et al., 2018). qKPS/SN-7B, the multi-trait locus on 7B, was significantly associated with both KPS and SN. Pleiotropy, in which a single gene or QTL is associated with multiple traits, has been demonstrated in many previous reports (Neumann et al., 2011). The haplotype Hkps/sn-7B-2 was found only in landraces, which indicates that this advantage haplotype has not been transferred into cultivars and should be used in breeding. The multi-trait locus qKPS/SN-7B has been confirmed in previous studies and overlapped with qKPS-7B.1, qKPS-7B.5, and qKPS-7B.6 (Wang et al., 2011; Cui et al., 2014; Liu et al., 2014; Yu et al., 2018). However, these QTL regions were too wide (270–571 Mb) to predict candidate genes. The SNP cluster qKPS/SN-7B reported herein covered a very narrow region (0.37 Mb). Based on the LD decay, the candidate region contained only 55 candidate genes (**Supplementary Tables 14**, **16**). The candidate gene TraesCS7B01G456300, which overlapped with the leading SNP, encodes a BURP domain-containing protein. BURP genes are widespread in plants and have been shown to contribute to seed development, seed size, seed mass and seed number (Van Son et al., 2009; Xu et al., 2013). According to these results, TraesCS7B01G456300 might be the candidate gene controlling KPS and SN.

Similarly, the advantage haplotype Hkps-2A-3 of the SNP cluster associated with KPS on 2A was carried by the majority of landraces and cultivars, which indicates that this haplotype has been effectively transferred into modern cultivars. Among the reported QTLs associated with KPS on chromosome 2A, qKPS-2A.8 overlapped with the cluster (Jia et al., 2013). The QTL region defined by the flanking markers of qKPS-2A.8 was relatively narrow (42.9 Mb), but still too wide to identify candidate genes. By contrast, the region covered by the SNP cluster qKPS-2A (0.01 Mb), together with the flanking LD region, included only 33 candidate genes (**Supplementary Tables 14**, **16**).

Spike length, as an indirect factor, also affects KPS and plays an important role in improving wheat yield (Guo et al., 2017). The advantage haplotypes of Hsl-2A and Hsl-2B were carried by only a few cultivars, which means that they have not been effectively transferred into cultivars and should be used in breeding. All modern cultivars carried the advantage haplotype Hsl-6A-4, which suggests that this haplotype has been effectively used in artificial selection. Spike length related QTLs on chromosomes 2A, 2B, and 6A were integrated on a physical map. Interestingly, the SNP cluster qSL-2A overlapped with the reported QTL qSL-2A.2 (Liu et al., 2017) and with the selection signal Selection-2A.7. Among the candidate genes, two genes, TraesCS2A01G130200 and TraesCS2A01G130600, overlapped with the leading SNPs (**Supplementary Table 16**). Their homologous genes in rice were OsHAD1 and OsGS1, respectively. Overexpression of OsHAD1 in rice resulted in enhanced phosphatase activity and biomass (Pandey et al., 2017). Co-overexpression of OsGSA1 and OsGSA2 in rice increased tiller number, panicle number, and grain filling, and resulted in yield improvement (James et al., 2018). Based on these results from rice, both genes may be important for improving SL in wheat.

### Two Advantage Haplotypes for Wheat Yield Improvement

Detection of allelic variation is the first step toward crop improvement, and identification of advantage haplotypes is crucial for breeding (Hou et al., 2014). Based on the GWAS results, several SNP clusters related to spike traits were detected and haplotypes were identified in a collection of 192 wheat lines. To make use of the advantage haplotypes, four pairs of KASP markers around Hsl-2B, Hsl-6A and Hkps/sn-7B were successfully developed from the SNPs. A larger wheat population, including part of the GWAS accessions and additional northern China wheat germplasm, was tested. The genotypes identified by the KASP platform were consistent with those from the 90K SNP chip, which indicates that KASP markers can be used for haplotype detection (Tan et al., 2017).

According to the history of artificial selection, breeding in north China paid more attention to spike number, whereas breeding in south China focused mainly on kernels per spike. This selection strategy is adapted to the humid environment and to wheat diseases that do not usually occur in north China (Kang et al., 2011). As a result, after artificial selection, cultivars in south China show larger spikes (longer SL and higher SN and KPS) than those in north China. Hsl-2B-2, the advantage haplotype for increasing spike length on chromosome 2B, was not detected in modern cultivars. Wheat lines carrying this advantage haplotype showed significantly longer spikes in both south and north China. Introducing this haplotype into modern cultivars with the KASP marker in breeding programs would therefore be expected to increase spike length in both south and north China. Hsn-7B-2, the advantage haplotype for increasing spikelet number, has been partially transferred into cultivars in north China but not into cultivars in south China. In both south and north China, lines carrying this advantage haplotype showed significantly higher spikelet number. Furthermore, among lines carrying this advantage haplotype, the spikelet number in south China was significantly higher than that in north China. This result suggests that introducing this haplotype into southern cultivars could significantly increase their spikelet number, as in north China. Similarly, introducing Hkps-7B-2, the advantage haplotype for increasing kernels per spike, into southern cultivars would be expected to significantly increase their KPS in our breeding program.

Hkps/sn-7B, the multi-trait locus detected in this research, could increase both spikelet number and kernels per spike. This result is supported by the significant correlation between kernels per spike and spikelet number (coevolution) during selection (Michel et al., 2018). The advantage haplotype of Hkps/sn-7B could be introduced into modern cultivars with the KASP marker to increase kernels per spike and spikelet number at the same time.

### CONCLUSION

In summary, seven SNP clusters involved in wheat spike traits were detected by selection signal analysis and GWAS. Based on the released wheat genome sequence, an integrated map containing the SNP clusters and their overlapping/flanking QTLs was constructed. KASP markers identifying two advantage haplotypes were developed for increasing SL, KPS and SN in future breeding programs.

### AUTHOR CONTRIBUTIONS

JL, BF, and ZX designed the research. JL, ZX, BF, XF, QZ, JC, FW, GJ, and LY phenotyped the wheat population. JL analyzed the data and wrote the manuscript. Funding was acquired by TW. BF and TW had primary responsibility for the final content. All authors contributed to manuscript revision, and read and approved the submitted version.

### FUNDING

This work was supported by grants from the National Key Research and Development Program of China (Grant No. 2016YFD0102000), the National Key Project of Transgenic Biologic Varieties Breeding of China (Grant No. 2016ZX08009003-004), and the Youth Innovation Promotion Association, CAS.


### ACKNOWLEDGMENTS

We thank Wuyun Yang for providing some of the wheat landrace seeds. The KASP assay was conducted by China Golden Marker (Beijing) Biotech Co., Ltd. The physical position information for the 90K SNP chip was provided by the Triticeae Multi-omics Center.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01584/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Liu, Xu, Fan, Zhou, Cao, Wang, Ji, Yang, Feng and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genome-Wide Transcription During Early Wheat Meiosis Is Independent of Synapsis, Ploidy Level, and the Ph1 Locus

Azahara Carmen Martín<sup>1</sup>\*, Philippa Borrill<sup>1,2</sup>, Janet Higgins<sup>3</sup>, Abdulkader Alabdullah<sup>1</sup>, Ricardo H. Ramírez-González<sup>1</sup>, David Swarbreck<sup>3</sup>, Cristobal Uauy<sup>1</sup>, Peter Shaw<sup>1</sup> and Graham Moore<sup>1</sup>

#### Edited by:

Pierre Sourdille, INRA Centre Auvergne Rhône Alpes, France

#### Reviewed by:

Eric Jenczewski, INRA Centre Versailles-Grignon, France; Sateesh Kagale, National Research Council Canada (NRC-CNRC), Canada; Heidi Serra, INRA Centre Auvergne Rhône Alpes, France

> \*Correspondence: Azahara Carmen Martín Azahara.martinramirez@jic.ac.uk

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 09 October 2018 Accepted: 19 November 2018 Published: 04 December 2018

#### Citation:

Martín AC, Borrill P, Higgins J, Alabdullah A, Ramírez-González RH, Swarbreck D, Uauy C, Shaw P and Moore G (2018) Genome-Wide Transcription During Early Wheat Meiosis Is Independent of Synapsis, Ploidy Level, and the Ph1 Locus. Front. Plant Sci. 9:1791. doi: 10.3389/fpls.2018.01791

<sup>1</sup> John Innes Centre, Norwich, United Kingdom, <sup>2</sup> School of Biosciences, University of Birmingham, Birmingham, United Kingdom, <sup>3</sup> Earlham Institute, Norwich, United Kingdom

Polyploidization is a fundamental process in plant evolution. One of the biggest challenges faced by a new polyploid is meiosis, particularly discriminating between multiple related chromosomes so that only homologous chromosomes synapse and recombine to ensure regular chromosome segregation and balanced gametes. Despite its large genome size, high DNA repetitive content and similarity between homoeologous chromosomes, hexaploid wheat completes meiosis in a shorter period than diploid species with a much smaller genome. Therefore, during wheat meiosis, mechanisms additional to the classical model based on DNA sequence homology must facilitate more efficient homologous recognition. One such mechanism could involve exploitation of differences in chromosome structure between homologs and homoeologs at the onset of meiosis. In turn, these chromatin changes can be expected to be linked to transcriptional gene activity. In this study, we present an extensive analysis of a large RNA-seq dataset derived from six different genotypes: wheat, wheat–rye hybrids and newly synthesized octoploid triticale, both in the presence and absence of the Ph1 locus. Plant material was collected at early prophase, at the leptotene–zygotene transition, when the telomere bouquet is forming and synapsis between homologs is beginning. The six genotypes exhibit different levels of synapsis and chromatin structure at this stage; therefore, recombination and consequently segregation are also different. Unexpectedly, our study reveals that neither synapsis, whole genome duplication nor the absence of the Ph1 locus is associated with major changes in gene expression levels during early meiotic prophase. Overall wheat transcription at this meiotic stage is therefore highly resilient to such alterations, even in the presence of major chromatin structural changes. Further studies in wheat and other polyploid species will be required to reveal whether these observations are specific to wheat meiosis.

Keywords: polyploidy, whole-genome duplication, wheat–rye hybrid, triticale, RNA-seq analysis, ZIP4, Ph1 gene, chromosomal rearrangements

Polyploidization, or whole genome duplication (WGD), has an important role in evolution and speciation, particularly in plants. It is now clear that all seed plants and angiosperms have experienced multiple rounds of WGD during their evolutionary history and are now considered to possess a paleopolyploid ancestry (Renny-Byfield and Wendel, 2014). Polyploidy is traditionally classified into two separate types, autopolyploidy, arising from intraspecies genome duplication, and allopolyploidy, arising from interspecific hybridization. Many of the world's most important crops, including wheat, rapeseed, sugarcane, and cotton, are relatively recent allopolyploids; and much of the current knowledge about WGD is due to research involving these crop species. Several studies have reported major changes in transcription in somatic tissues following polyploidization (Renny-Byfield and Wendel, 2014 and references therein; Li et al., 2014; Edger et al., 2017; He et al., 2017; Sun et al., 2017; Lloyd et al., 2018). However, there have been very few previous reports on the effects of polyploidization on transcription during meiosis, a critical stage in the establishment of a polyploid (Braynen et al., 2017).

Meiosis is the specialized cell division that generates haploid gametes for sexual reproduction. During meiosis, homologous (identical) chromosomes synapse along their length and recombine, leading to novel combinations of parental alleles, and ensuring proper chromosome segregation. Restriction of synapsis and crossover (CO) formation to homologous chromosomes (homologs) is therefore a prerequisite for regular meiosis. Subsequent recombination is also critical, not only to generate new combinations of genes, but also to ensure an equal distribution of genetic material and maintain fertility and genome stability across generations. One of the problems of polyploidization is that it is initially accompanied by irregular meiosis, due to the presence of more than two identical homologs in autopolyploids, or very similar chromosomes (homoeologs) in allopolyploids. Thus, one of the biggest challenges faced by a new polyploid, is how to manage the correct recognition, synapsis, recombination, and segregation of its multiple related chromosomes during meiosis, to produce balanced gametes.

Although studies on diploid model systems (reviewed in Mercier et al., 2016) have revealed much about the processes of recombination and synapsis, the way in which homolog recognition initiates the synapsis process during the telomere bouquet remains one of the most elusive questions still to be addressed. There are several different genetic and structural mechanisms of meiotic chromosome recognition reported in plants, mammals, and fungi, indicating that the process of recognition differs between organisms (reviewed in Grusz et al., 2017). In most eukaryotes, homologous recognition is initiated by the formation of double-strand breaks (DSB) catalyzed by the Spo11 protein. Subsequently, the DSB free ends invade the corresponding homolog regions, checking for homology based on DNA sequence. However, it has also been observed, for example in hexaploid wheat, that the process of homolog recognition is associated with major changes in chromatin structure (Prieto et al., 2004), suggesting that changes in chromatin structure may also be involved in the homolog recognition process. This may be more important in polyploid species such as hexaploid wheat, where the process of recognition must distinguish homologs from homoeologs. Hexaploid wheat (T. aestivum; 2n = 6× = 42, AABBDD), also known as bread wheat, is a relatively recent allopolyploid, with three related ancestral genomes, which although different, possess a very similar gene order and content. Hexaploid wheat has a 16 Gb genome size, with high similarity between homoeologous genomes in the coding sequences (95–99%), and with a large proportion of repetitive DNA (>85%) (International Wheat Genome Sequencing Consortium [IWGSC], 2018).
Despite this, and the problem of having to distinguish between related chromosomes, hexaploid wheat is able to complete meiosis in a shorter period than diploid species such as rye, barley or even Arabidopsis, which possess a much smaller genome (Bennet and Finch, 1971; Bennet et al., 1971; Armstrong et al., 2003). Therefore, wheat meiosis is likely to exploit other mechanisms, apart from the traditional model based on DNA sequence homology, to facilitate homologous recognition. One such mechanism is likely to involve exploiting meiotic chromosome organization, which in turn might be linked to the transcriptional activity of the genes on homologous and homoeologous chromosomes (Cook, 1997; Wilson et al., 2005; Xu and Cook, 2008). It would be very interesting to assess the overall level of transcription occurring at the meiotic stage when chromosomes are recognizing each other, and synapsis is beginning, to address whether the homology search influences or is influenced by transcription.

Despite the significant similarity between homoeologs, wheat behaves as a diploid during meiosis, with every chromosome recombining only with its true homolog. This phenotypic behavior has been predominantly attributed to Ph1 (Pairing homoeologous 1), a dominant locus on chromosome 5B (Riley and Chapman, 1958; Sears and Okamoto, 1958), which most likely arose during wheat polyploidization (Chapman and Riley, 1970). In the absence of this locus, CO between non-homologs can occur, and so it was believed that the Ph1 locus prevented synapsis between homoeologs. However, it has recently been demonstrated in wheat-wild relative hybrids lacking homologs, that although homoeologous chromosomes fail to synapse during the telomere bouquet (leptotene-zygotene transition), they do synapse to the same level after the telomere bouquet has dispersed, whether or not Ph1 is present (Martín et al., 2017). This confirms that the Ph1 locus itself does not prevent homoeologous synapsis after telomere bouquet dispersal in the wheat-wild relative hybrid. Similarly, in normal hexaploid wheat, only homologous synapsis can occur during the telomere bouquet stage. However, in the absence of Ph1, homologous synapsis is less efficient, with more overall synapsis occurring after the telomere bouquet has dispersed, when homoeologous synapsis can also take place. This non-specific synapsis between homoeologs leads to the low level of multivalents and univalents observed at metaphase I in wheat lacking Ph1. These observations indicate that, during the telomere bouquet, meiocytes from wheat and wheat–rye hybrids, with and without Ph1, exhibit major differences in level of synapsis, and chromatin structure.

Such meiocytes provide a good source of material to assess the relationship between homolog recognition and synapsis, and transcription.

The Ph1 locus was recently defined to a region on chromosome 5B containing a duplicated 3B chromosome segment carrying the major meiotic gene ZIP4 and a heterochromatin tandem repeat block, inserted within a cluster of CDK2-like genes (Griffiths et al., 2006; Al-Kaff et al., 2008; Martín et al., 2014, 2017). The duplicated ZIP4 gene (TaZIP4-B2) within this cluster is responsible for both promotion of homologous CO and restriction of homoeologous CO, and is involved in improved synapsis efficiency (Rey et al., 2017, 2018a). The CDK2-like gene cluster has an effect on premeiotic events, its absence giving rise to delayed premeiotic replication and associated effects on chromatin and histone H1 phosphorylation (Greer et al., 2012). The processes of centromere pairing and telomere dynamics during premeiosis are also affected, probably as a result of this delay (Martínez-Pérez et al., 1999; Richards et al., 2012). Thus, the presence or absence of the Ph1 locus affects the chromatin structure of chromosomes entering meiosis. This raises the question as to whether these premeiotic structural changes also affect overall transcription between homologs and homoeologs leading to altered recognition during early meiosis.

In this study, we undertook a comprehensive analysis of transcription during early meiotic prophase, specifically at the leptotene-zygotene transition stage, when the telomere bouquet is formed in wheat and synapsis between homologs begins, to assess the effect on transcription of: changes in chromatin structure upon homologous recognition, level of synapsis, ploidy level, and presence of the Ph1 locus. To this end, a comparative transcriptome analysis was performed on meiocytes derived from wheat, wheat–rye hybrids and doubled wheat–rye hybrids (newly synthesized triticale), both in the presence and absence of Ph1. These six genotypes provided a unique set of transcription data, which can also be exploited in further studies. Surprisingly, the analysis revealed that neither the level of synapsis, the ploidy level, nor the Ph1 locus affected overall meiotic transcription during the leptotene-zygotene transition stage.

### MATERIALS AND METHODS

### Plant Material

The plant material used in this study and its production are described in **Figure 1**, and include: hexaploid wheat Triticum aestivum cv. Chinese Spring (2n = 6× = 42; AABBDD), either containing or lacking the Ph1 locus (Sears, 1977); rye Secale cereale cv. Petkus (2n = 2× = 14; RR); wheat–rye hybrids, obtained from crosses between hexaploid wheat, either containing or lacking the Ph1 locus, and rye; and octoploid triticale × Triticosecale Wittmack (2n = 8× = 56), obtained after genome duplication of wheat–rye hybrids either containing or lacking the Ph1 locus.

The wheat–rye hybrids were generated by either crossing T. aestivum cv. Chinese Spring or T. aestivum cv. Chinese Spring ph1b mutant (Sears, 1977) as the female parent, with S. cereale cv. Petkus. Interspecific wheat–rye hybrids, either containing or lacking the Ph1 locus, are completely sterile. The octoploid triticales were generated by treating wheat–rye hybrids, either containing or lacking the Ph1 locus, with colchicine to double the chromosome number. Colchicine was applied according to the capping technique (Bell, 1950). Briefly, when the hybrids were at the 4–5 tillering stage, two of the tillers were cut and covered (or capped) with a small glass phial containing 0.5 ml of a solution of 0.3% colchicine. Once the solution was absorbed by the plant, the hybrids were left to grow. Successful chromosome doubling results in seed set. A few of the resulting duplicated seeds obtained were selfed twice in order to have sufficient seeds for all the studies. Only plants with a euploid chromosome number of 56 chromosomes were used for RNA-seq sample collection. One spike of every plant used for the RNA-seq analysis was selfed, so that cytological analysis could be performed on the progeny. One of the triticales lacking Ph1 used for the RNA-seq was sterile, so instead, another triticale lacking Ph1 was used to complete the cytological analysis.

[**Figure 1** caption, partially recovered: "… both lacking the Ph1 locus; only this time, wheat lacking Ph1 was used as the female parent for the initial crosses."]

Seeds were germinated on Petri dishes for 3 to 4 days. The seedlings were vernalized for 3 weeks at 7°C and then transferred to a controlled environment room until meiosis, under the following growth conditions: 16 h light/8 h night photoperiod at 20°C day and 15°C night, with 70% humidity. After 6 to 7 weeks, plants were ready for meiosis studies, and tillers were harvested at early booting stage, when the flag leaf ligule is just visible (Zadoks scale 39). Anthers were collected at early prophase, at the leptotene-zygotene transition, which coincides with the telomere bouquet stage in wheat. For each dissected floret, one of the three synchronized anthers was squashed in 45% acetic acid in water to identify the specific meiotic stage. The two remaining anthers were harvested into RNAlater (Ambion, Austin, TX, United States) for the RNA-seq experiments or fixed in 100% ethanol/acetic acid 3:1 (v/v) for cytological analysis of meiocytes.

### Sample Preparation and RNA Extraction

Anthers from wheat, wheat–rye hybrids and triticale, both in the presence and absence of the Ph1 locus, and from rye were collected as described in the "Plant Material" section. Three biological replicates were prepared for each of the seven genotypes, so a total of 21 samples were obtained. Anthers at the selected meiotic stage were harvested into RNAlater (Ambion, Austin, TX, United States). The anthers from three plants of each genotype were pooled in a 1.5-ml Eppendorf tube until 300 anthers were collected. As there were so few triticale seeds available, each triticale sample was derived from a single plant. Once sufficient anthers had been collected, the material was squashed using a pestle to release the meiocytes from the anthers, and the mix was transferred to a new Eppendorf tube, leaving behind as much of the anther debris as possible, to enrich the sample with meiocytes. The enriched meiocyte samples were centrifuged to eliminate the RNAlater and homogenized using QIAshredder spin columns (Qiagen, Hilden, Germany). RNA extraction was performed using a miRNeasy Micro Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. This protocol allows purification of a separate miRNA-enriched fraction (used for further analysis) and the total RNA fraction (>200 nt) used in this study.

### RNA-Seq Library Preparation and Sequencing

One microgram of RNA was purified to extract mRNA with a poly-A pull down using biotin beads. A total of 21 libraries were constructed using the NEXTflex™ Rapid Directional RNA-Seq Kit (Bioo Scientific Corporation, Austin, TX, United States) with the NEXTflex™ DNA Barcodes–48 (Bioo Scientific Corporation, Austin, TX, United States) diluted to 6 µM. The library preparation involved an initial QC of the RNA using Qubit DNA (Life Technologies, Carlsbad, CA, United States) and RNA (Life Technologies, Carlsbad, CA, United States) assays, as well as a quality check using the PerkinElmer GX with the RNA assay (PerkinElmer Life and Analytical Sciences, Inc., Waltham, MA, United States). The constructed stranded RNA libraries were normalized and equimolar pooled into one final pool of 5.5 nM using elution buffer (Qiagen, Hilden, Germany). The library pool was diluted to 2 nM with NaOH, and 5 µl was transferred into 995 µl HT1 (Illumina) to give a final concentration of 10 pM. A 120 µl aliquot of the diluted library pool was then transferred into a 200 µl strip tube, spiked with 1% PhiX Control v3 and placed on ice before loading onto the Illumina cBot. The flow cell was clustered using HiSeq PE Cluster Kit v4, utilizing the Illumina PE\_HiSeq\_Cluster\_Kit\_V4\_cBot\_recipe\_V9.0 method on the Illumina cBot. Following the clustering procedure, the flow cell was loaded onto the Illumina HiSeq 2500 instrument following the manufacturer's instructions. The sequencing chemistry used was HiSeq SBS Kit v4 with HiSeq Control Software 2.2.58 and RTA 1.18.64. The library pool was run in a single lane for 125 cycles of each paired end read. Reads in bcl format were demultiplexed based on the 6 bp Illumina index by CASAVA 1.8, allowing for a one base-pair mismatch per library, and converted to FASTQ format by bcl2fastq.

### RNA-Seq Data Processing

The raw reads were processed using SortMeRNA v2.0 (Kopylova et al., 2012) to remove rRNA reads. The non-rRNA reads were then trimmed using Trim Galore v0.4.1<sup>1</sup> to remove adaptor sequences and low-quality reads (-q 20 --length 80 --stringency 3).
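The trimming thresholds quoted above can be illustrated with a minimal sketch. Trim Galore is a wrapper around Cutadapt, and the function below reimplements, in simplified form, the running-sum 3'-end quality trimming described in the Cutadapt documentation, followed by a minimum-length filter. Adapter removal (the stringency option) and paired-end handling are deliberately omitted. The function names and the example read are hypothetical, so this is an illustration of what a quality cutoff of 20 and a minimum length of 80 mean, not the pipeline used in the study.

```python
def quality_trim_end(qualities, cutoff=20):
    """Return the number of leading bases to keep after 3'-end quality trimming.

    Simplified running-sum approach: subtract the cutoff from each Phred
    score, accumulate from the 3' end, and cut where the sum is lowest.
    """
    running = 0
    lowest = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:  # good-quality bases now outweigh the poor tail
            break
        if running < lowest:
            lowest = running
            keep = i
    return keep


def passes_length_filter(qualities, cutoff=20, min_length=80):
    """True if the read is still at least min_length bases long after trimming."""
    return quality_trim_end(qualities, cutoff) >= min_length


# Hypothetical 125-base read: 100 high-quality bases followed by a poor 3' tail.
read_quals = [36] * 100 + [8] * 25
kept = quality_trim_end(read_quals)
print(kept, passes_length_filter(read_quals))
```

In this hypothetical read the 25 low-quality bases are removed and the remaining 100 bases pass the length filter; a read whose high-quality portion is shorter than 80 bases would be discarded.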

### Chromosome Coverage Plots

RNA-seq reads (trimmed non-rRNA reads as described above) for the 18 samples were aligned to the RefSeqv1.0 assembly (International Wheat Genome Sequencing Consortium [IWGSC], 2018), using HISAT v2.0.5 with strict mapping options (--no-discordant --no-mixed -k 1 --phred33 --rna-strandness RF) to reduce noise caused by reads mapping to the wrong regions. Output bam files (binary format for storing sequence alignment data) were sorted using samtools v1.5. The normalized average chromosome coverage (depth of reads aligning to each base along the genome) per 1 million base window was obtained using bedtools v2.24.0. GenomeCoverageBed was run to generate a bedgraph file containing base coverage along each chromosome (scaled by reads per million). Each of the 21 chromosomes was divided into 1 million base windows and bedtools map was run to compute the average read depth over each 1 million base window. The average coverage was obtained for the three biological replicates of each sample. The ratio of coverage between samples was plotted as a heatmap using a custom R script<sup>2</sup>. The following formula was used to calculate the ratio of coverage, which ranges from −1 to +1: ratio = (sample1 − sample2) / (sample1 + sample2).
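The coverage ratio described above can be sketched as follows. This is a minimal illustration that assumes the per-window average depths have already been computed for each replicate; the window labels, the depth values and the choice to return 0 when both samples have no coverage are assumptions made for the example and are not taken from the study's R script.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def coverage_ratio(sample1, sample2):
    """Ratio in [-1, +1]: (sample1 - sample2) / (sample1 + sample2).

    Returns 0.0 when both coverages are zero (an assumption for this sketch).
    """
    total = sample1 + sample2
    if total == 0:
        return 0.0
    return (sample1 - sample2) / total


# Hypothetical per-window coverage (reads per million) for three replicates.
genotype_1 = {"window_0": [4.0, 5.0, 6.0], "window_1": [2.0, 2.0, 2.0]}
genotype_2 = {"window_0": [5.0, 5.0, 5.0], "window_1": [0.0, 0.0, 0.0]}

ratios = {
    window: coverage_ratio(mean(genotype_1[window]), mean(genotype_2[window]))
    for window in genotype_1
}
print(ratios)
```

A window with equal coverage in both genotypes gives a ratio of 0, whereas a window with no reads in one genotype (as expected for a deleted segment) gives a ratio at one extreme of the scale.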

<sup>1</sup>http://www.bioinformatics.babraham.ac.uk/projects/trim\_galore/ <sup>2</sup>http://www.R-project.org

### Differential Expression Analysis


Genes were examined for differential expression between wheat containing and wheat lacking the Ph1 locus, by pseudoaligning the raw reads for these six samples against the Chinese Spring RefSeqv1.0+UTR transcriptome reference (International Wheat Genome Sequencing Consortium [IWGSC], 2018), using kallisto v0.42.3 (Bray et al., 2016) with default options. The index was built using a k-mer length of 31. Transcript abundance was obtained as estimated counts and transcripts per million (TPM) for each sample, and all samples were merged into matrices of gene-level expression using the script merge\_kallisto\_output\_per\_experiment\_with\_summary.rb from expVIP<sup>3</sup>. Only genes with a mean expression >0.5 TPM in at least one condition, i.e., one of the genotypes (wheat containing or wheat lacking Ph1) were retained for differential expression analysis (Ramírez-González et al., 2018); this included 65,683 genes. Differential expression analysis between conditions (three replicates for each condition) was carried out using DESeq2 (v1.18.1) in R (v3.4.4). The numbers of differentially expressed genes at various thresholds (padj < 0.05, padj < 0.01, and padj < 0.001, and fold change > 2) were obtained. Gene Ontology (GO) enrichment was carried out using the R package goseq (v1.26.0) in R (v3.4.4). The GO term annotation was obtained from the RefSeqv1.0 (International Wheat Genome Sequencing Consortium [IWGSC], 2018). Genes up-regulated >twofold with padj < 0.01, and genes down-regulated >twofold with padj < 0.01 were analyzed separately. Only significantly enriched GO terms (padj < 0.05) were retained.
Human readable, PFAM and InterPro functional annotations for individual differentially expressed genes of interest were obtained from the functional annotation of the RefSeqv1.0 (International Wheat Genome Sequencing Consortium [IWGSC], 2018) from https://opendata.earlham.ac.uk/wheat/under\_license/toronto/Ramirez-Gonzalez\_etal\_2018-06025-Transcriptome-Landscape/data/TablesForExploration/FunctionalAnnotation.rds (Ramírez-González et al., 2018).
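The expression filter described above (mean expression >0.5 TPM in at least one condition) can be sketched as follows. The gene identifiers, sample names and TPM values are hypothetical, and the data layout (nested dictionaries) is an assumption chosen for readability rather than the format produced by the expVIP merge script.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def expressed_genes(tpm, conditions, threshold=0.5):
    """Keep genes whose mean TPM exceeds the threshold in at least one condition.

    tpm:        {gene: {sample: TPM}}
    conditions: {condition: [sample names belonging to that condition]}
    """
    kept = []
    for gene, by_sample in tpm.items():
        condition_means = [
            mean([by_sample[sample] for sample in samples])
            for samples in conditions.values()
        ]
        if any(value > threshold for value in condition_means):
            kept.append(gene)
    return kept


# Hypothetical example: two conditions with three replicates each.
conditions = {
    "wheat_Ph1_plus": ["p1", "p2", "p3"],
    "wheat_Ph1_minus": ["m1", "m2", "m3"],
}
tpm = {
    "geneA": {"p1": 0.9, "p2": 0.4, "p3": 0.5, "m1": 0.0, "m2": 0.0, "m3": 0.0},
    "geneB": {"p1": 0.5, "p2": 0.5, "p3": 0.5, "m1": 0.1, "m2": 0.2, "m3": 0.3},
    "geneC": {"p1": 0.0, "p2": 0.0, "p3": 0.0, "m1": 1.0, "m2": 2.0, "m3": 3.0},
}
retained = expressed_genes(tpm, conditions)
print(retained)
```

Because the filter is applied per condition, a gene expressed in only one genotype (geneC in this example) is retained, which keeps condition-specific genes available for the differential expression test.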

Differential expression amongst wheat–rye hybrid and triticale samples was analyzed by constructing an in silico wheat+rye transcriptome through combining the Chinese Spring RefSeqv1.0+UTR transcriptome reference (International Wheat Genome Sequencing Consortium [IWGSC], 2018) with the published rye transcriptome (Bauer et al., 2016). In total, this combined reference contained 326,851 transcripts, of which 299,067 were from wheat and 27,784 were from rye. The rye transcriptome only contained one isoform per gene whereas the wheat transcriptome contained multiple isoforms. The 12 samples with both wheat and rye genomes (wheat–rye hybrid and triticale in the presence and absence of Ph1, each with three biological replicates) were mapped to the in silico wheat–rye transcriptome using the same method as described above for wheat samples. Only genes with a mean expression >0.5 TPM in at least one condition (wheat–rye hybrid or triticale, containing or lacking Ph1) were retained for differential expression analysis. This included 83,202 genes in total, of which 50,292 were high confidence (HC) wheat genes, 22,138 were low confidence (LC) wheat genes and 10,772 were rye genes. Two comparisons were carried out to examine the effect of Ph1 (wheat–rye hybrids carrying vs. lacking Ph1, and triticale carrying vs. lacking Ph1), and one comparison performed to examine the effect of different synapsis levels and chromosome doubling (wheat–rye hybrids vs. triticale, both carrying Ph1). Differentially expressed genes for each comparison were identified using DESeq2 as described above, using three replicates per condition. The numbers of differentially expressed genes at various thresholds (padj < 0.05, padj < 0.01, and padj < 0.001, and fold change >2) were obtained for wheat and rye genes. 
GO term annotation was available for the wheat genes, therefore we focused only on differentially expressed wheat genes for GO enrichment analysis and excluded differentially expressed rye genes from this analysis. GO enrichment analysis was carried out as described for the wheat samples, again separately analyzing genes up-regulated >twofold with padj < 0.01, and genes down-regulated >twofold with padj < 0.01.
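The thresholding step described above can be sketched as a simple count over a results table. The column names `log2FoldChange` and `padj` follow the DESeq2 results output, but the rows below are hypothetical, and treating a missing adjusted p-value as not significant is an assumption of this sketch.

```python
import math


def count_degs(results, padj_cutoff, min_fold_change=2.0):
    """Count up- and down-regulated genes passing both thresholds.

    results: list of dicts with keys "log2FoldChange" and "padj"
             (padj may be None when no adjusted p-value was assigned).
    Returns (n_up, n_down).
    """
    log2_cutoff = math.log2(min_fold_change)  # fold change > 2 means |log2FC| > 1
    n_up = n_down = 0
    for row in results:
        padj = row["padj"]
        if padj is None or padj >= padj_cutoff:
            continue
        if row["log2FoldChange"] > log2_cutoff:
            n_up += 1
        elif row["log2FoldChange"] < -log2_cutoff:
            n_down += 1
    return n_up, n_down


# Hypothetical results rows.
results = [
    {"gene": "g1", "log2FoldChange": 2.3, "padj": 0.0004},
    {"gene": "g2", "log2FoldChange": -1.5, "padj": 0.02},
    {"gene": "g3", "log2FoldChange": 0.4, "padj": 0.0001},
    {"gene": "g4", "log2FoldChange": 3.0, "padj": None},
    {"gene": "g5", "log2FoldChange": -2.0, "padj": 0.005},
]
counts = {cutoff: count_degs(results, cutoff) for cutoff in (0.05, 0.01, 0.001)}
print(counts)
```

Counting up- and down-regulated genes separately mirrors the separate GO enrichment analyses of the two groups.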

The rye genome sequence is still incomplete; therefore, the present study focused on wheat-specific transcription effects. However, the rye data (PRJEB25586) are deposited to facilitate future analyses by the community.

### Genomic in situ Hybridization of Mitotic and Meiotic Cells (GISH)

It was not possible to analyze the samples used for the RNA-seq analysis, so their progeny were analyzed instead. One spike of every plant used for the transcription analysis was selfed, and three individuals from each progeny were analyzed. One of the triticale plants lacking Ph1 used for the RNA-seq was sterile, so another triticale lacking Ph1, which had 57 chromosomes, was used instead.

The preparation of mitotic metaphase spreads and subsequent genomic in situ hybridization (GISH) was carried out as described previously (Rey et al., 2018b). Meiotic metaphase I spread preparation and subsequent GISH were also carried out as described previously (Cabrera et al., 2002). S. cereale, Triticum urartu, and Aegilops tauschii were used as probes to label rye, wheat A- and wheat D-genomes, respectively. S. cereale genomic DNA was labeled with tetramethyl-rhodamine-5-dUTP (Sigma) by nick translation as described previously (Cabrera et al., 2002). T. urartu and Ae. tauschii genomic DNA were labeled with biotin-16-dUTP and digoxigenin-11-dUTP, using the Biotin-nick translation mix and the DIG-nick translation mix, respectively (Sigma, St. Louis, MO, United States) according to the manufacturer's instructions. Biotin-labeled probes were detected with Streptavidin-Cy5 (Thermo Fisher Scientific, Waltham, MA, United States). Digoxigenin-labeled probes were detected with anti-digoxigenin-fluorescein Fab fragments (Sigma).

Images were acquired using a Leica DM5500B microscope equipped with a Hamamatsu ORCA-FLASH4.0 camera and controlled by Leica LAS X software v2.0. Images were processed using Fiji (an implementation of ImageJ, a public domain program by W. Rasband available from https://imagej.nih.gov/ij/) and Adobe Photoshop CS4 (Adobe Systems Incorporated, United States) version 11.0 x64.

<sup>3</sup>https://github.com/homonecloco/expvip-web/blob/20180912ScriptToMergeKallistoOutput/bin/merge\_kallisto\_output\_per\_experiment\_with\_summary.rb

### Availability of Supporting Data


Raw Illumina reads have been deposited into EMBL-EBI ENA (European Nucleotide Archive<sup>4</sup>) under project number PRJEB25586. Analyzed data for the wheat samples (TPM and counts) were integrated in the expVIP platform www.wheatexpression.com (Borrill et al., 2016). Analyzed data for wheat, hybrids and triticale samples are available through https://opendata.earlham.ac.uk/.

### RESULTS

### Transcriptome Sequencing

To assess whether synapsis, ploidy level and changes in chromatin structure associated with Ph1 have any effect on global transcription during early meiotic prophase I, the transcriptome of wheat, wheat–rye hybrid and the corresponding triticale were analyzed by RNA-seq in the presence and absence of the Ph1 locus. In wheat florets, the three anthers and the meiocytes within them are highly synchronized in development. We staged one of the three anthers by microscopy, to ensure that the meiocytes were at the transition leptotene-zygotene stage, leaving the other two anthers for RNA extraction. Three biological replicates were produced for each transcriptome, with a total of 18 libraries generated.

Using Illumina sequencing, a total of 1,388 million reads were generated for the 18 libraries. For subsequent analysis, the RNA-seq data were processed using two different methods. Firstly, samples were trimmed and aligned to the wheat RefSeqv1.0 assembly using HISAT to generate the chromosome coverage plots. Strict mapping options were used to reduce the noise caused by reads mapping to the wrong regions, particularly mismapping from the rye onto the wheat genome. The percentage of reads aligned to the wheat genome was on average 88.46% for wheat samples, 74.61% for wheat–rye hybrid samples and 75.48% for triticale samples (**Supplementary Table S1**). On average, the proportion of mapped reads was 13.4 percentage points lower in wheat–rye and triticale samples than in wheat samples, indicating that the stringent mapping conditions were effective in reducing mis-mapping. However, it is possible that a low level of residual mis-mapping occurred, whereby reads from rye genes are mapped onto the wheat genome.

Secondly, DESeq2 was used to examine genes differentially expressed between genotypes. The six wheat samples were pseudoaligned to the Chinese Spring RefSeqv1.0+UTR transcriptome reference using kallisto. The percentage of reads pseudoaligned was similar across samples, with a mean value of 72% as detailed in **Supplementary Table S2**. The three biological replicates showed good correlation and clustered together in the principal component analysis, although samples containing the Ph1 locus grouped together more tightly than samples lacking Ph1 (**Supplementary Figure S1A**). The wheat RefSeq Annotation v1.0 includes 110,790 HC genes and 158,793 LC genes. LC genes represent partially supported gene models, gene fragments and orphan genes (International Wheat Genome Sequencing Consortium [IWGSC], 2018). We decided to retain these LC genes in our differential expression analysis because one of the reasons for classifying a gene as LC is a lack of supporting RNA-seq evidence. As already mentioned, no data on wheat meiosis were previously published, so it is possible that some of these LC genes are specifically involved in meiosis.

The 12 samples from wheat–rye hybrids and triticale were mapped to a wheat+rye transcriptome created in silico for this study (described in section "Materials and Methods"). Although our aim in these samples was principally to study the expression of wheat genes, the use of a hybrid transcriptome reduced the possibility of mis-mapping between rye and wheat reads, which would have led to inaccurate quantification of expression. Kallisto was used for mapping because, in wheat, it accurately distinguishes reads from homoeologous genes with a sequence identity of 95–97% (Ramírez-González et al., 2018). Therefore, kallisto is capable of distinguishing wheat and rye reads, which are more divergent [91% sequence identity within genes (Khalil et al., 2015)]. The percentage of reads pseudoaligned to this transcriptome was similar across samples, with a mean value of 71% for wheat–rye hybrids and triticale (**Supplementary Table S2**). The three biological replicates from each genotype showed good correlation and clustered together in the principal component analysis (**Supplementary Figure S1B**). In total, 269,583 genes from wheat (110,790 HC and 158,793 LC), plus 27,784 genes from rye were annotated, giving a total of 297,367 genes for the hybrid transcriptome created.

### Overall Transcription Is Independent of Synapsis, Ploidy Level or the Presence of Ph1

Chromosome coverage plots were generated to reveal a global picture of the difference in transcription between the different genotypes analyzed. The cleaned RNA-seq reads were aligned to the RefSeqv1.0 assembly and the ratio of the coverage along all the chromosomes plotted as a heatmap (**Figure 2**).

At the leptotene-zygotene transition, when the telomere bouquet is tightly formed, only homologous chromosomes can synapse. In wheat–rye hybrids there are no homologs present, and therefore, no synapsis takes place at this stage; whereas in triticale, a significant level of synapsis occurs. However, no overall change in wheat transcription was observed when these two genotypes were compared (**Figure 2A** and **Supplementary Figure S2**). Even more striking was that the duplication of the genome and change in ploidy level had little effect on overall wheat transcription. The wheat–rye hybrid is a poly-haploid (n = 4× = 28, ABDR) and triticale an octoploid (2n = 8× = 56, AABBDDRR), and although vegetative development is normal in both genotypes, the haploid hybrids are completely sterile. The reason for sterility in the wheat–rye hybrid is that meiosis is highly compromised, with only one CO at metaphase I and subsequent random segregation of chromosomes. Despite this, wheat transcription during early meiotic prophase seems not to be affected.

**Figure 2** panels: (A) Heatmap comparing wheat–rye hybrids and triticale, both containing the Ph1 locus (Ph1+). No global change in transcription was observed between these two genotypes. (B) Heatmap comparing wheat in the presence (Ph1+) and absence (Ph1–) of Ph1. (C) Heatmap comparing wheat–rye in the presence and absence of Ph1. (D) Heatmap comparing triticale in the presence and absence of Ph1. Several deletions (visualized in dark blue) and other chromosome reorganizations were detected in all genotypes in the absence of Ph1 (B–D).

<sup>4</sup>https://www.ebi.ac.uk/ena

Next, a comparison of wheat, wheat–rye hybrid and triticale, with their corresponding genotypes lacking the Ph1 locus was made (**Figures 2B–D** and **Supplementary Figure S2**). This time, the heatmaps revealed a very different situation, with very clear differences in transcription in all comparisons. A deletion on chromosome 5B (visualized in dark blue in **Figures 2B–D**) was observed in all samples lacking Ph1, corresponding to the deletion of the ph1b mutant (Sears, 1977). However, several other deletions (visualized in dark blue) were also observed in all samples. Due to the nature of this locus, which affects synapsis and CO formation between homoeologs, the presence of chromosome rearrangements has been previously described in wheat lacking Ph1 (Sánchez-Morán et al., 2001). However, the number of rearrangements revealed was higher than expected. An interstitial deletion on 3BL and two deletions on 3AL, one interstitial and another terminal, were common to all samples lacking Ph1. Apart from these, wheat lacking Ph1 had two more deletions: a terminal one on 2DL and a distal one on 3DL; while triticale lacking Ph1 had three more deletions: a terminal one on 1AL, a large terminal one on 3AL and a large distal one on 5DL. As observed in **Figure 2C**, the heatmap belonging to the wheat–rye hybrids showed no difference in overall transcription whether Ph1 was present or absent, apart from the common deletions mentioned. However, heatmaps corresponding to wheat and triticale were more difficult to interpret, with several chromosome regions showing clear differential expression without a completely clear-cut presence or absence of Ph1, as in the case of the deletions. The triticale heatmap (**Figure 2D**) for example, revealed three chromosome regions showing increased transcription in triticale lacking Ph1 (visualized in orange in the heatmap): a terminal region in 1DL and 3DL, and a large distal region on 5BL. Interestingly, every one of these chromosome regions corresponded to a chromosome deletion on a homoeologous chromosome. For example, the terminal deletion on 1AL, corresponded to the terminal increased transcription on 1DL. Recombination could occur between 1A and 1D in the absence of Ph1, resulting in two 1D chromosomes plus two 1A chromosomes being detected in this material, in which the distal part of 1AL had a chromosome segment from 1DL. This, therefore, resulted in four copies of the D-genome chromosome segment, and hence increased transcription. These observations are consistent with wheat nulli-tetrasomic line analysis (Borrill et al., 2016), where the presence of four copies of a homoeolog leads to a doubling of transcription. The observed increases in transcription associated with the other deletions in wheat and triticale lacking Ph1 could also be explained in a similar manner. There were some regions where the interpretation was more complex. To investigate this further, we created heatmaps of wheat vs. wheat lacking Ph1, and triticale vs.
triticale lacking Ph1, for every individual ph1 mutant sample (**Supplementary Figures S3**, **S4**). Results revealed that every individual sample lacking Ph1 was different, apart from the rearrangements common to all samples lacking Ph1 described above. One triticale sample was extremely rearranged (**Supplementary Figure S4**), so we decided to explore this further and perform GISH experiments on this material, which will be described in the following sections.
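The dosage argument above (four copies of a segment giving roughly doubled transcription, zero copies appearing as a deletion) can be illustrated with a minimal sketch. It assumes that transcription scales linearly with copy number, which is a simplification for illustration and not a model fitted in this study; the function name is hypothetical.

```python
import math

def expected_log2_ratio(copies_sample: int, copies_reference: int = 2) -> float:
    """Expected log2 expression ratio if transcription scaled linearly with copy number.

    Assumption (illustration only): a strictly linear dosage response.
    """
    if copies_sample == 0:
        # A deleted segment has no transcripts; the ratio is unbounded below.
        return float("-inf")
    return math.log2(copies_sample / copies_reference)

# Distal 1DL after a 1A/1D exchange: two native 1D copies plus two copies on 1A.
print(expected_log2_ratio(4))  # 1.0, i.e. a doubling
# Distal 1AL in the same material: both copies replaced by 1DL sequence.
print(expected_log2_ratio(0))  # -inf, seen as a deletion in the heatmap
```

Under this assumption, each region of increased transcription should be paired with a deletion of the homoeologous segment, which is the pattern described for 1AL and 1DL.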

In summary, at this stage of meiosis, overall transcription was not affected by the absence of the Ph1 locus. Therefore, chromatin changes associated with the Ph1 locus did not affect overall transcription. All significant transcriptional changes observed between genotypes with and without Ph1 were associated with the presence/absence of chromosome regions likely to be the result of homoeologous recombination. We therefore conclude that neither synapsis, level of ploidy nor the presence of Ph1 have a significant overall effect on wheat meiotic transcription.

### Analysis of Differentially Expressed Genes (DEG)

Chromosome coverage plots showed no global changes in transcription. However, we also wanted to identify the number of genes differentially expressed, and check whether they were related to meiotic processes. A Kallisto-DESeq2 pipeline was used to examine the DEG between genotypes. Only genes expressed >0.5 TPM in at least one of the genotypes were selected for differential expression analysis, the rest being filtered out as non-expressed genes. The number of DEG among samples was calculated using different thresholds as described in Materials and Methods and **Supplementary Table S3**, with further analysis focussed on the comparisons at padj < 0.01 and fold change >2.
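The two filtering steps described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the gene identifiers and values are hypothetical, and the sketch assumes that the fold-change threshold is applied in both directions (|log2 fold change| > 1) and that genes without an adjusted p-value are not called as DEGs.

```python
import math
from typing import Dict, Optional

TPM_MIN = 0.5    # a gene counts as expressed above 0.5 TPM in at least one genotype
PADJ_MAX = 0.01  # adjusted p-value threshold
FOLD_MIN = 2.0   # fold-change threshold (assumed symmetric, up or down)

def is_expressed(tpm_by_genotype: Dict[str, float]) -> bool:
    """True if the gene exceeds TPM_MIN in at least one genotype."""
    return any(tpm > TPM_MIN for tpm in tpm_by_genotype.values())

def is_deg(padj: Optional[float], log2_fold_change: float) -> bool:
    """True if the gene passes both the padj and the fold-change thresholds."""
    if padj is None or math.isnan(padj):
        return False  # assumption: missing padj is treated as not significant
    return padj < PADJ_MAX and abs(log2_fold_change) > math.log2(FOLD_MIN)

# Hypothetical toy records, not data from the study.
genes = {
    "geneA": {"tpm": {"hybrid": 12.0, "triticale": 3.0}, "padj": 0.001, "log2fc": -2.0},
    "geneB": {"tpm": {"hybrid": 0.2, "triticale": 0.4}, "padj": 0.0001, "log2fc": 3.0},
    "geneC": {"tpm": {"hybrid": 5.0, "triticale": 6.0}, "padj": 0.3, "log2fc": 0.26},
    "geneD": {"tpm": {"hybrid": 1.0, "triticale": 1.9}, "padj": 0.005, "log2fc": 0.93},
}

# Step 1: drop non-expressed genes. Step 2: apply the DEG thresholds.
expressed = {g: r for g, r in genes.items() if is_expressed(r["tpm"])}
degs = sorted(g for g, r in expressed.items() if is_deg(r["padj"], r["log2fc"]))
print(degs)
```

In a real analysis the expression filter would be applied before the differential expression test, since the set of tested genes influences the adjusted p-values.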

### DEGs Between Wheat–Rye Hybrids and Octoploid Triticale (Both Containing the Ph1 Locus)

Of 297,367 genes present in the wheat+rye hybrid transcriptome, 83,202 genes (27.98%) were expressed in our samples, of which 72,430 were from wheat and 10,772 from rye (**Supplementary Table S4**). As the rye genome sequence is incomplete, and most genes have no functional annotation, the analysis only focused on wheat genes. The 72,430 wheat genes include both high-confidence (HC) and low-confidence (LC) genes (**Table 1**). Although both HC and LC genes were included in the analysis, results are presented for HC genes separately in **Supplementary Table S4.** Interestingly, a high percentage of LC genes (22,138 genes) were detected as being expressed during early meiosis, with 2,711 LC genes expressed at a relatively high level (>10 TPM). This provides evidence that these LC genes could be HC genes which were missed during the original annotation process, perhaps due to the lack of RNA-seq data from meiosis samples. Among the 72,430 wheat expressed genes, only 344 genes were differentially expressed between wheat–rye hybrids and triticale (**Table 1**). This means that DEGs represent only 0.47% of all genes, a strikingly low number considering that the whole genome has been duplicated. These results also indicate, consistent with the chromosome coverage plots, that overall gene expression is independent of synapsis and the absence of homologous chromosomes.
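The proportions quoted in this paragraph follow directly from the reported counts, as the short check below shows. The only derived figure is the number of expressed HC genes, which assumes that every expressed wheat gene is classed as either HC or LC.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)

total_genes = 297_367       # wheat+rye hybrid transcriptome
expressed_wheat = 72_430
expressed_rye = 10_772
expressed_lc = 22_138       # expressed low-confidence wheat genes
degs_hybrid_vs_triticale = 344

expressed_total = expressed_wheat + expressed_rye
expressed_hc = expressed_wheat - expressed_lc  # derived, see assumption above

print(expressed_total)                                 # 83202
print(pct(expressed_total, total_genes))               # 27.98
print(pct(degs_hybrid_vs_triticale, expressed_wheat))  # 0.47
print(expressed_hc)                                    # 50292
```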

To check whether DEGs were associated with common processes, Gene Ontology (GO) term enrichment was carried out on genes differentially expressed between samples (**Supplementary Table S5**). Genes down-regulated upon chromosome doubling (up-regulated in the hybrids vs. triticale) were enriched for few GO terms, which were mostly related to metabolic processes (general functions) not related to meiosis; moreover, p-values were only just significant (0.02–0.05). In the case of genes up-regulated upon chromosome doubling (down-regulated in the hybrids vs. triticale), two thirds of the GO terms were related to stress and response to external stimuli, the rest being related to cell communication and catabolic processes. Although the present study focuses on changes in overall gene expression, we also extracted the functional annotation for all DEGs, available in **Supplementary Table S6**.

### DEGs in the Absence of the Ph1 Locus

Next, we identified DEGs between wheat, wheat–rye hybrid and triticale, and their corresponding samples in the absence of the Ph1 locus. Of 269,583 genes annotated (RefSeqv1.0 assembly), 65,583 were expressed in our wheat samples, from which 474 genes (0.72%) were differentially expressed when Ph1 was deleted (**Table 1**). In the case of wheat–rye hybrids, 573 wheat genes (0.79%) were differentially expressed in the absence of Ph1; and 2,672 genes (3.69%) were differentially expressed for triticale lacking Ph1 (**Table 1**). The number of DEGs was higher in triticale, in agreement with the chromosome coverage plot results. As described previously, all genotypes lacking Ph1 exhibit deletions and other chromosome rearrangements, and the DEGs detected could therefore be a consequence of these reorganizations, rather than due to an absence of Ph1 alone. To exclude these DEGs being a consequence of chromosomal rearrangement, we identified DEGs shared by all comparisons. We found that all three comparisons had 358 DEGs in common (**Figure 3**), of which 186 genes were located in the ph1b deletion on 5B. A further 106 and 33 genes were located in deletions common to all genotypes lacking Ph1 on 3A and 3B, respectively. Therefore, in total there were only 33 DEGs which could not be accounted for, based on their location within a common deleted region, meaning that overall gene expression was not significantly altered during early prophase by the absence of the Ph1 locus. We did not observe any trend directly related to meiosis in the functional annotation of these 33 genes (**Supplementary Table S7**). One gene, TraesCS2A01G561600, annotated as a DNA/RNA helicase protein, could be potentially involved in meiosis, since these enzymes play essential roles in DNA replication, DNA repair, and DNA recombination, which occur both in somatic and meiotic cells. However, the syntenic ortholog in Arabidopsis, the chromatin remodeling 24 gene (CHR24/AT5G63950) is a member of the SWI2/SNF2 family known to be involved in DNA repair and recombination in somatic tissue, while no function during meiosis has been reported (Shaked et al., 2006).

TABLE 1 | Annotated genes, differentially expressed between samples in this study.

Ph1+, containing the Ph1 locus, Ph1−, lacking the Ph1 locus. Values in bold indicate the percentage of DEGs.
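The accounting of shared DEGs in this subsection can be sketched as a set intersection followed by removal of genes lying in commonly deleted regions. The gene and region labels in the toy example are hypothetical and the helper names are illustrative; only the final counts are taken from the text.

```python
def shared_degs(*deg_sets):
    """Genes called as DEGs in every comparison."""
    return set.intersection(*(set(s) for s in deg_sets))

def unexplained(shared, gene_region, deleted_regions):
    """Shared DEGs that do not map to any commonly deleted region."""
    return {g for g in shared if gene_region.get(g) not in deleted_regions}

# Hypothetical toy example: three comparisons (Ph1+ vs. Ph1-).
wheat = {"g1", "g2", "g3", "g5"}
hybrid = {"g1", "g2", "g3", "g4"}
triticale = {"g1", "g2", "g3", "g6"}
gene_region = {"g1": "5B_ph1b", "g2": "3A_del", "g3": "2A"}
deleted = {"5B_ph1b", "3A_del", "3B_del"}

shared = shared_degs(wheat, hybrid, triticale)
print(sorted(unexplained(shared, gene_region, deleted)))  # ['g3']

# Counts reported in the text.
shared_total = 358
in_common_deletions = {"5B": 186, "3A": 106, "3B": 33}
remaining = shared_total - sum(in_common_deletions.values())
print(remaining)  # 33
```

Restricting the analysis to DEGs shared by all three comparisons removes effects of rearrangements specific to one genotype, and subtracting genes inside the common deletions leaves the candidates whose expression change is not a direct consequence of a missing segment.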

### Genes Responsible for the Ph1 Locus Phenotype on Recombination

In 1977 (Sears, 1977), the ph1b deletion used in this study (and most studies involving this locus) was obtained and estimated to be 70 Mb in size. Using our gene expression data and the RefSeqv1.0 assembly, the ph1b deletion is now defined to a 59.3 Mb region containing 1,187 genes, from which 299 genes are expressed in our RNA-seq data. The locus was further defined to a smaller region (Griffiths et al., 2006; Al-Kaff et al., 2008), now defined to 0.5 Mb in size and containing 25 genes (7 HC + 18 LC genes), from which only two are expressed in our RNA-seq data. One of these two genes is a DUF2431 domain protein (TraesCS5B01G254900) of unknown function. The other gene is the duplicated ZIP4 gene (TaZIP4-B2), which has been recently identified as the gene responsible for both promoting homologous CO and restricting homoeologous CO (Rey et al., 2017, 2018a). Another gene within the ph1b deletion, termed by the authors as C-Ph1, was also recently proposed to contribute to the Ph1 effect on recombination, specifically during metaphase I (Bhullar et al., 2014); however, this gene does not show any expression in our RNA-seq data. The authors reported 3 copies of this gene, one on 5A (truncated), one on 5B (with a splice variant named 5Balt) and one on 5D, claiming that only the 5B copy was metaphase I-specific and therefore, responsible for the phenotype characteristic of Ph1. However, blasting these gene sequences against the RefSeqv1.0 assembly showed that 5Balt was a fourth gene copy located on chromosome 5A just upstream of the original 5A copy (5A-1: TraesCS5A01G381600LC and 5A-2: TraesCS5A01G381700LC). The RNA-seq data obtained in the present study, as well as 849 wheat RNA-seq samples now publicly available (Ramírez-González et al., 2018) and 8 wheat meiotic libraries available at https://urgi.versailles.inra.fr/files/RNASeqWheat/Meiosis/, can now be used to study the expression profile of all the different copies of the ZIP4 and C-Ph1 genes found across a diverse range of tissues and developmental stages (**Figure 4**). The TaZIP4 copy on 5B (TaZIP4-B2) is the dominant ZIP4 copy and it is expressed in all tissue types, including all meiotic stages. C-Ph1 has an almost tissue-specific expression pattern, limited to stamen tissue during the heading stage (post-meiosis stage), and with the 5D copy being expressed dominantly over all other gene copies (expression level of the 5D copy being >1700 TPM, and of the 5B copy being <31 TPM). None of the C-Ph1 copies is expressed during any meiosis stage, except for the 5D copy that is expressed at a very low level (<4 TPM) in comparison with its expression in anthers at heading (>1700 TPM). In summary, we can confirm that C-Ph1 on 5B is not expressed during meiosis, and cannot therefore be responsible for any Ph1 effect during metaphase I.

### Cytological Characterization of the Newly Synthesized Triticale in the Presence and Absence of Ph1

Cytological analysis of wheat and wheat–rye hybrids, both in the presence and absence of the Ph1 locus, has been well documented in previous studies (Orellana, 1985; Naranjo et al., 1988; Wang and Holm, 1988; Benavente et al., 1998; Mikhailova et al., 1998; Sánchez-Morán et al., 1999, 2001); however, this is the first time to our knowledge, that triticale lines lacking Ph1 have been generated. As the chromosome coverage plots suggest the presence of several chromosome rearrangements, we performed mitotic and meiotic analysis to explore the origin and extent of these reorganizations.

### Chromosome Configuration on Mitotic Metaphase Cells

Root tip mitotic metaphase spreads were analyzed by GISH to determine the extent of homoeologous recombination or any other rearrangements in the newly formed triticale lines, both containing and lacking Ph1.

Most of the triticale plants containing Ph1 were euploid (six of nine plants), with a chromosome number of 56, and possessing 14 chromosomes from the A, B, and D genome, plus 14 chromosomes from rye (**Supplementary Table S8**). However, three of the triticale plants were aneuploid (**Figure 5A**), all with rye chromosomes missing. Interestingly, although some aneuploidy was observed, none of the triticale plants presented any inter-genomic chromosome exchange, suggesting that all recombination took place between homologous chromosomes. The only chromosome exchange observed in these triticales was the ancient translocation present in bread wheat T4A·7B (**Figure 5A**).

In contrast to triticale containing Ph1, there were numerous chromosome rearrangements in the progeny of triticale lacking Ph1, including aneuploidy, deletions and intergenomic exchanges resulting probably from recombination events. All individuals analyzed were aneuploids, with chromosome numbers ranging from 51 to 59 plus one chromosome arm (**Figures 5B–D** and **Supplementary Table S8**). These triticale lines had only undergone three rounds of meiosis after synthesis; however, some lines exhibited reorganizations corresponding to 16 possible recombination events between homoeologous chromosomes (**Supplementary Table S8**). This wide range of chromosome rearrangements corresponds to the high levels of variability in the individual replicates of RNA-seq data shown in coverage plots (**Supplementary Figure S4**). We even detected a chromosome exchange between rye and a B-genome chromosome (**Figure 5D**), which is normally a very rare event. Apart from homoeologous exchanges and aneuploidy, there were also other structural rearrangements, such as several individual chromosome arms and a centromeric translocation between a B-genome and a rye chromosome (**Figure 5B**). Recombination in cereals is normally restricted to the distal ends of chromosomes, with 90% of wheat recombination occurring in only 40% of the physical chromosome (Saintenac et al., 2009); interestingly, we also observed very proximal homoeologous exchanges (probably resulting from recombination events) (**Figure 5C**), which is another example of the high level of reorganization present in these lines.

FIGURE 5 | (A) Triticale containing Ph1 with 14 A-chromosomes, 14 B-chromosomes, 14 D-chromosomes and 13+arm rye (R) chromosomes (cn = 55+arm). (B) Triticale lacking Ph1 with 12 A, 14+arm B, 15 D, 10+arm R and a centromeric translocation between a rye and a B-chromosome (TR·B) (cn = 53 + 2arms). (C) Triticale lacking Ph1 with 13 A, 15 B, 12 D, and 14 R (cn = 54). A proximal recombination between an A- and a B-chromosome, and an A-chromosome showing the result of three recombination events are highlighted. (D) Triticale lacking Ph1 with 12 A, 13 B, 15 D, and 13 R (cn = 53). The result of recombination between a rye and a B-genome chromosome is highlighted. Reorganizations are indicated by white arrows. The ancient translocation T4A·7B is indicated by green arrows.

### Meiotic Metaphase I Configuration

Octoploid triticale have been reported to show meiotic instability and frequent aneuploidy in the presence of Ph1, resulting in reduced fertility (Scoles and Kaltsikes, 1974; Muntzing, 1979; Gustafson, 1982; Fominaya and Orellana, 1988; Lukaszewski and Gustafson, 2011). One third of plants in this study showed aneuploidy, in all cases involving rye chromosomes. Fertility rate was also reduced, even in euploid plants. All analyzed triticale lines containing Ph1 showed a fairly normal meiosis with mostly bivalents being formed at metaphase I (**Figure 6A**); however, univalents were also frequently present (**Figure 6B**), as well as a low level of multivalents. GISH was performed on meiotic metaphase I cells to determine the origin of the univalents and to ascertain whether the bivalents were always between homologs. Most of the univalents observed were rye in origin, although some wheat origin univalents were also observed (**Figure 6C**). As for the bivalent formation, all were between chromosomes from the same genome (**Figure 6C**), suggesting that although there was some level of CO failure, no recombination between homoeologs was taking place. This meiotic analysis supported data observed in the mitotic analysis, and was consistent with the presence of Ph1. Although all lines were fertile, the seed set was not complete in all flowers, and varied among different lines. This suggests that the abnormalities sometimes observed at meiosis, produced problems in chromosome segregation and also, probably aneuploidy and pollen abortion.

In the case of octoploid triticale lacking the Ph1 locus, meiosis could not be analyzed in some plants because anthers had not developed properly. When meiotic metaphase I cells could be analyzed, substantial irregularities were observed, including univalents, multivalents and chromosome fragmentation (**Figure 6D**). GISH analysis showed that although most CO formation was between chromosomes from the same genome, chromosomes from all genomes were also involved in non-homologous association, particularly those derived from the A- and D-genomes. GISH analysis also revealed that most of the chromosome fragments at metaphase I were derived from rye chromosomes (**Figure 6E**). These specific plants were all sterile apart from one plant, which produced three seeds. The fertility of all triticale lacking the Ph1 locus used in this work was very low and decreased exponentially with each generation.

Morphology of all triticale plants containing Ph1 was perfectly normal (**Supplementary Figure S5**). However, every triticale plant lacking Ph1 was morphologically different, with some exhibiting very abnormal phenotypes, likely to be the result of extensive chromosomal rearrangements (**Supplementary Figure S5**).

### DISCUSSION

A high-quality annotated reference genome sequence of bread wheat (RefSeqv1.0) has recently been released by the International Wheat Genome Sequencing Consortium (International Wheat Genome Sequencing Consortium [IWGSC], 2018), giving access to 110,790 high-confidence (HC) and 158,793 low-confidence (LC) genes. Together with this release, an extensive gene expression dataset of hexaploid wheat has been analyzed to produce a comprehensive, genome-wide analysis of homoeolog expression patterns in hexaploid wheat (Ramírez-González et al., 2018). In total, 850 available RNA-seq datasets have been used across a diverse range of tissues, developmental stages, cultivars and environmental conditions. However, no specific data were available from meiosis, a key process ensuring proper chromosome segregation (and thus, genome stability and fertility) and leading to novel combinations of parental alleles, forming the basis of evolution and adaptation. This lack of meiotic RNA-seq data is a general issue in many species, not only in wheat, due to the challenge of collecting plant material at specific meiotic stages. Some RNA-seq approaches have been performed mainly in Arabidopsis, rice, maize, sunflower, and brassica (Chen et al., 2010; Dukowic-Schulze and Chen, 2014; Dukowic-Schulze et al., 2014; Flórez-Zapata et al., 2014; Zhang et al., 2015; Braynen et al., 2017), but to our knowledge, no RNA-seq analysis has previously been reported on wheat meiosis.

In this study, we took advantage of the recently released RefSeqv1.0 wheat assembly and our experience working on wheat meiosis, to perform an RNA-seq analysis from six different genotypes: wheat, wheat–rye hybrids and newly synthesized triticale, both in the presence and absence of Ph1. All plant material was collected during early prophase, at the leptotene-zygotene transition, coinciding with telomere bouquet formation and synapsis between homologs. We addressed three questions in the study: whether overall wheat transcription was affected by the level of synapsis (and chromatin structure changes at the time of homolog recognition); whether wheat transcription was reshaped upon genome duplication; and whether wheat transcription was altered in the absence of the Ph1 locus. Surprisingly, the answer to all three questions was negative. Wheat transcription was not affected in any of the three situations, revealing an unexpected level of transcription stability at this very important developmental stage. These results contrast with observations in somatic tissue of resynthesized hexaploid wheat, where 16% of genes were estimated to display non-additive expression (Pumphrey et al., 2009).

### High Stability of Global Gene Expression During Early Meiotic Prophase

During meiosis, homologous (and sometimes non-homologous) chromosomes pair and then synapse through the polymerization of a protein structure known as the synaptonemal complex, which provides the structural framework for recombination to take place. Throughout the whole of meiosis, but particularly during the synaptic process, there are multiple changes in chromatin structure and organization (and even positioning in the nucleus), taking place within a relatively short period of time and needing to be highly regulated. In wheat, synapsis is initiated during the telomere bouquet stage at early prophase, during the leptotene-zygotene transition (Martín et al., 2017). Moreover, it has been observed that the process of recognition between homologs is associated with major changes in the chromatin structure of chromosomes (Prieto et al., 2004), suggesting that these changes in chromatin structure may be required for the homolog recognition process and initiation of synapsis. Indeed, it is now well understood that chromatin conformation is a critical factor in enabling many regulatory elements to perform their biological activity, and that chromatin structure profoundly influences gene expression (Dixon et al., 2015; Doğan and Liu, 2018). Therefore, it was reasonable to suppose that differences in synapsis, and therefore in chromatin structure, would translate into differences in transcription.

Surprisingly, however, we did not find the expected differences in overall wheat transcription and gene expression when comparing samples with different levels of synapsis, indicating that the structural changes associated with this process were not directly coupled to transcription. Probably the clearest example of this was the comparison of wheat–rye hybrids and octoploid triticale. Wheat–rye hybrids possess a haploid set of wheat and rye chromosomes (there are no homologs present) and no synapsis is observed during the telomere bouquet stage (Martín et al., 2017). In contrast, octoploid triticale, which is obtained after chromosome doubling of wheat–rye hybrids, and which therefore possesses a whole set of wheat and rye chromosomes, exhibits extensive synapsis during the same stage. However, only 0.47% of the expressed genes were differentially expressed between these two samples (0.38% considering only HC genes), with most of these genes being involved in stress response and other metabolic processes (general functions) not related to meiosis.

Even more striking is the fact that in our study, global gene expression was not affected by WGD. Several previous studies have reported genetic and epigenomic processes being disrupted after hybridization and polyploidization, with subsequent changes in gene expression (Qi et al., 2012; Renny-Byfield and Wendel, 2014; Khalil et al., 2015; Edger et al., 2017 and references therein; Sun et al., 2017). These studies were not performed on meiotic tissue; nevertheless, given the known failure of meiosis in wheat–rye hybrids and the relatively normal meiotic progression in the duplicated triticale, it would have been reasonable to expect an effect on the expression pattern. However, as mentioned above, only a small fraction of genes were differentially expressed, again with none involved in meiosis. Unfortunately, the rye genome sequence is incomplete, and a similar analysis for rye genes could not be performed. In the future, when the complete rye genome sequence is available, it will be interesting to assess whether global expression of rye genes is also unchanged.

We propose that a possible explanation for the striking robustness in gene expression during early meiotic prophase is that the transcription of genes required for the meiotic program has already occurred prior to the leptotene-zygotene transition. Meiosis is a very complex process which takes place in a relatively short period of time. In wheat, the whole meiosis process lasts only 24 h at 20 °C, with the whole process of synapsis being less than 6 h long (Bennet et al., 1973). Therefore, it would be reasonable to suppose that most of the transcription needed for such a critical process has already occurred prior to synapsis initiation. From mouse studies, it has been recently reported that a considerable number of genes involved in early, as well as later meiotic processes, are already active at early meiotic prophase (da Cruz et al., 2016). Moreover, a major change in gene expression patterns occurs during the middle of meiotic prophase (pachytene), when most genes related to spermiogenesis and sperm function appear already active (da Cruz et al., 2016). It is possible that this change in gene expression pattern also happens in wheat, with a transcriptional switch from pre-meiosis to meiosis taking place very early, before meiotic prophase. Only when more RNA-seq datasets are available, can the dynamics of gene expression during wheat meiosis be fully understood.

### Changes in Wheat Expression Lacking the Ph1 Locus Are the Result of Multiple Chromosome Reorganizations

The Ph1 locus in wheat is by far the best characterized locus involved in the diploid-like behavior of polyploids during meiosis. Ph1 has a dual effect during meiosis: firstly, improving the efficiency of homologous synapsis and secondly, preventing CO formation between homoeologs while increasing CO between homologs (Martín et al., 2017). Recently, the duplicated ZIP4 gene inside the Ph1 locus on 5B (TaZIP4-B2) has been identified as responsible for the effect of this locus on recombination and suggested to be also involved in the improved synapsis efficiency (Rey et al., 2017, 2018a). ZIP4 is a meiotic gene shown to have a major effect on homologous COs in both Arabidopsis and rice, and a mild effect on synapsis (Chelysheva et al., 2007; Shen et al., 2012). Although its exact mode of action is unknown, it seems to act as a hub, facilitating physical interactions between components of the chromosome axis and the CO machinery (Perry et al., 2005; Tsubouchi et al., 2006). In diploid species, knockouts of this gene result in sterility, as failure of homologous COs at metaphase I leads to incorrect segregation. However, hexaploid wheat lacking TaZIP4-B2 only exhibits a small reduction in CO number, and still has fairly regular segregation. This is due to the four copies of ZIP4 present in hexaploid wheat: one copy on each of the homoeologous group 3 chromosomes (TaZIP4-A1, TaZIP4-B1, and TaZIP4-D1) and a fourth copy on chromosome 5B (TaZIP4-B2). TaZIP4-B2 is a transduplication of a chromosome 3B locus (International Wheat Genome Sequencing Consortium [IWGSC], 2018) and most probably appeared within the Ph1 locus upon polyploidization. In wheat lacking Ph1 (and therefore TaZIP4-B2), ZIP4 copies on the homoeologous group 3 are still present, allowing CO formation, even if a small fraction occurs between non-homologous chromosomes. In the case of an allopolyploid species such as wheat, the process of homolog recognition is further complicated compared to diploids, by the presence of homoeologous chromosomes. We can hypothesize that upon polyploidization, the newly formed hexaploid wheat already had mechanisms in place for the meiotic sorting of homologs from homoeologs during the telomere bouquet. This would have provided the new allopolyploid with some fertility until improved by the transduplication of ZIP4 from 3B into the Ph1 locus, with further modification and stabilization of the meiotic process. Over time and evolution, the efficiency and stability of meiosis could be completely established. The transduplication of ZIP4 is an example of how the meiotic program could be modified in polyploids in general, and wheat in particular, during evolution. It also illustrates the requirement to study such processes directly in these crop species, where there is a potential for manipulation in breeding programs.

C-Ph1, which is a syntenic ortholog of the RA8 gene in rice, has also been proposed to contribute to the Ph1 effect on recombination (Bhullar et al., 2014). The RA8 gene in rice encodes an anther-specific BURP-domain protein expressed specifically in the tapetum, endothecium, and connective tissue, but not in pollen grains, starting from the tetrad stage and reaching the maximum level of expression at the late vacuolated-pollen stage (Ding et al., 2009). Thus, RA8 is suggested to play an important role in microspore development and dehiscence of anther (Jeon et al., 1999). Moreover, knockouts of this gene have been reported to induce male sterility (Patents WO2000026389 A3 and US20040060084). Expression profile analysis of all different copies of the C-Ph1 gene using data generated in the present study, 849 wheat RNA-seq samples now publicly available (Ramírez-González et al., 2018), and the meiotic RNA-seq libraries deposited at https://urgi.versailles.inra.fr/files/RNASeqWheat/Meiosis/ reveals that the C-Ph1 copy on 5D (rather than on 5B) is by far the most dominantly expressed. In addition, the newly generated wheat genome assembly reveals that the VIGS hairpin construct used for C-Ph1 silencing (Bhullar et al., 2014), and which yielded sterility phenotypes, was designed from the wheat expressed sequence tag (EST) homolog BE498862 (448 bp), which is 100% identical to the 5D gene copy and not the 5B copy. As for the expression pattern, the C-Ph1 copy on 5B shows no meiotic expression (<0.5 TPM), being mostly expressed afterward during pollen formation. The C-Ph1 copy on 5D is expressed during meiosis at a very low level (<4 TPM) compared to its expression in anthers during the heading stage (>1700 TPM), exhibiting a similar expression profile to the C-Ph1 ortholog in rice RA8. These observations explain why our deletion covering the 5B copy of C-Ph1 did not exhibit meiotic phenotype or sterility (Roberts et al., 1999; Al-Kaff et al., 2008), and suggest that C-Ph1 is actually involved in microspore development and dehiscence of anther. Finally, Ph1 is the dominant gene suppressing homoeologous CO within the wheat genome. Yet neither the presence of wild type C-Ph1, nor that of any other gene could suppress homoeologous CO induced by mutating ZIP4 on 5B (TaZIP4-B2), the level of homoeologous CO being the same as that observed in ph1b deletion mutants (Rey et al., 2017, 2018a). We suggest that the use of the term C-Ph1 for this gene is therefore misleading and should be replaced with a more appropriate description.

### Global Gene Expression During Early Prophase Is Not Affected by Ph1

In the present study we used a total number of 21 different ph1b mutant plants, to assess whether global gene expression was affected by the absence of this locus during early prophase. We identified a set of genes which were differentially expressed in all samples lacking Ph1 compared to all samples containing Ph1. Only 358 genes were differentially expressed (0.56% of all expressed genes), of which 186 were located within the region corresponding to the 5B deletion, and 139 were located within the regions corresponding to the 3A and 3B deletions present in all samples lacking Ph1. Therefore, no major global changes in wheat expression were observed in wheat lacking Ph1 during early prophase. The effect of the CDK2-like genes inside the Ph1 locus on premeiotic replication and the associated effects on chromatin and histone H1 phosphorylation did not subsequently affect overall gene expression in early meiotic prophase. Moreover, the significant structural changes observed in the absence of Ph1 (centromere pairing and telomere dynamics during premeiosis, subtelomeric decondensation upon homologous recognition) were not associated with changes in global gene expression. This result is consistent with our previous conclusion that gene expression during meiotic early prophase is very stable and quite resilient to changes in chromatin structure.

### Wheat Lacking Ph1 Accumulate Extensive Chromosome Rearrangements

The mean number of COs in wheat lacking Ph1 (or TaZIP4-B2) was only 4–5 COs fewer than in wild type wheat (Martín et al., 2014; Rey et al., 2017). However, in the absence of Ph1, some COs can be formed between non-homologs, leading to non-homologous recombination and the accumulation of chromosome rearrangements. Most chromosomes synapse correctly in wheat lacking Ph1, as newly generated Ph1 deletion mutants exhibit low levels of multivalents in their meiocytes (Roberts et al., 1999). Thus, the Ph1 locus has only a slight effect on correcting synapsis. However, when Ph1 mutants are grown over multiple generations, they can accumulate extensive rearrangements. In the case of the newly generated TaZIP4-5B CRISPR mutant, low levels of multivalents were present in metaphase I meiocytes (Rey et al., 2018a), suggesting that TaZIP4-5B may contribute to the effect of promoting homologous synapsis. However, we can only verify the effect of TaZIP4-5B on recombination, being unable to confirm whether ZIP4-5B is wholly or partially responsible for the slight improvement of homologous synapsis by the Ph1 locus (Rey et al., 2018a). It is possible that other genes could contribute to the slight improvement of homologous synapsis. The duplicated TaZIP4-3B copy inserted into the CDK2-like locus, along with a heterochromatin segment. We previously reported that the CDK2-like genes were expressed in immature inflorescences (Al-Kaff et al., 2008), and that deletion of the 5B CDK2-like locus resulted in increased expression from copies on 5A and 5D, in particular one copy on 5D (Al-Kaff et al., 2008). However, we are not able to confirm these transcription results in the present study, as these CDK2-like genes, which seem to affect replication, are not expressed during the leptotene-zygotene meiotic stage. The closest Arabidopsis homolog of these CDK2-like genes has been reported to be involved in chromosome synapsis (Zheng et al., 2014). However, we do not have any proof of a direct effect of the CDK2-like genes on synapsis. We only have evidence that they seem to affect replication (Greer et al., 2012). A previous study also showed that ASY1 transcription and protein levels were clearly increased in their ph1b lines, which would affect synapsis (Boden et al., 2009). However, we do not observe such increases in ASY1 transcription in the present study, nor in the protein levels following immunofluorescence detection of ASY1 in meiocytes derived from our ph1b lines (Martín et al., 2014, 2017). The extensive genomic differences between the ph1b lines revealed in the present study mean that synapsis phenotypes attributed to the Ph1 locus need to be observed in multiple ph1b mutant lines.

The karyotypic instability in the absence of Ph1 has been previously reported (Sánchez-Morán et al., 2001), but the results obtained in the present work reveal that the intergenomic exchanges and deletions are higher than anticipated. The original ph1b mutant was obtained in 1977 (Sears, 1977) and since then, intergenomic exchanges and other reorganizations have probably been accumulating. The present study reveals that as well as the ph1b deletion on 5B, there are three further deletions in all ph1 mutant genotypes analyzed. This suggests that they probably arose soon after the original ph1b line was generated. Although our ph1b mutant lines have been routinely backcrossed to wild type wheat after eight generations, extensive rearrangements still accumulate subsequently, meaning that every single ph1b mutant could potentially be different. Thus, some of the effects previously attributed to the lack of the Ph1 locus are likely to be the result of these reorganizations. However, if a sufficient number of different ph1b mutant plants are used in any study, then this risk greatly decreases. In the present study, RNA seq samples were derived, and the data combined from 21 different ph1b mutant plants. In the future, particularly for breeding purposes, we recommend the use of the Tazip4-B2 TILLING mutant lines available at the UK Germplasm Resource Unit<sup>5</sup> (code W10348 and W10349). These lines do not currently exhibit rearrangements, but will probably also accumulate reorganizations in further generations, so we recommend checking and cleaning the lines periodically.

### Extreme Instability of Triticale Lacking Ph1

Octoploid triticale is the synthetic amphiploid resulting from the chromosome doubling of the hybrid between hexaploid wheat and rye. It is, therefore, a new allopolyploid species. Primary octoploid triticale (containing Ph1) is unstable meiotically, with variable frequency of univalents in metaphase I and reduced fertility (Scoles and Kaltsikes, 1974; Muntzing, 1979; Gustafson, 1982; Fominaya and Orellana, 1988; Lukaszewski and Gustafson, 2011). For our RNA-seq analysis, triticale plants with 56 chromosomes were selected, ensuring that they all had the complete set of wheat and rye chromosomes. GISH analysis was performed on the progeny of plants used in the RNA-seq experiments. This analysis revealed that one third of the progeny were aneuploids, with rye chromosomes always being the cause of aneuploidy. Interestingly, no recombination or reorganization between homoeologs was observed in any of the plants analyzed, nor were any COs detected between homoeologs at meiotic metaphase I. This indicates that the origin of meiotic instability in octoploid triticale is most probably not related to the homologous recognition process, and that the presence of the Ph1 locus in the wheat genome plays the same role in this new species, ensuring only homologous recombination. It has previously been speculated that late DNA replication of rye heterochromatin interferes with chromosome synapsis when rye chromosomes are placed in a wheat genetic background (Thomas and Kaltsikes, 1974, 1976; Merker, 1976); however, there are also reports contradicting this hypothesis (Fominaya and Orellana, 1988). There is a clear decrease in CO number in triticale compared to wheat and rye, as revealed by the frequent occurrence of univalents, particularly of rye chromosomes. Replication initiation activates a checkpoint system that prevents DSB formation in unreplicated DNA. Therefore, it is possible that late DNA replication of the terminal rye heterochromatic knobs prevents some DSB formation and/or affects the DSB repair pathway of these late breaks, preventing COs. This may explain the frequent presence of rye univalents. In the future, when the rye genome sequence is completed, it would be interesting to compare the expression of specific meiotic genes involved in recombination between wheat, rye and triticale, checking whether meiotic expression of both wheat and rye is altered when both genomes are placed together in the same cytoplasm (as a new species).

We also assessed the consequence on meiosis of generating triticale in the absence of Ph1, the locus responsible for the diploid-like behavior of hexaploid wheat. As in the case of triticale containing Ph1, only plants with 56 chromosomes were selected for the RNA-seq analysis. The fertility of these triticale plants lacking Ph1 was extremely low, ranging from nine seeds to complete sterility. GISH analysis on both somatic and meiotic cells showed that unlike in triticale containing Ph1, there were extensive reorganizations resulting from nonhomologous recombination in the triticale lacking Ph1. Even though these triticale plants had only gone through three meiotic events since their synthesis, there were extensive recombination events between homoeologs. Chromosome fragmentation was also detected, particularly involving rye. There is a difference in heterochromatin DNA replication in the absence of the Ph1 locus in wheat–rye hybrids (Greer et al., 2012). If, as has been suggested, late DNA replication of rye heterochromatin is the cause of triticale instability in the presence of Ph1, the additional delayed replication produced by the absence of Ph1 could affect excessive DSB formation, causing not only a decrease in CO formation but also lack of DSB repair, and therefore, chromosome fragmentation. In any case, triticale lacking Ph1 exhibits even more chromosomal rearrangements than wheat lacking Ph1, leading to an extreme phenotype and sterility.

<sup>5</sup>https://www.seedstor.ac.uk

### CONCLUSION

Understanding polyploidization is of great importance in the understanding of crop domestication, speciation, and plant evolution. One of the biggest challenges faced by a new polyploid is how to manage the correct recognition, synapsis and recombination of its multiple related chromosomes during meiosis, to produce balanced gametes. In the last few years, there has been a better understanding of the meiotic process from studies of diploid plants and other model organisms (Mercier et al., 2016). Polyploid crops have also benefited from these advances since many of the key genes and processes seem to be conserved between species. However, polyploids differ considerably from diploids in many respects. Hexaploid wheat, with its large genome size, high percentage of repetitive DNA and three related ancestral genomes, is likely to have modified the meiotic process in adapting to its polyploidy. Surprisingly, here we found no evidence for major changes in gene expression during early meiotic prophase, despite variations in synapsis, WGD or the absence of the Ph1 locus. This suggests that the transcription of genes required for early meiotic prophase has already occurred prior to this stage. Genetic studies in polyploids such as wheat have lagged far behind diploid species, partly because of the lack of key genetic resources. However, the release of the RefSeqv1.0 assembly in hexaploid wheat (International Wheat Genome Sequencing Consortium [IWGSC], 2018), the availability of expression data, (including that generated from the present study) presented in a browser www.wheat-expression.com (Borrill et al., 2016; Ramírez-González et al., 2018) enabling easy visualization and comparison of transcriptome data, and the availability of TILLING mutants for every wheat gene (Krasileva et al., 2017), will now allow more rapid progress to be made in our understanding of meiosis.

### AUTHOR CONTRIBUTIONS

AM, PS, and GM conceived and designed the study. AM obtained the hybrids and triticale, produced the meiotic RNA samples, did the cytological analyses, interpreted the results, and wrote the first draft of the manuscript. PB carried out the differential expression analysis, assisted with analysis of all data, and contributed corrections and suggestions. JH did the mapping using HISAT and created the chromosome coverage plots. AA carried out some data analysis and assisted with analysis of data. RR-G carried out the mapping using kallisto for the differential expression analysis and integrated the wheat analysis into the expVIP platform. PS and GM interpreted the data. CU contributed corrections and suggestions. GM edited the manuscript. All authors have read and approved the final version of the manuscript.

### FUNDING

This work was supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC), through a grant part of the Designing Future Wheat (DFW) Institute Strategic Programme (BB/P016855/1) and grant BB/J007188/1. Next-generation sequencing and library construction were delivered via the BBSRC National Capability in Genomics (BB/CCG1720/1) at Earlham Institute by members of the Genomics Pipelines Group.

### ACKNOWLEDGMENTS

This manuscript has been released as a pre-print at https://www.biorxiv.org/ (Martín et al., 2018).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01791/full#supplementary-material

FIGURE S1 | Principal component analysis (PCA) of samples analyzed in this study. Three biological replicates were produced per genotype. (A) PCA for the six wheat samples, three containing the Ph1 locus (Ph1+) and three lacking it (Ph1−). The x and y axis represent the two principal components of the total variance, 73 and 12%, respectively. (B) PCA for 12 wheat–rye hybrid and triticale samples. Three hybrids containing and three lacking Ph1, and three triticale containing and three lacking Ph1. The x and y axis represent the two principal components of the total variance, 70 and 15%, respectively.

FIGURE S2 | Representation of the ratio of coverage along all chromosomes using Box plots. (A) Box plot comparing wheat–rye hybrids and triticale, both containing the Ph1 locus. (B) Box plot comparing wheat in the presence and absence of Ph1. (C) Box plot comparing wheat–rye in the presence and absence of Ph1. (D) Box plot comparing triticale in the presence and absence of Ph1. Arithmetic mean values of the coverage ratio per chromosome are indicated on the upper part of the plots. Mean values >0.05 and <−0.05 are highlighted in magenta.

FIGURE S3 | Chromosome coverage plots of wheat containing Ph1 (Ph1+) (three samples pooled together) vs. each individual sample of wheat lacking Ph1 (Ph1−). Heatmaps show that each wheat sample lacking Ph1 is different. Several deletions (visualized in dark blue) are common to all three samples, but other deletions and chromosomes rearrangements are different between them.

FIGURE S4 | Chromosome coverage plots of triticale containing Ph1 (Ph1+) (three samples pooled together) vs. each individual sample of triticale lacking Ph1 (Ph1−). Heatmaps show that each triticale sample lacking Ph1 is different. Several deletions (visualized in dark blue) are common to all three samples, but other deletions and chromosomes rearrangements are different between them.

FIGURE S5 | Morphology of whole plants (A) and spikes (B) of triticale containing the Ph1 locus (Ph1+) and lacking it (Ph1−). Plant and spike morphology of all triticale containing Ph1 was perfectly normal; however, every triticale lacking Ph1, was morphologically different, some exhibiting very abnormal phenotypes.

TABLE S1 | Number of cleaned reads generated and mapped for each sample. The RNA-seq data were aligned to the RefSeqv1.0 assembly using HISAT with strict mapping options to reduce the noise caused by reads mapping to the incorrect regions.

TABLE S2 | Number of reads generated and mapped for each sample using Kallisto. (A) Wheat samples were pseudoaligned against the Chinese Spring RefSeqv1.0+UTR transcriptome reference. (B) Wheat–rye hybrids and triticale samples were pseudoaligned against a wheat+rye transcriptome constructed in silico by combining the Chinese Spring RefSeqv1.0+UTR transcriptome reference with the published rye transcriptome (Bauer et al., 2016).

TABLE S3 | Number of DEGs among samples using different thresholds in the presence (Ph1+) and absence (Ph1−) of the Ph1 locus. (A) Number of wheat DEGs. (B) Number of rye DEGs. The first number in every column title represents the p-adj filter (<0.05, <0.01 or <0.001). The 2FC indicates that genes were up- or down-regulated more than twofold.

TABLE S4 | Genes differentially expressed among samples and total number of expressed genes (EG) in this study. (A) High confidence (HC) wheat genes. (B) Low confidence (LC) wheat genes. (C) Rye genes.

TABLE S5 | (A) Gene ontology (GO) classification of DEGs up-regulated in wheat–rye hybrids vs. triticale (both containing the Ph1 locus). (B) Gene ontology (GO) classification of DEGs down-regulated in wheat–rye hybrids vs. triticale (both containing the Ph1 locus). (C) GO Slim classification of DEGs down-regulated in wheat–rye hybrids vs. triticale (both containing the Ph1 locus). This list was created to have a broad overview of the ontology content without the detail of the specific fine-grained terms.

TABLE S6 | (A) Functional annotation of DEGs up-regulated in wheat–rye hybrids vs. triticale (both containing Ph1). (B) Functional annotation of DEGs down-regulated in wheat–rye hybrids vs. triticale (both containing Ph1).

TABLE S7 | Functional annotation of the 33 DEGs shared by all samples lacking Ph1 vs. all samples containing Ph1, and which are not located in any of the common deletions present in all samples lacking Ph1.

TABLE S8 | Chromosome configuration of nine newly synthesized triticale both containing the Ph1 locus (Ph1+) and lacking it (Ph1−). No inter-genomic recombination was detected in the presence of Ph1.

### REFERENCES

Muntzing, A. (1979). Triticale: results and problems. Adv. Plant Breed. 10, 1–103.

Naranjo, T., Roca, A., Giraldez, R., and Goicoechea, P. G. (1988). Chromosome pairing in hybrids of ph1b mutant wheat with rye. Genome 30, 639–646. doi: 10.1139/g88-108

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Martín, Borrill, Higgins, Alabdullah, Ramírez-González, Swarbreck, Uauy, Shaw and Moore. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# High Resolution Genetic and Physical Mapping of a Major Powdery Mildew Resistance Locus in Barley

Parastoo Hoseinzadeh<sup>1</sup>, Ruonan Zhou<sup>1</sup>, Martin Mascher<sup>1</sup>, Axel Himmelbach<sup>1</sup>, Rients E. Niks<sup>2</sup>, Patrick Schweizer<sup>1</sup> and Nils Stein<sup>1,3</sup>\*

<sup>1</sup> Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany, <sup>2</sup> Department of Plant Science, Plant Breeding, Wageningen University & Research, Wageningen, Netherlands, <sup>3</sup> Department of Crop Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany

#### Edited by:

Pierre Sourdille, INRA Centre Auvergne Rhône Alpes, France

#### Reviewed by:

Yong Zhang, Chinese Academy of Agricultural Sciences, China Li Huang, Montana State University, United States Ernesto Igartua, Spanish National Research Council (CSIC), Spain

> \*Correspondence: Nils Stein stein@ipk-gatersleben.de

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 21 November 2018 Accepted: 28 January 2019 Published: 14 February 2019

#### Citation:

Hoseinzadeh P, Zhou R, Mascher M, Himmelbach A, Niks RE, Schweizer P and Stein N (2019) High Resolution Genetic and Physical Mapping of a Major Powdery Mildew Resistance Locus in Barley. Front. Plant Sci. 10:146. doi: 10.3389/fpls.2019.00146

Powdery mildew caused by Blumeria graminis f. sp. hordei is a foliar disease with a highly negative impact on yield and grain quality in barley. Thus, breeding for powdery mildew resistance is an important goal and constantly requires the discovery of new sources of natural resistance. Here, we report the high resolution genetic and physical mapping of a dominant race-specific powdery mildew resistance locus, originating from an Ethiopian spring barley accession 'HOR2573,' conferring resistance to several modern mildew isolates. High-resolution genetic mapping narrowed down the interval containing the resistance locus to a physical span of 850 kb. Four candidate genes with homology to known disease resistance gene families were identified. The mapped resistance locus coincides with a previously reported resistance locus from Hordeum laevigatum, suggesting allelism at the same locus in two different barley lines. Therefore, we named the newly mapped resistance locus from HOR2573 as MlLa-H. The reported co-segregating and flanking markers may provide new tools for marker-assisted selection of this resistance locus in barley breeding.

Keywords: barley, high resolution mapping, powdery mildew, resistance locus, RLK

### INTRODUCTION

Cultivated barley (Hordeum vulgare ssp. vulgare L.) is ranked fourth after rice (Oryza sativa L.), wheat (Triticum aestivum L.) and maize (Zea mays L.) in terms of crop production. Barley is used mainly as a source of feed and forage for livestock, and as a source of food and beverages for humans (Ullrich, 2010; Newton et al., 2011). According to FAO reports on global trade of barley and barley products, more than 20 million tons of barley grains are exported and imported annually worldwide, accounting for about US\$3 billion. Nevertheless, losses due to pests and diseases in cereals continue to pose a substantial threat to agricultural food and feed production and impact economic decisions as well as practical developments. A cost effective and environmentally sustainable strategy to mitigate the damage and losses caused by plant pathogens is to deploy plant varieties possessing genetic resistance (Johnston et al., 2013). Unlocking genetic diversity in genebank collections is of prime importance to discover and deploy genetic resistance genes or alleles that have been lost during domestication and intensive selection in breeding programs (McCouch et al., 2013). The primary, freely crossable, gene pool of barley consists of cultivated barley including landraces and its direct wild relative H. vulgare ssp. spontaneum and provides a source of still unused and valuable disease resistance alleles. In this regard, resistance phenotyping of barley genetic resources preserved in ex situ collections is a means to identify the genetic basis of resistance and introduce it into modern barley cultivars.

Powdery mildew caused by Blumeria graminis f. sp. hordei (Bgh) is a foliar disease of barley with worldwide importance (Glawe, 2008). The relatively cool and humid climate of Europe fosters the spread of powdery mildew, making it the most prevalent European barley disease (Jørgensen and Wolfe, 1994), leading to yield losses of up to 30% and lower grain quality (Corrion and Day, 2001; Czembor, 2002). Until now, some of the previously identified powdery mildew resistance loci in barley have been exploited by plant breeders to develop resistant cultivars. In fact, all seven barley chromosomes harbor important powdery mildew resistance loci, and novel genes are still continually being located on them (Řepková et al., 2006). To increase the durability of barley powdery mildew resistance, breeders are continuously looking for new monogenic as well as polygenic resistance sources derived from diverse barley germplasm to improve resistance through gene pyramiding.

Previous QTL mapping studies of barley powdery mildew resistance have suggested that the telomeric region of barley chromosome 2H represents an important genomic region for mildew resistance. Several significant QTLs near the distal end of this chromosome have repeatedly been reported to be associated with powdery mildew resistance (von Korff et al., 2005; Marcel et al., 2007; Aghnoum et al., 2010; Schweizer and Stein, 2011). However, despite the rather large mapping population sizes (∼110–200 individuals) and the high recombination frequency in this region (≤1.1 Mb/cM) according to Künzel et al. (2000), IBSC (2012), and Mascher et al. (2017), the confidence intervals (cM) of the powdery mildew resistance QTLs identified in this region were too large to allow map-based cloning (St. Clair, 2010). In fact, the lack of a sufficient number of SNPs was a limiting factor in marker development. The recent progress in next-generation sequencing (NGS) technologies provides the possibility of cost-effective high-throughput de novo SNP discovery within the genome and parallel genotyping (Deschamps and Campbell, 2010). Indeed, multiple individuals can be rapidly sequenced at low cost and the detected SNPs can easily be converted into individual molecular markers for further application or directly used in high-density linkage map construction (Ruperao and Edwards, 2015). However, for crops with medium to large genomes, where much of the sequence is repetitive and the proportion of gene space is limited, a reduced-representation strategy like genotyping-by-sequencing (GBS) is a cost effective approach to discover thousands of SNPs that can be directly used for high density linkage map construction, precise localization of the QTLs and further marker development (Elshire et al., 2011; He et al., 2014).

The diploid nature of barley (2n = 14) with high degree of inbreeding along with the ease of making genetic crosses made barley a favorable biological model for genetic and genomic studies (Saisho and Takeda, 2011). Consequently, comprehensive barley genomic resources have been developed to facilitate the analysis of the barley genome during the last two decades. The recently published barley reference-quality genome sequence (Beier et al., 2017; Mascher et al., 2017) and a variety of newly developed web-based tools providing barley genomic data (Colmsee et al., 2015) have facilitated many downstream applications in gene identification and isolation like positional gene cloning and comparative genomic analysis with other Triticeae (Mascher et al., 2017).

The main objective of the present study was to perform high resolution genetic and physical mapping of a major resistance locus segregating in the recombinant inbred line (RIL) population 'HOR2573 × Morex.' This was achieved through applying next generation sequencing-based strategies and taking advantage of the improved barley genomic resources infrastructure.

### MATERIALS AND METHODS

### Plant Material and Phenotyping

An Ethiopian landrace 'HOR2573,' resistant to seven highly virulent powdery mildew isolates [three European (78P, D12-12, and CH4.8) and four Israeli (35, 69, 148, and 289) isolates, **Supplementary File 1**], was previously crossed to a six-rowed malting cultivar 'Morex' (susceptible to all the tested Bgh isolates). The parental lines were phenotyped by both a detached leaf assay (seedling stage) and a whole plant assay in the field, revealing a strong correlation between resistance data from both assays (Spies et al., 2012). 'HOR2573' responds to all the tested Bgh isolates by hypersensitive cell death in leaves detached from 2-week-old plants. Among the European isolates, the Swiss field isolate CH4.8 showed a decreased infection on HOR2573 (less than 5% leaf area covered by colonies), the same as the four Israeli isolates, and was therefore selected for further study. Ninety-five F<sub>6</sub>-RILs, derived by single-seed descent (SSD) through five cycles of selfing from the cross 'HOR2573 × Morex,' were used for phenotyping and genetic mapping in three independent experiments. Within each experiment, eight seeds per RIL (F<sub>6:7</sub>) were sown as eight biological replicates. Plants were phenotyped 14 days after sowing using the second seedling leaf in a detached leaf assay. For this purpose, the plants were grown in trays at 17–20°C under long day conditions (16 h) in the greenhouse. The middle part of the second leaf was cut into two 3 cm long pieces (technical replicates). Detached leaves were placed surface upward in four-column plates on water agar (1%) containing benzimidazole (40 mg/l) as senescence inhibitor. In each column of one plate, five RILs were arranged in a randomized block design together with both positive (susceptible parent) and negative (resistant parent) controls. The prepared leaf segments were inoculated in an inoculation tower by blowing Bgh conidia (isolate CH4.8) from sporulating leaves (from four sides) according to Altpeter et al. (2005), to a final spore density of 20–30 conidia per mm<sup>2</sup>. The inoculated detached leaves were kept in the incubator growth chamber at 20°C, 60% humidity, 16 h light period and scored macroscopically at 7 days post inoculation (dpi). The disease intensity was rated based on infection area (%) according to Mains and Dietz (1930) and Kølster et al. (1986). Based on the infection area, the rating scores were finally grouped into two groups of resistant (classes 1 and 2, with less than 25% leaf infection area) and susceptible (classes 3 and 4, leaf infection area ≥25%) plants.
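The final two-group scoring can be written as a one-line rule. The following is a minimal sketch that uses only the 25% threshold stated above; the function name is hypothetical, and the boundaries between the individual classes 1 to 4 are not given in the text, so they are not modeled.

```python
def classify_infection(infection_area_pct: float) -> str:
    """Group a leaf segment by its infected leaf area (in percent).

    Resistant: less than 25% infected area (classes 1 and 2).
    Susceptible: 25% or more infected area (classes 3 and 4).
    """
    if not 0 <= infection_area_pct <= 100:
        raise ValueError("infection area must be between 0 and 100 percent")
    return "resistant" if infection_area_pct < 25 else "susceptible"


print(classify_infection(4.0))   # resistant
print(classify_infection(25.0))  # susceptible
```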

### Preparation of Genomic DNA

Plant material for DNA extraction was grown under standard greenhouse conditions (16 h day/8 h night, 20°C). Young third leaves were sampled and immediately transferred into liquid nitrogen. Genomic DNA was extracted using guanidine thiocyanate-based DNA isolation in 96-well plate format according to Milner et al. (2018). The DNA concentration of the samples was measured using a Qubit® 2.0 Fluorometer (Invitrogen, Carlsbad, CA, United States) according to the manufacturer's protocol. For accurate DNA quantification of a higher number of samples, the Quant-iT™ PicoGreen® dsDNA assay kit (Invitrogen, Carlsbad, CA, United States) and a Synergy HT microplate reader (BioTek, Bad Friedrichshall, Germany) were used.

### Marker Development and Primer Design

The SNPs between resistant and susceptible genotypes identified through the GBS assay were converted into Cleaved Amplified Polymorphic Site (CAPS) markers using SNP2CAPS software (Thiel et al., 2004). Primers used for marker development were designed using the online software Primer3 v. 0.4.0<sup>1</sup> (Koressaar and Remm, 2007; O'Halloran, 2015). Default parameters were used with minor modifications. Guanine-cytosine content (GC-content) was set within the range of 50–55% and the product size was adjusted according to the experimental requirement between 300 and 1,000 bp. The primer length was set between 19 and 21 bp and primer melting temperature (Tm) was adjusted to around 60°C. The restriction digestion reaction was performed according to the manufacturer's recommendations using a thermocycler for incubation. DNA fragments were separated on a 1.5% agarose gel for genotyping.
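The primer constraints listed above can be checked with a few lines of code. The sketch below is an illustration only: it does not call Primer3, the Tm estimate uses the simple Wallace rule (2°C per A/T, 4°C per G/C) rather than the thermodynamic model Primer3 applies, the ±3°C tolerance around 60°C is an assumption, and the example sequence is hypothetical.

```python
def gc_content(seq: str) -> float:
    """GC content of a primer in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm estimate (Wallace rule); an assumption, not Primer3's model."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_primer(seq: str, tm_target: float = 60.0, tm_tolerance: float = 3.0) -> dict:
    """Compare a primer with the design ranges given in the text."""
    return {
        "length_19_21": 19 <= len(seq) <= 21,
        "gc_50_55": 50.0 <= gc_content(seq) <= 55.0,
        "tm_near_60": abs(wallace_tm(seq) - tm_target) <= tm_tolerance,
    }


def product_size_ok(product_bp: int) -> bool:
    """Amplicon size range used for marker development (300 to 1,000 bp)."""
    return 300 <= product_bp <= 1000


example_primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
print(check_primer(example_primer))
```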

### PCR Amplification and Sanger Sequencing

The DNA amplification was performed on GeneAmp PCR Systems 9700 (Applied Biosystems, Darmstadt, Germany) using a standardized touchdown PCR profile with HotStarTaq DNA Polymerase (Qiagen, Hilden, Germany): initial denaturation for 15 min at 95°C, followed by four cycles of denaturation at 95°C/30 s, annealing at 62°C/30 s (decreasing by 1°C per cycle), and extension at 72°C/60 s; then 35 cycles of denaturation at 95°C/30 s, annealing at 58°C/30 s, and extension at 72°C/60 s; followed by a final extension step at 72°C/7 min. Based on amplicon length, the extension time was modified (1 min/1 kb). The PCR products were resolved by 1.5–2.5% gel electrophoresis depending on amplicon size. PCR products were purified using the NucleoFast 96 PCR Kit (Macherey-Nagel, Germany) and sequenced using BigDye Terminator chemistry (BigDye® Terminator v3.1, Applied Biosystems, Darmstadt, Germany) on the 3730xl DNA Analyzer (Applied Biosystems, Carlsbad, CA, United States). Sequence analysis was performed using 'Sequencher 4' software (Genecodes Corporation, United States). The identified SNPs between resistant and susceptible genotypes were converted into CAPS markers according to the procedures described in the section "Marker Development and Primer Design."
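The touchdown profile can be expanded into an explicit step list, which makes the annealing ramp easy to verify. This is a minimal sketch with hypothetical function names; the 60 s floor on the extension time for amplicons shorter than 1 kb is an assumption based on the base profile, not something the text states.

```python
import math


def extension_seconds(amplicon_bp: int, floor_s: int = 60) -> int:
    """Extension time at 1 min per kb; the 60 s floor is an assumption."""
    return max(floor_s, math.ceil(amplicon_bp / 1000 * 60))


def touchdown_profile(amplicon_bp: int = 1000) -> list:
    """Return the PCR program as (step name, temperature in °C, seconds)."""
    ext = extension_seconds(amplicon_bp)
    steps = [("initial denaturation", 95, 15 * 60)]
    # Four touchdown cycles: annealing starts at 62°C and drops 1°C per cycle.
    for cycle in range(4):
        steps += [("denaturation", 95, 30),
                  ("annealing", 62 - cycle, 30),
                  ("extension", 72, ext)]
    # Thirty-five cycles at the final annealing temperature of 58°C.
    for _ in range(35):
        steps += [("denaturation", 95, 30),
                  ("annealing", 58, 30),
                  ("extension", 72, ext)]
    steps.append(("final extension", 72, 7 * 60))
    return steps


profile = touchdown_profile()
annealing = [temp for name, temp, _ in profile if name == "annealing"]
print(annealing[:5])  # [62, 61, 60, 59, 58]
```

The four touchdown cycles end at 59°C, so the ramp leads directly into the 58°C annealing temperature of the main cycles.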

### GBS Library Preparation and Data Analysis

All 95 F<sub>6</sub> RILs and the two parental genotypes 'HOR2573' and 'Morex' were pooled per lane in an equimolar manner and sequenced on the Illumina HiSeq 2500, 1 × 107 cycles, single read, using a custom sequencing primer following previously established procedures (Mascher et al., 2013; Wendler et al., 2014). Prior to library preparation, genomic DNA was quantified using PicoGreen (Invitrogen, Carlsbad, CA, United States) and normalized to 20 µl of 10 ng/µl (200 ng total) in 96-well plates. For quality control of DNA, the GBS library was analyzed with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, United States) using the Agilent High Sensitivity DNA kit. Finally, the quantification control of the library was performed using qPCR according to Mascher et al. (2013).
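The normalization to 20 µl at 10 ng/µl follows from the dilution relation C1·V1 = C2·V2. The sketch below is illustrative; the function name and the example stock concentration are hypothetical.

```python
def normalization_volumes(stock_ng_per_ul: float,
                          target_ng_per_ul: float = 10.0,
                          final_volume_ul: float = 20.0) -> tuple:
    """Return (DNA volume, diluent volume) in µl for one well.

    Uses C1 * V1 = C2 * V2 with the target of 20 µl at 10 ng/µl (200 ng).
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is too dilute to reach the target concentration")
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul


# Hypothetical sample quantified at 50 ng/µl.
print(normalization_volumes(50.0))  # (4.0, 16.0)
```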

The genotype calls from the sequencing data were filtered in order to select only SNPs matching the default criteria. The default parameters were defined for a RIL population by Mascher et al. (2013), considering the expected residual heterozygosity of 1–2% in the population presented in this study. In total, 46,689 and 15,798 SNPs were obtained genome-wide at minimum sequence read coverage of two- and six-fold, respectively. Furthermore, to reduce the computational errors in JoinMap® 4.0, SNPs with more than 10% missing data were excluded from further analysis. This approach delivered 10,644 genome-wide SNPs at minimum two-fold read coverage. Of these, 1,843 SNPs were located on chromosome 2H. To make variant calls with a higher degree of confidence, a set of 1,394 genome-wide SNPs with robust variant calls (six-fold read coverage) were utilized to construct a genetic linkage map.
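The two filters described here (a minimum read coverage per call and a maximum share of missing data per SNP) can be sketched as follows. This is not the pipeline of Mascher et al. (2013); the data layout, function names and toy genotypes are assumptions for illustration.

```python
def mask_low_coverage(calls, depths, min_depth):
    """Set genotype calls supported by fewer than min_depth reads to missing."""
    return [c if d >= min_depth else None for c, d in zip(calls, depths)]


def missing_fraction(calls):
    """Fraction of samples without a genotype call for one SNP."""
    return sum(c is None for c in calls) / len(calls)


def filter_snps(genotypes, max_missing=0.10):
    """Keep SNPs with no more than max_missing missing calls."""
    return {snp: calls for snp, calls in genotypes.items()
            if missing_fraction(calls) <= max_missing}


# Toy data: ten samples per SNP, None marks a missing call.
genotypes = {
    "snp_a": ["A"] * 9 + [None],        # 10% missing, kept
    "snp_b": ["A"] * 8 + [None, None],  # 20% missing, excluded
}
kept = filter_snps(genotypes)
print(sorted(kept))  # ['snp_a']
```

Raising `min_depth` from two to six converts more calls to missing, which is consistent with the smaller SNP set reported at six-fold coverage.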

From the GBS data, three plants were identified that harbored a heterozygous region covering the resistance locus. The progeny of each of these plants therefore segregates for the locus interval and represents a heterogeneous inbred family (HIF) according to Tuinstra et al. (1997). Heterozygosity of the three selected HIFs (HIF145, HIF567, HIF836) in the respective region was re-evaluated based on the proportion of reads supporting the alternative allele out of the total read coverage, which confirmed that the selected plants were heterozygous for the target interval. The progeny of these three HIFs was subsequently used for high-resolution mapping of the locus. In addition, a segregating F<sub>7:8</sub> population consisting of 940 plants (the progeny of the identified heterozygous recombinants) was screened using flanking and co-segregating markers of the target interval to identify additional recombinants and increase the genetic resolution in the vicinity of the locus.
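The re-evaluation of heterozygosity from allele-specific read coverage can be illustrated with a simple classifier. This is a sketch only: the cut-offs of 0.2 and 0.8 and the function name are assumptions chosen for illustration, since the thresholds actually applied are not stated in the text.

```python
def classify_by_alt_fraction(alt_reads, total_reads, low=0.2, high=0.8):
    """Classify a site from the fraction of reads carrying the alternative allele.

    low and high are hypothetical cut-offs, not values from the study.
    Returns 'hom_ref', 'het', 'hom_alt', or None when there is no coverage.
    """
    if total_reads == 0:
        return None
    fraction = alt_reads / total_reads
    if fraction < low:
        return "hom_ref"
    if fraction > high:
        return "hom_alt"
    return "het"


# Toy read counts (alternative reads, total reads) at three sites.
sites = [(5, 10), (0, 12), (12, 12)]
labels = [classify_by_alt_fraction(alt, total) for alt, total in sites]
print(labels)
```

A plant would be considered heterozygous for the interval when the sites across the target region are consistently classified as `het`.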

<sup>1</sup>http://bioinfo.ut.ee/primer3-0.4.0/

### Genetic Linkage Analysis


JoinMap® 4.0 software (Van Ooijen, 2006) was used for genetic linkage analysis following the instruction manual. Only genotype calls with at least six-fold read coverage were included to construct the genetic linkage map. A linkage map was produced by regression mapping using the Kosambi function. Markers were assigned to seven linkage groups based on logarithm of odds (LOD > 5) groupings, and the linkage groups were assigned to barley chromosomes on the basis of the locus coordinates determined during read mapping against the barley reference genome assembly (Mascher et al., 2017).
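For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d (in Morgans) as d = ¼ ln[(1 + 2r)/(1 − 2r)]. The sketch below implements this formula and its inverse; it illustrates the function itself and is not a reimplementation of JoinMap, and the function names are illustrative.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100)


d = kosambi_cm(0.10)
print(round(d, 2))             # distance for r = 0.10
print(round(kosambi_r(d), 4))  # round trip back to r
```

For small r the distance in cM is close to 100 × r; the correction grows as r approaches 0.5.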

### Statistics of the Phenotypic Analysis

The three independent phenotyping experiments were treated as three environments. The phenotypic data analysis was performed using the software ASReml-R 3.0 (Butler et al., 2009). The mean infection area in each experiment (considered as environment) was used to calculate the best linear unbiased estimates (BLUEs) with the following model:

$$y_{ijmno} = \mu + g_i + l_o + (gl)_{io} + s_{jo} + p_{jmo} + c_{jmno} + e_{ijmno},$$

where $y_{ijmno}$ is the phenotypic performance of the $i$th genotype in the $n$th column of the $m$th plate in the $j$th inoculation tower of the $o$th environment, $\mu$ is the intercept, $g_i$ is the effect of the $i$th genotype, $l_o$ is the effect of the $o$th environment, $(gl)_{io}$ is the interaction between the $i$th genotype and the $o$th environment, $s_{jo}$ is the effect of the $j$th inoculation tower in the $o$th environment, $p_{jmo}$ is the effect of the $m$th plate in the $j$th inoculation tower of the $o$th environment, $c_{jmno}$ is the effect of the $n$th column in the $m$th plate of the $j$th inoculation tower in the $o$th environment, and $e_{ijmno}$ is the error of $y_{ijmno}$. For BLUEs estimation, only $\mu$ and $g_i$ were treated as fixed effects; for heritability estimation, all effects except $\mu$ were treated as random.

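The broad-sense heritability was calculated with the following equation:

$$h^2 = \frac{\sigma_{g}^2}{\sigma_{g}^2 + \frac{\sigma_{gl}^2}{Nr.env} + \frac{\sigma_{e}^2}{Nr.env \times Nr.rep}}$$

where $\sigma_{g}^2$, $\sigma_{gl}^2$, and $\sigma_{e}^2$ denote the genotypic, genotype-by-environment interaction, and error variances, respectively, and $Nr.env$ and $Nr.rep$ denote the number of environments and the number of replications per environment.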
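As a worked illustration of the heritability equation, the sketch below divides the genotypic variance by the sum of the genotypic variance, the genotype-by-environment variance divided by the number of environments, and the error variance divided by the number of environments times the number of replications. The variance components in the example are invented for illustration and are not estimates from this study.

```python
def broad_sense_h2(var_g, var_gl, var_e, n_env, n_rep):
    """Broad-sense heritability on an entry-mean basis.

    var_g:  genotypic variance
    var_gl: genotype-by-environment interaction variance
    var_e:  error variance
    n_env:  number of environments
    n_rep:  number of replications per environment
    """
    return var_g / (var_g + var_gl / n_env + var_e / (n_env * n_rep))


# Hypothetical variance components (not taken from the paper).
h2 = broad_sense_h2(var_g=100.0, var_gl=3.0, var_e=6.0, n_env=3, n_rep=2)
print(round(h2, 3))
```

With these made-up values the two error-related terms each contribute 1.0 to the denominator, so the ratio is 100/102.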

### QTL Analysis

The QTL analysis was performed using GenStat v16 software (VSN International, Hemel Hempstead, Hertfordshire, United Kingdom). An initial genome-wide scan was carried out by simple interval mapping (SIM) to obtain candidate QTL positions. These were used as cofactors in subsequent scans (composite interval mapping, CIM). One or more rounds of CIM were performed, that is, genome-wide scans for QTL effects in the presence of cofactors, which were usually potential QTL positions detected in previous steps. Following backselection from the set of candidate QTL, a final set of estimated QTL effects was obtained. The LOD significance threshold (α = 0.05) was estimated from 1,000 permutation tests. The 95% confidence interval was taken to be the chromosomal region in which the LOD score dropped by less than one unit from the linkage peak (Sinha et al., 2009).
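The one-LOD drop rule used for the confidence interval can be sketched as follows. The code is a simplified illustration and does not reproduce GenStat's procedure; the function name and the LOD profile in the example are hypothetical.

```python
def lod_drop_interval(positions, lods, drop=1.0):
    """Return (left, peak, right) positions of the LOD-drop support interval.

    The interval extends from the peak in both directions for as long as the
    LOD score stays within `drop` units of the peak value.
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] > threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] > threshold:
        right += 1
    return positions[left], positions[peak], positions[right]


# Invented LOD profile along a chromosome (positions in cM).
positions = [120, 121, 122, 123, 124, 125]
lods = [30.0, 47.2, 48.0, 47.5, 46.9, 40.0]
interval = lod_drop_interval(positions, lods)
print(interval)
```

In this toy profile the peak LOD of 48.0 lies at 122 cM, and only the neighboring positions at 121 and 123 cM stay above 47.0.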

## Re-annotation of the Resistance Locus Interval

The automated annotation of the barley reference sequence (Mascher et al., 2017) might contain inaccuracies and could have missed genes; thus, the annotation of the MlLa-H interval was reassessed. For this purpose, the unique sequences of the target region were extracted using the Kmasker-web tool<sup>2</sup>. The re-annotation of the non-redundant sequences relied on a nucleotide similarity search using BLASTN against the non-redundant DNA/protein database of NCBI. Moreover, the approach for identifying disease resistance genes published by Jupe et al. (2013) was applied, using the established descriptive amino acid motifs in the motif alignment and search tool (MAST) to predict sequences with a motif composition similar to that of resistance gene analogs (RGAs).

In addition, some of the published annotated genes might lack sequences in the start and stop codon regions. Therefore, to identify the putative start and stop codons of the gene models located in the interval where the MlLa-H gene resides, the protein sequence of each gene model was used in a protein similarity search using BLASTP against the non-redundant protein sequence (nr) database. The protein sequence of the best hit from one of the closest species (rice, bread wheat, and Aegilops tauschii) was selected for alignment using TBLASTN against the barley reference genome to obtain the corresponding physical coordinates. Based on the physical coordinates, as well as the predicted open reading frames, the putative start and stop codons and the exon and intron regions were determined. This allowed us to obtain the complete coding sequences of all the published annotated genes in the delimited target interval (Mascher et al., 2017).

### RESULTS

### Phenotypic Data Analysis

In all three phenotyping experiments, the resistant parent 'HOR2573' always had the highest resistance score (≤2.5% leaf infection area, class 0), whereas maximum susceptibility was always recorded for 'Morex,' the susceptible parent (≥80% leaf infection area, class 3), documenting high inoculation/infection efficiency. The broad-sense heritability for powdery mildew resistance was higher than 98% in all three independent phenotyping experiments, indicating that most of the phenotypic variation was genetically determined. Significant correlations were observed among all three phenotyping experiments, with r between 0.91 and 0.94 for each pair of experiments, indicating high infection efficiency in all three experiments.

### Genotyping of the RIL Population and Construction of Genetic Linkage Map

The reduced set of 1,394 SNPs (six-fold read coverage) was used to construct a genetic linkage map with seven linkage groups (LOD = 5.0) comprising between 154 (1H) and 269 (5H) markers, which were distributed evenly on each chromosome. The marker density varied from 1.1 SNPs/cM for chromosome 4H (137 SNPs/119.7 cM) to 1.9 SNPs/cM for chromosome 2H (252 SNPs/134.4 cM). The accuracy of the genetic linkage map was checked through the observed consistency between genetic marker positions and their respective physical order in the reference genome sequence (Mascher et al., 2017). The genetic map length ranged between 119.7 cM (4H) and 171.8 cM (7H) per chromosome, with a total map length of 1,000 cM, which is in a similar range as reported for other genetic maps of barley (Stein et al., 2007; Close et al., 2009; Mascher et al., 2013).

<sup>2</sup>http://webblast.ipk-gatersleben.de/kmasker/

### QTL Mapping for Powdery Mildew Resistance

Linkage analysis for single trait in single/multiple environment(s) for both Interval Mapping and CIM methods revealed the same QTL with LOD peaks of 48, 53, and 46 on the long arm of chromosome 2H for all three environments, respectively (**Figure 1**).

The QTL interval was stable across all environments explaining an average of 73.3% of the phenotypic variance in the first, 74.7% of the phenotypic variance in the second and 71.4% of the phenotypic variance in the third environment (**Table 1**). The detected single major effect QTL was assigned to an interval of 3 cM with 95% confidence flanked by markers M238 and M252.

The strength and the effect of the identified QTL on phenotypic variation suggested that the powdery mildew resistance from 'HOR2573' was most likely controlled by a single major gene. To validate this possibility, disease scoring was repeated with two qualitative classes (resistant vs. susceptible), independently of the previous phenotyping scores, in order to obtain unbiased results. Based on the qualitative evaluation, 51 out of 95 RILs were consistently scored as resistant, whereas 44 RILs were scored as susceptible, indicating the inheritance of a monogenic Mendelian factor [1:1, χ<sup>2</sup> = 0.5156 < 3.841, the critical value at a confidence level of 0.95 with one degree of freedom (d.f. = 1)]. These results corroborated the presence of a single major-effect dominant locus/gene controlling powdery mildew resistance in the population 'HOR2573' × 'Morex.'
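The goodness-of-fit test for the 1:1 segregation can be reproduced from the reported counts (51 resistant and 44 susceptible RILs). The sketch below uses plain Python; the helper name `chi_square_gof` is illustrative, and the critical value 3.841 is the one quoted in the text.

```python
def chi_square_gof(observed, ratio):
    """Pearson chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square_gof([51, 44], [1, 1])   # resistant vs. susceptible RILs
print(round(chi2, 3))
print(chi2 < 3.841)   # below the critical value (alpha = 0.05, d.f. = 1)
```

The statistic rounds to 0.516, in agreement with the reported value, so the 1:1 hypothesis is not rejected.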

The long arm of chromosome 2H is known to carry the 'Laevigatum' resistance locus (known as MlLa) (Marcel et al., 2007). To assess the possibility of an overlap between the QTL detected for 'HOR2573' × 'Morex' and the MlLa locus, flanking and co-segregating genetic markers were used for BLAST searches against the barley reference genome. All markers associated with the MlLa locus (WBE142, WBE138, MWG2200, WBE141, and WBE145) could be anchored to the M238-M252 interval (**Figure 2**), suggesting that either two independent, physically closely linked genes or alleles of the same locus might explain powdery mildew resistance in 'Vada' (derived from 'Laevigatum') and in 'HOR2573.' To point out this coincidence clearly, we propose to name the 'HOR2573' mildew resistance locus MlLa-H, indicating that the resistance-conferring putative MlLa allele of this study was derived from the Ethiopian landrace 'HOR2573.'

## High Resolution Genetic Mapping of the 2HL Resistance Locus

For high resolution genetic mapping of the resistance locus, 1,001 progeny plants of three F<sub>6</sub> HIFs were used. The resistance evaluation of the HIF population with the same Bgh isolate, CH4.8, resulted in the identification of 742 resistant and 259 susceptible plants, consistent with the segregation of a single dominant gene (3:1, χ<sup>2</sup> = 0.407 < 2.706, the critical value at P = 0.1 with d.f. = 1). This segregation pattern was also evaluated individually in each HIF family (HIF145, HIF567, HIF836), verifying the monogenic dominant inheritance of the MlLa-H locus (**Table 2**).
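As a worked check of the 3:1 test, the expected counts for 1,001 plants are 750.75 resistant and 250.25 susceptible, so that

$$\chi^2 = \frac{(742 - 750.75)^2}{750.75} + \frac{(259 - 250.25)^2}{250.25} = \frac{76.5625}{750.75} + \frac{76.5625}{250.25} \approx 0.102 + 0.306 = 0.408,$$

which is in line with the reported value of 0.407 and well below the critical value of 2.706.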

For genotyping of the 1,001 plants, three CAPS markers (M3, M8, and M7 in order of appearance in this interval, see **Figure 3A**) were developed by taking advantage of GBS-derived SNPs within the locus interval. A total of 141 recombinants were identified between the three selected markers (**Figure 3B**), placing the resistance locus between M8 and M7, in close proximity to M7. Further marker saturation reduced the interval to 1.1 Mbp (10 recombination events), flanked by markers M27 and M31 (**Figure 3C**). An additional 940 progeny plants of the identified heterozygous recombinants were screened using the closest flanking markers (M27 and M31) plus two markers previously co-segregating with the resistance locus. This identified 11 additional recombinants for this interval. Eventually, the target interval was reduced to 850 kb, flanked by markers G2x\_4 and M14\_22, with only two recombinants remaining on either side of the resistance locus, which co-segregated with a cluster of seven markers (**Figure 3D**). The sequence information of all key markers can be found in **Supplementary File 2**.

### Identification of Candidate Genes for the MlLa-H Interval

Re-annotation of the MlLa-H interval in the 'Morex' reference sequence confirmed the automated annotation of the barley reference sequence (Mascher et al., 2017) and revealed no additional genes/ORFs in the corresponding region. Based on the automated annotation of the MlLa-H interval, seven high confidence (HC) genes (HORVU2Hr1G126250, HORVU2Hr1G126290, HORVU2Hr1G126350, HORVU2Hr1G126380, HORVU2Hr1G126440, HORVU2Hr1G126510, and HORVU2Hr1G126540) are located within the genetically delimited target interval. In addition, the structural annotation of the seven genes predicted for the MlLa-H interval was validated through sequence comparisons to putatively orthologous genes from closely related species. This showed that four gene models (HORVU2Hr1G126290, HORVU2Hr1G126380, HORVU2Hr1G126510, and HORVU2Hr1G126540) had incomplete coding sequences in the automated annotation; these were revised accordingly.

Powdery mildew resistance conferred by the MlLa-H locus, derived from 'HOR2573,' is dominantly inherited and involves a hypersensitive response-like programmed cell death at the microscopic level; thus, it is most likely race-specific. This pattern of resistance may suggest the involvement of a gene belonging either to the Nucleotide Binding Site Leucine Rich Repeat (NBS-LRR) or the Receptor Like Kinase (RLK) gene family. It was therefore anticipated that the delimited target interval (∼850 kb) of the MlLa-H locus would contain candidate genes belonging to the expected classes of resistance genes. Of the seven HC genes in the delimited target interval, four (HORVU2Hr1G126250, HORVU2Hr1G126380, HORVU2Hr1G126440, and HORVU2Hr1G126510) were RGAs: one gene (HORVU2Hr1G126250) belongs to the RLK gene family and the other three to the NBS-LRR family (**Figure 4**). In the context of dominant race-specific resistance, this qualified each of them as a candidate gene for the MlLa-H locus.

Based on the functional annotation assigned to the genes predicted in the barley reference sequence and on publicly available transcriptome and gene expression profiling data from different barley tissues (IBSC, 2012), none of the three remaining gene models (HORVU2Hr1G126290, HORVU2Hr1G126350, HORVU2Hr1G126540) is likely to play a role in resistance to plant pathogens or in plant/pathogen interaction; therefore, they were excluded from further analysis. In detail, HORVU2Hr1G126290 encodes an uncharacterized protein. A protein sequence similarity search against orthologous genes indicated that it is also uncharacterized in other species. In addition, a survey of publicly available transcriptome and gene expression profiling data in barley (IBSC, 2012) showed that HORVU2Hr1G126290 is only expressed in tissues taken from developing grains, palea and rachis, suggesting that it does not play a role in regulating disease resistance. Likewise, the survey of publicly available gene expression data for HORVU2Hr1G126540 (homology with the Amidase superfamily) showed that it is also highly expressed in developing grains, suggesting no role in plant immune responses. The third gene model, HORVU2Hr1G126350, has high homology with the SCAR family, which is involved in plant cell morphogenesis, such as the control of cell division and elongation. Therefore, based on the functional annotation or the transcript evidence, these three genes can be ruled out as candidates.

<sup>1</sup>The 95% confidence interval is supported by LOD = 25. <sup>2</sup>The physical coordinates of the 95% confidence interval flanked by markers M238 and M252 on the barley reference genome: 762,829,007 and 766,311,171 bp, respectively. <sup>3</sup>The negative value of the additive effect means that a single allele of the resistant parent 'HOR2573' decreases the infection area by the given value. <sup>∗</sup>The remaining 27% of phenotypic variation that is not explained by this QTL might be due to random error, other non-genetic variation (such as unavoidable inaccuracies in estimating % leaf infection severity, the position of leaves on the plates, or slightly heterogeneous inoculum deposition), and other genetic variation, such as QTLs that did not pass the significance threshold obtained through permutation tests.

### Re-sequencing of Candidate Genes in 'HOR2573' and 'Morex'

The four RGAs detected at the MlLa-H locus were all promising candidates for the resistance gene. Sequence comparison between resistant (MlLa-H) and susceptible (mlLa-H) genotypes should help to prioritize among the four candidate genes. In fact, a polymorphism that correlates with the phenotype in a gene re-sequenced from the resistant parent makes that gene a good candidate, whereas either the absence of non-synonymous polymorphisms or the emergence of a premature stop codon in an RGA of 'HOR2573' compared to 'Morex' is grounds to reject it as a candidate gene.

H<sub>0</sub> = 3 (number of resistant plants): 1 (number of susceptible plants). H<sub>1</sub> = H<sub>0</sub> is false, P-value = 0.1 and d.f. = 1; the calculated χ<sup>2</sup> value is too low to reject the hypothesis.

All four genes could be amplified in their entire length from both 'Morex' and 'HOR2573.' The re-sequencing from the susceptible parent 'Morex' always confirmed the sequence information of the published reference sequence. The sequence comparison of the first RGA (HORVU2Hr1G126250) between parental lines revealed 20 SNPs, including 4 synonymous and 16 non-synonymous SNPs, leading to amino acid changes in both the LRR and kinase domains (**Figure 5A**). Re-sequencing of the second RGA (HORVU2Hr1G126380) from the resistant parent 'HOR2573' showed that a 42 bp deletion and a 53 bp insertion occurred in different parts of the second exon, corresponding to the LRR domain. This deletion and insertion led to a premature stop codon (**Figure 5B**) and a probable loss of function of the domain, suggesting that the gene represents a pseudogene in the resistant genotype. The comparison for the third RGA, HORVU2Hr1G126440, revealed the presence of two paralogs of this gene in 'HOR2573,' the resistant genotype. One copy was 100% identical to the 'Morex' allele (not shown in **Figure 5**) and likely represented the orthologous gene. The other copy (putative paralog) showed several SNPs in the exons plus a single bp insertion in the first exon and two consecutive nucleotide changes in the second exon, predicted to induce a frame-shift; thus, the HORVU2Hr1G126440 paralog of 'HOR2573' likely represents a pseudogene. Furthermore, the comparison of the predicted protein sequences from the resistant and susceptible parents suggested that, due to the frame-shift, the LRR domain was absent, leading to a loss-of-function status of this gene model in 'HOR2573' (**Figure 5C**). Two paralogs were also observed in 'HOR2573' for the RGA HORVU2Hr1G126510. One copy was 100% identical to the 'Morex' allele (not shown in **Figure 5**) and likely represented the orthologous gene, whereas the putative 'HOR2573' paralog carried a 4 bp deletion in the first exon, leading to a frame-shift and a premature stop codon; thus, the HORVU2Hr1G126510 paralog also likely represents a pseudogene (**Figure 5D**).

FIGURE 3 | High resolution mapping of the powdery mildew resistance locus MlLa-H. (A) The genomic region containing the MlLa-H locus identified through QTL analysis is shown in blue. (B) Identification of 141 recombinants and mapping of the MlLa-H locus between M8 and M7. The green box presents the number of recombinants between M8 and M7, of which 74 occurred between marker and phenotype, 20 recombinants occurred from heterozygous to homozygous resistance. (C) Reduction of the target interval to 1.1 Mbp, with 10 recombinants remaining in this interval. (D) Identification of 11 additional recombinants through screening of an additional 940 F2-like individuals with the flanking markers M27 and M31 plus co-segregating markers (M21 and M25), and narrowing down of the target interval to 850 kb. In each step the flanking markers are highlighted in red. The physical distance between the two flanking markers is written in the dark blue box. The markers co-segregating with the phenotype (the target locus is shown in pink) are highlighted in green. The number of recombination events between markers is shown below the black line, which represents the barley reference genome. The 11 additional recombinants identified are highlighted in orange.

(the barley genome reference and as the susceptible parent in this study) is represented in black (exon) and white (untranslated regions) boxes. The distance between boxes represents the introns. The size of each gene is written in blue boxes above each gene model. The corresponding protein domains are written below the exons.

Although there is no further information on the location of the second gene copies, NLR and RLK genes with the highest sequence identity (putative tandem duplications or local paralogs) are typically found in close physical proximity, as they commonly result from tandem duplication events (Baumgarten et al., 2003; Cantalapiedra et al., 2016).

From these results, HORVU2Hr1G126250 was favored as the primary candidate gene for MlLa-H based resistance, since it exhibited 4 synonymous and 16 non-synonymous SNPs leading to amino acid changes in both the LRR and kinase domains in the resistant vs. the susceptible genotype, and since it was the only one of the four candidate genes in 'HOR2573' (including their respective paralogs) that was not modified into a pseudogene.
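The pseudogene reasoning applied to the RGAs, in which an insertion or deletion that is not a multiple of three shifts the reading frame and exposes a premature stop codon, can be illustrated with a toy coding sequence. The sequence below is invented for illustration and is not taken from any of the candidate genes; the translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence and stop at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Invented toy sequence: ATG GCT GAA TTG ACC AAG CTT GGA TAA
wild_type = "ATGGCTGAATTGACCAAGCTTGGATAA"
# Remove 4 bp (positions 6 to 9, 'GAAT') to mimic a frame-shifting deletion.
mutant = wild_type[:6] + wild_type[10:]

print(translate(wild_type))
print(translate(mutant))
```

In this toy case the wild-type sequence translates to a nine-codon product ending at the original stop, whereas the 4 bp deletion brings a TGA stop codon into frame at the third codon.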

### DISCUSSION

### Fine Mapping Placed the MlLa-H Locus in an 850 kb Interval

In this study, we took advantage of improved barley genomic resources and state-of-the-art sequencing-based technologies to gain insights into a resistance locus called MlLa-H, located distally on barley chromosome 2HL. The locus corresponds to the interval of MlLa, derived from H. laevigatum, a previously reported powdery mildew resistance gene based on a hypersensitive response (Hilbers et al., 1992; Giese et al., 1993; Backes et al., 2003; Marcel et al., 2007). Due to its intermediate reaction type, the MlLa locus garnered much attention from barley breeders and was immediately introduced into the modern barley varieties 'Minerva' and 'Vada' (Dros, 1957). The current study highlights the use of GBS technology for the construction of a high-density linkage map of the MlLa-H locus, placing the locus within an 850-kb region carrying four disease resistance gene analogs. One gene belongs to the RLK and the other three to the NBS-LRR gene family, making each of them a potential candidate gene for the MlLa-H locus. Given the observed physical to genetic distance ratio at the MlLa-H locus (∼1.16 Mb/cM) and the physical distance between the flanking markers, an additional 6,000 meioses would have to be screened (based on the formula of Dinka et al., 2007) to have a chance of observing further recombination events between the cluster of remaining co-segregating markers and the resistance gene, which would provide genetic evidence for rejecting several of the candidate genes found. The possibility of resolving the correct candidate gene by genetics and recombination, however, remains a theoretical option, since the candidate genes were observed in a susceptible genotype.

coil domains, respectively. The size of each exon is written above the boxes. The distance between boxes shows the intron size. For the large intron sizes, the distance has been truncated. Premature stop codons are indicated by asterisks with identified position at the bottom of vertical dashed line. The non-synonymous SNPs are indicated by red triangles. Insertions and deletions are depicted with vertical red and yellow lines.

### A Gene Encoding LRR-RLK Protein as the Best Candidate Gene in the MlLa-H Interval

The survey of publicly available gene expression data (IBSC, 2012) for HORVU2Hr1G126290, HORVU2Hr1G126350, and HORVU2Hr1G126540 clearly showed that these genes were expressed in palea, rachis and developing grains, implying that they might not play a role in plant immune responses, in this case in the hypersensitive response (HR). The re-sequencing analysis of three out of the four potential candidate R genes within the MlLa-H interval from 'HOR2573,' the resistant genotype, revealed functional polymorphisms, ranging from SNPs to medium and/or large-scale insertions and deletions, that lead to premature stop codons compared to the susceptible parent cv. 'Morex' (**Table 3**). These findings exclude those three genes as candidates for the MlLa-H locus, as all these polymorphisms are likely to lead to a loss of function of the genes in the resistant genotype. In HORVU2Hr1G126380, predicted to encode an NBS-LRR protein, out-of-frame deletions or insertions were observed in the LRR domain, leading to a premature stop codon and a probable loss of function of the domain. LRR domains in R genes have a specific function as the site of protein–protein interaction for the recognition of pathogen effectors (Dangl and McDowell, 2006; Ye et al., 2017). Previous studies showed that the LRR domain and its sequence are essential for the recognition of the pathogen, and that mutations in different motifs of the LRR domain of R genes can cause a partial or complete loss of function of NB-LRR genes (Warren et al., 1998). Gassmann et al. (1999) showed, by transformation with genomic sequences of Arabidopsis RPS4, a member of the NBS-LRR family conferring resistance to a Pseudomonas syringae pv. tomato strain, that premature stop codons in the LRR domain impeded the function of RPS4. It is therefore unlikely that an NBS-LRR gene with a severely truncated LRR domain would be a functional resistance gene. A similar situation was observed in HORVU2Hr1G126510, in which an early premature stop codon occurred in the CC domain, the first functional domain of the protein, making it a pseudogene. In HORVU2Hr1G126440, a premature stop codon occurred in the NB domain, which is anticipated to cause a loss of function, as this is usually a highly conserved domain involved in signal transduction cascades (Tan and Wu, 2012).

Interestingly, the sequencing results for these four R genes from the resistant parent pointed toward HORVU2Hr1G126250 as the most likely candidate for the MlLa-H locus. According to its structural annotation, this gene belongs to the Receptor-Like Serine/Threonine Kinase (RSTK) gene family, meaning that it contains an extracellular region, a single membrane-spanning domain and an intracellular kinase domain (Becraft, 2002). The major group of RSTKs contains an LRR domain, the extracellular region that is recognized by the repeated sequence LxxLxLxxNxLxx. A typical LRR belongs to the 3, 6, 12, or 24 repeat subfamily of LRRs (Kajava, 1998). The structural annotation of HORVU2Hr1G126250 suggested that this gene contains an extracellular LRR domain with six repeats. The resistant parent's genome contains 4 synonymous and 16 non-synonymous SNPs for this gene compared to 'Morex,' leading to amino acid changes in both the LRR and kinase domains. Among the four R genes in this cluster, this gene is the only one with meaningful non-synonymous polymorphisms. A study of the divergence between ancestral copies of LRR-RLKs suggested that some LRR-RLKs are characterized by the fixation of a higher number of non-synonymous than synonymous mutations at some amino acid sites, highlighting the probable emergence of new advantageous functions for these R genes (Dufayard et al., 2017). It has been reported that the LRR and kinase domains are under different selective pressures according to their roles in the resistance response. The LRR domain often undergoes a phase of diversifying selection, acquiring new advantageous genetic variants, most likely in order to recognize new virulent pathogen effectors, while the kinase domain is typically under purifying/negative selection, which removes deleterious alleles because of the functional and structural constraints involved in signal transduction (Zhang et al., 2006).
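The LRR consensus quoted above (LxxLxLxxNxLxx, where x is any residue) can be searched for in a protein sequence with a regular expression. The sketch below reads the consensus literally; real LRR annotation tools tolerate substitutions at the conserved positions, so this is a simplification. The protein fragment is invented for illustration and is not the HORVU2Hr1G126250 sequence.

```python
import re

# Literal reading of the consensus LxxLxLxxNxLxx; a lookahead is used so that
# overlapping occurrences would also be reported.
LRR_PATTERN = re.compile(r"(?=(L..L.L..N.L..))")


def find_lrr_motifs(protein):
    """Return (start index, matched 13-residue segment) for each motif hit."""
    return [(m.start(), m.group(1)) for m in LRR_PATTERN.finditer(protein)]


# Invented toy fragment containing two motif instances.
toy_protein = "MKT" + "LSNLRLSGNKLSG" + "AAPE" + "LTSLDLSNNQLTG" + "K"
hits = find_lrr_motifs(toy_protein)
print(hits)
```

Counting such hits gives a rough estimate of the number of repeats in a candidate LRR domain, which can then be compared against the annotated repeat number.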

Although the comparative sequencing analysis of the putative candidate genes in this target interval provided clear support for the hypothesis that HORVU2Hr1G126250 represents the gene underlying MlLa-H, further investigations are required to determine the gene function. These could rely on (i) transgenic complementation through over-expression of the candidate resistance gene in a susceptible genotype (Ihlow et al., 2008), (ii) stable RNA interference (RNAi) or transient-induced gene silencing (TIGS) (Douchkov et al., 2005), or (iii) RNA-guided Cas9-based site-directed mutagenesis or gene editing (Lawrenson et al., 2015). All three approaches are established routines in barley; thus, validation of the gene function of MlLa-H candidate genes will be a feasible task.

Even if HORVU2Hr1G126250 is providing a solid candidate gene, it cannot be ruled out that resistance is provided by presence/absence variation (PAV) of a resistance gene between resistant and susceptible genotypes, meaning that the candidate gene might be missing from the 'Morex' genome. Several studies have underlined the high probability of identification of PAV between genotypes with contrasting phenotypes.

Recently, Cantalapiedra et al. (2016) analyzed the exome sequences of three recombinant lines with contrasting resistance phenotypes from a high-resolution mapping population. By narrowing the position of the resistance derived from a Spanish landrace, which is effective against a wide array of powdery mildew isolates, down to a single physical contig, they found large differences between the resistant lines and cultivar Morex as the reference genome, in the form of PAV in the composition of the NBS-LRR cluster. This finding suggested that functional polymorphism at an R gene locus can arise from PAV of genes. The structural variation might also consist of a variable number of homologs in each haplotype, which is the most prevalent form of PAV in multigene loci (Bergelson et al., 1998). The structural comparison of Rpp5, a multigene locus, between a downy mildew resistant Arabidopsis ecotype, Landsberg erecta (Ler), and a susceptible ecotype, Columbia (Col-0), revealed the presence of ten Rpp5 homologs in the entire Ler haplotype, whereas the Rpp5 haplotype in Col-0 consisted of eight homologs. Bergelson et al. (1998) proposed that the Rpp5 locus contains dynamic gene clusters with the capability to adapt rapidly to a new pathogen variant through modification of recognition regions, implying that these regions have most likely undergone diversifying and purifying selection (Noël et al., 1999). The structural analysis of both Rpm1 and Rpp5 clearly showed that this variation was directly associated with the phenotype. To obtain the complete DNA sequence of the MlLa-H interval from the resistant parent, either developing a high-quality de novo assembly from the flow-sorted barley mutant chromosome 2H or performing the Targeted Locus Amplification (TLA) approach is highly recommended (de Vree et al., 2014; Thind et al., 2017; Bettgenhaeuser and Krattinger, 2018).

To sum up, the reported high resolution mapping and physical delimitation of the resistance locus MlLa-H represent fundamental steps toward map-based cloning of the respective gene. The fine mapping of the target interval revealed the presence of four disease resistance gene homologs belonging to the RLK and NBS-LRR gene families at this locus, which are the potential candidate genes for the race-specific resistance phenotype. The comparative sequencing analysis of these putative candidate genes between the resistant and susceptible parents strongly suggested HORVU2Hr1G126250 as the best candidate gene. Validating this will require additional efforts such as mutagenesis, transgene analysis for complementation, or knock-out through gene editing. Compared to quantitative resistance (conferred by several genes with small effects), race-specific resistance is less challenging to incorporate into breeding programs; however, it is often not durable because of rapid changes in pathogen virulence (Parlevliet, 2002). Combining multiple highly effective R genes, each covering a broad race spectrum, is a practical approach with many known successes to prevent or delay the boom-and-bust cycles commonly observed in the deployment of single R genes. Nevertheless, the identified co-segregating and closest flanking markers already provide new possibilities for marker-assisted selection of the MlLa-H locus in barley breeding.

### AUTHOR CONTRIBUTIONS


PH performed the experimental work and data analysis and wrote the manuscript. RZ contributed to data analysis. MM supported data analysis. AH conducted GBS sequencing. RN provided molecular marker information and all other relevant information regarding the resistance locus derived from 'Laevigatum'. PS provided the mapping population and overall supervision of phenotyping. NS designed the study, supervised the experimental work, and contributed to the writing of the manuscript. All authors read, corrected, and approved the manuscript.

### FUNDING

This manuscript is part of a Ph.D. thesis (Hoseinzadeh, 2018), which was financially supported by a grant from the German Research Foundation (DFG) to PS ('DURESTrit,' SCHW 848/3-1) and NS (STE 1102/5-1) in the frame of the ERACAPS initiative.

### ACKNOWLEDGMENTS

We gratefully acknowledge the excellent technical support by Susanne Koenig, Manuela Knauft, Manuela Kretschmann, and Mary Ziems (IPK). We are grateful to Dr. Guozheng Liu for his contribution and support in statistical analysis.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00146/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Hoseinzadeh, Zhou, Mascher, Himmelbach, Niks, Schweizer and Stein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dissection of Pleiotropic QTL Regions Controlling Wheat Spike Characteristics Under Different Nitrogen Treatments Using Traditional and Conditional QTL Mapping

Edited by: Dragan Perovic, Julius Kühn-Institut, Germany

#### Reviewed by:

Ahmad M. Alqudah, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany; Kerstin Neumann, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany; Dejan Bogdan Dejan, Maize Research Institute Zemun Polje, Serbia

#### \*Correspondence:

Tao Wang wangtao@cib.ac.cn Junming Li ljm@sjziam.ac.cn

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 06 November 2018 Accepted: 05 February 2019 Published: 26 February 2019

#### Citation:

Fan X, Cui F, Ji J, Zhang W, Zhao X, Liu J, Meng D, Tong Y, Wang T and Li J (2019) Dissection of Pleiotropic QTL Regions Controlling Wheat Spike Characteristics Under Different Nitrogen Treatments Using Traditional and Conditional QTL Mapping. Front. Plant Sci. 10:187. doi: 10.3389/fpls.2019.00187

Xiaoli Fan<sup>1†</sup>, Fa Cui<sup>4†</sup>, Jun Ji<sup>2,3</sup>, Wei Zhang<sup>2</sup>, Xueqiang Zhao<sup>3</sup>, JiaJia Liu<sup>2</sup>, Deyuan Meng<sup>2</sup>, Yiping Tong<sup>3</sup>, Tao Wang<sup>1,5\*</sup> and Junming Li<sup>2,3,5\*</sup>

<sup>1</sup> Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China, <sup>2</sup> Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, China, <sup>3</sup> State Key Laboratory of Plant Cell and Chromosome Engineering, Chinese Academy of Sciences, Beijing, China, <sup>4</sup> Genetic Improvement Centre of Agricultural and Forest Crops, College of Agriculture, Ludong University, Yantai, China, <sup>5</sup> The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China

Optimal spike characteristics are critical in improving the sink capacity and yield potential of wheat even in harsh environments. However, the genetic basis of their response to nitrogen deficiency is still unclear. In this study, quantitative trait loci (QTL) for six spike-related traits, including heading date (HD), spike length (SL), spikelet number (SN), spike compactness (SC), fertile spikelet number (FSN), and sterile spikelet number (SSN), were detected under two different nitrogen (N) supplies, based on a high-density genetic linkage map constructed from PCR markers, DArTs, and Affymetrix Wheat 660 K SNP chips. A total of 157 traditional QTL and 54 conditional loci were detected by inclusive composite interval mapping, among which three completely low N-stress induced QTL for SN and FSN (qSn-1A.1, qFsn-1B, and qFsn-7D) were found to maintain the desired spikelet fertility and kernel numbers even under N deficiency through pyramiding elite alleles. Twenty-eight stable QTL that were significant under different QTL detection models were found, and seven genomic regions (R2D, R4A, R4B, R5A, R7A, R7B, and R7D) in which these stable QTL clustered were highlighted. Among them, the effect of R4B on spike characteristics might be attributable to Rht-B1. R7A, harboring three major stable QTL (qSn-7A.2, qSc-7A, and qFsn-7A.3), might be one of the valuable candidate regions for further genetic improvement. In addition, R7A was found to be syntenic with R7B, indicating that homoeologous candidate genes possibly exist in both regions. The SNP markers associated with the above highlighted regions will eventually facilitate positional cloning or marker-assisted selection for optimal spike characteristics under various N input conditions.

Keywords: spike characteristics, low nitrogen tolerance, quantitative trait locus, conditional QTL mapping, wheat

### INTRODUCTION

Wheat (Triticum aestivum L.) is one of the leading cereal crops worldwide, and is critical for global food security. The genetic improvement of three yield components, i.e., productive spikes per unit area, kernel number per spike (KN), and thousand kernel weight, has contributed to the increase in wheat yield level and the alleviation of the food crisis in recent decades (Sayre et al., 1997; Ma et al., 2007; Deng et al., 2017). Of these three yield components, KN is directly determined by the spike characteristics (Cui et al., 2012). Therefore, researchers have been identifying important genes or quantitative trait loci (QTL) for spike characteristics in order to facilitate high-yield breeding programs (Zhai et al., 2016). For example, Q, C, and S are three major genes affecting spike characteristics and they are located on chromosomes 5A, 2D, and 3D (Johnson et al., 2008; Cui et al., 2012; Faris et al., 2014). In addition to controlling spike characteristics, these three genes have pleiotropic effects on other traits. Q regulates the spike length, plant height (PH), and rachis fragility (Simons et al., 2006), whereas S can determine whether a spike has round seeds and glumes (Salina et al., 2000; Zhai et al., 2016). C, which is positioned in the interval of Xgwm484-Xgwm358-Xcfd17-Xgwm539 on chromosome 2DL (Johnson et al., 2008), accounts for the very dense "club" spike of club wheat (Triticum compactum Host) and has a pleiotropic effect on spike compactness, grain size, shape, and number (Johnson et al., 2008). However, these three genes are no longer the main breeding target for genetic improvement in modern wheat (Zhai et al., 2016), because most common wheat cultivars have identical genotypes, i.e., QcS (Faris et al., 2014). Heading date is another spike development-related trait and crucially affects spikelet fertility, flower development, and grain filling. Three categories of genes, including vernalization-response genes (Vrn), photoperiod-response genes (Ppd), and earliness per se genes (Eps), are well known to control heading date (Yan et al., 2006; Cockram et al., 2007; Guedira et al., 2016) and are associated with spike development (Lewis et al., 2008; Zhang et al., 2014; Boden et al., 2015). In addition to these well-studied genes in wheat, other genes for spike characteristics have been reported in barley, rice and maize, such as MONOCULM 1 (MOC1; Li et al., 2003; Zhang et al., 2015), Six-rowed spike genes (Vrs; Koppolu et al., 2013), CONSTANS-like gene (CO; Mulki and Von, 2016), etc. Among them, CO gene families play an important role in responding to the photoperiod in barley and rice (Griffiths et al., 2003), which affects the floret primordia loss and maximum floret number in wheat (Guo et al., 2016). Vrs is a regulator of spikelet fertility and controls spikelet determinacy in barley (Pourkheirandish et al., 2007; Koppolu et al., 2013). Although the contributions of the wheat orthologs of these genes to spike characteristics need to be further specified, they could be preliminarily evaluated at the QTL level.

As a complex and multicomponent trait, spike characteristic is comprehensively determined by a series of correlated traits, such as spike length, spikelet number, spike compactness etc. Major genes controlling these components usually present pleiotropy or linkage at the QTL level. Therefore, co-localized QTL regions are noteworthy when the putative candidate genes/loci for improving the integrated sink capacity are being examined. Previous studies have reported that numerous pleiotropic QTL clusters simultaneously affect multiple spike-related traits and involve major genes (Heidari et al., 2011; Cui et al., 2012; Zhai et al., 2016). For example, Rht8 was found to co-localize with QTL for PH as well as QTL for the spike length, spikelet number, and spike compactness on 2DS (Ma et al., 2007; Cui et al., 2012; Xu et al., 2014; Wang et al., 2015; Zhai et al., 2016; Deng et al., 2017), and Rht8c was found to affect spike compactness by regulating the spike length (Kowalski et al., 2016). Rht-B1 on 4BS, the famous green revolution gene, has a pleiotropic effect on not only the PH but also on the kernel weight, grain quality, seedling vigor and adaptability to harsh environments (McCartney et al., 2005; Zheng et al., 2010; Bai et al., 2013; Zhang et al., 2013b; Asif et al., 2015; Cui et al., 2016). However, only Heidari et al. (2011) reported that Rht-B1 co-localized with QTL for spike compactness and its putative effect on other spike characteristics is rarely discussed.

In addition to being influenced by genetic background, spike characteristics are also influenced by many environmental factors, such as nitrogen (N) nutrition. Until now, numerous QTL for spike characteristics have been mapped on all 21 wheat chromosomes (Cui et al., 2012), but only a few studies have explored the QTL response to N application, which is useful for improving N use efficiency and yield potential in harsh nutrition environments. For example, Deng et al. (2017) detected several QTL for spike-related traits possibly induced by N application on chromosomes 2A, 2D, 4B, 5A, 5B, 6A, and 7B under different levels of N treatments. However, few QTL that are sensitive to N deficiency, which are important for wheat adaptation to low N, have been specified. Conditional QTL analysis is an efficient method to elucidate the influences of environmental factors on QTL expression based on trait values conditioned on different environments (Xu et al., 2014). In particular, QTL for drought tolerance in wheat and QTL for salt tolerance in maize were both successfully detected by excluding the influences of traits expressed under normal treatment (Zhang et al., 2013a; Cui et al., 2015).

Based on the high-density genetic linkage map using Wheat 660K SNPs (Cui et al., 2017), this study aimed to: (1) highlight the critical chromosomal regions harboring stable and pleiotropic QTL for spike characteristics including heading date (HD), spike length (SL), total spikelet number (SN), spike compactness (SC), fertile spikelet number (FSN), and sterile spikelet number (SSN), (2) specify the low N-stress induced QTL and identify the influence of N deficiency on the expression of these QTL under N treatments by traditional QTL analysis and conditional QTL analysis, and (3) preliminarily illuminate the possible pleiotropic effect of Rht-B1 on spike characteristics.

**Abbreviations:** SN, spikelet number; FSN, fertile spikelet number; SSN, sterile spikelet number; SC, spike compactness; SL, spike length; HD, heading date; KN, kernel number per spike; PH, plant height; KJ-RILs, recombinant inbred line population derived from the cross between Kenong9204 and Jing411; N, nitrogen; LN, low nitrogen; HN, high nitrogen; QTL, quantitative trait loci.

### MATERIALS AND METHODS

### Experimental Materials and Evaluation

A recombinant inbred line (RIL) population comprising 188 lines derived from a cross between Kenong 9204 (KN9204) and Jing 411 (J411) (represented by KJ-RILs) was used in this study (Fan et al., 2015; Cui et al., 2016, 2017; Zhang et al., 2017). The KJ-RILs and their parents were evaluated in four trials (year × location) as follows: 2011–2012 in Shijiazhuang (Trial 1: 37°53′N, 114°41′E, altitude 54 m); 2012–2013 in Shijiazhuang (Trial 2); 2012–2013 in Beijing (Trial 3: 40°06′N, 116°24′E, altitude 41 m); 2013–2014 in Shijiazhuang (Trial 4). In each trial, a low nitrogen (LN) treatment and a high nitrogen (HN) treatment were applied for a total of eight environments, which were designated as T1LN, T1HN, T2LN, T2HN, T3LN, T3HN, T4LN, and T4HN. In the HN plots, N was applied as diamine phosphate at 180 kg·ha<sup>−1</sup> before sowing and 225 kg·ha<sup>−1</sup> of urea was applied at the elongation stage. In the LN plots, no N fertilizer (N-deficient) was applied during the growing period. The soil nitrogen contents at sowing in the 0–40 cm soil layer of each plot were analyzed and are shown in **Table S1**. The materials were planted in randomized complete blocks with two replications for each of the eight environments. Each block contained two rows, 2 m long and 0.25 m apart, and 40 seeds were evenly planted in each row. All nitrogen treatments, field arrangements, and experimental designs have been described in detail previously (Fan et al., 2015; Zhang et al., 2017).

Six spike-related traits, including HD, SL, SN, SC, FSN, and SSN, were evaluated in this study. HD was recorded as the day of the year in all environments when 50% of the plants in a plot were at Zadoks 59 growth stage (Guedira et al., 2016). At maturity, SL was measured from the base of the rachis to the tip of the terminal spikelet, excluding the awns. FSN and SSN were determined by counting the numbers of fertile and sterile spikelets per spike. SN was calculated by summing the values of FSN and SSN. SC was calculated by dividing the SL by SN. For each row, the main tillers of five plants were randomly chosen from the middle of the row to measure the phenotype.
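The two derived traits can be illustrated with a short Python sketch. It follows the definitions exactly as stated above (SN = FSN + SSN; SC = SL divided by SN). The function name, the dictionary layout, and the averaging over the sampled plants are assumptions made for illustration only and are not taken from the original analysis.

```python
from statistics import mean

def derive_spike_traits(plants):
    """Derive SN and SC for sampled main tillers and average them per row.

    plants: list of dicts with SL (cm), FSN and SSN (counts) per spike.
    Averaging over the sampled plants is an assumption of this sketch.
    """
    rows = []
    for p in plants:
        sn = p["FSN"] + p["SSN"]   # SN = FSN + SSN
        sc = p["SL"] / sn          # SC = SL / SN, as defined in the text
        rows.append({"SL": p["SL"], "FSN": p["FSN"], "SSN": p["SSN"],
                     "SN": sn, "SC": sc})
    traits = ("SL", "FSN", "SSN", "SN", "SC")
    return {t: mean(r[t] for r in rows) for t in traits}

# Hypothetical measurements for two spikes
example = derive_spike_traits([
    {"SL": 10.0, "FSN": 18, "SSN": 2},
    {"SL": 8.0, "FSN": 14, "SSN": 2},
])
```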

### Data Analysis and QTL Mapping

The analysis of variance (ANOVA) and the calculation of phenotypic correlation coefficients were performed using SPSS 19.0 (SPSS, Chicago, IL, USA). The broad-sense heritability (h<sup>2</sup><sub>B</sub>) was calculated using QGAStation 2.0 (http://ibi.zju.edu.cn/software/qga/v2.0/index\_c.htm); the eight environments were regarded as replications and the genotype × environment interaction as the error term (Xu et al., 2014). Conditional analysis was performed to study the effects of LN-stress on QTL expression. All conditional phenotype values were obtained according to Zhu (1995) and Xu et al. (2014) using QGAStation 2.0. The conditional phenotype values (LN|HN) are the net genetic variation of trait values in LN independent of that in HN (Xu et al., 2014; Cui et al., 2015). Both the measured and the conditional phenotype values were used for QTL analysis, which were designated as traditional QTL analysis and conditional QTL analysis, respectively.
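The idea behind a conditional value (LN|HN) can be sketched with a simple regression adjustment: the part of the LN value that is linearly predictable from the HN value is removed, so the remainder is uncorrelated with HN. This is only a single-variable simplification for illustration; it is not the mixed-model procedure implemented in QGAStation, and the function name and input layout are assumptions.

```python
from statistics import mean

def conditional_values(ln, hn):
    """Return y(LN|HN) for paired line values ln[i] and hn[i].

    Simplified sketch: y(LN|HN) = y_LN - (Cov(LN, HN) / Var(HN)) * (y_HN - mean_HN).
    This is an assumed stand-in for the mixed-model approach, not a copy of it.
    """
    m_ln, m_hn = mean(ln), mean(hn)
    cov = sum((a - m_ln) * (b - m_hn) for a, b in zip(ln, hn))
    var = sum((b - m_hn) ** 2 for b in hn)
    slope = cov / var
    return [a - slope * (b - m_hn) for a, b in zip(ln, hn)]

# Hypothetical spikelet numbers for four lines under LN and HN
cond = conditional_values([18.0, 20.0, 19.0, 23.0], [20.0, 22.0, 24.0, 26.0])
```

By construction, the resulting values keep the LN mean and have zero sample covariance with the HN values.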

Wheat660K, the Affymetrix® Axiom® Wheat660, was designed by the Chinese Academy of Agricultural Sciences and synthesized by Affymetrix. It is a genome-specific array with high density and is highly efficient in a wide range of potential applications (http://wheat.pw.usda.gov/ggpages/topics/Wheat660\_SNP\_array\_developed\_by\_CAAS.pdf). The high-density genetic map of the KJ-RILs used to detect the QTL in this study was constructed from 119,001 SNP markers derived from the Wheat 660 K and 565 SSR, EST-SSR, ISSR, STS, SRAP, and DArT markers (Cui et al., 2017). This original map contained a total of 119,566 loci. These 119,566 loci showed 4,959 patterns of segregation in the 188 KJ-RILs, and 4,959 markers, one representing each bin, were chosen and used for QTL mapping in this study. The present map has an average density of one marker per 0.89 cM and spans 4,424.40 cM across 21 chromosomes. More information on this map is given by Cui et al. (2017).
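The reduction from all mapped loci to one representative marker per segregation pattern can be sketched as follows. The rule of keeping the first marker encountered for each pattern, the marker names, and the genotype coding are assumptions for illustration; missing calls are not handled, and the actual binning procedure is described by Cui et al. (2017).

```python
def pick_bin_markers(genotypes):
    """Keep one representative marker per unique segregation pattern.

    genotypes: dict mapping marker name -> sequence of allele calls across RILs.
    Keeping the first marker per pattern is an assumption of this sketch.
    """
    bins = {}
    for marker, calls in genotypes.items():
        bins.setdefault(tuple(calls), marker)
    return list(bins.values())

def average_spacing(map_length_cm, n_markers):
    """Average map distance (cM) per marker."""
    return map_length_cm / n_markers

# Hypothetical markers: m1 and m2 co-segregate and fall into the same bin
markers = {
    "m1": ("A", "B", "A", "B"),
    "m2": ("A", "B", "A", "B"),
    "m3": ("A", "A", "B", "B"),
}
representatives = pick_bin_markers(markers)      # ["m1", "m3"]
spacing = average_spacing(4424.40, 4959)         # about 0.89 cM per marker
```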

PLN and PHN represent the adjusted mean phenotypic values across four trials under the LN (T1LN, T2LN, T3LN, and T4LN) and HN (T1HN, T2HN, T3HN, and T4HN) treatments, respectively. The traditional phenotype values of T1LN, T1HN, T2LN, T2HN, T3LN, T3HN, T4LN, T4HN, PLN, and PHN, and the conditional phenotype values (LN|HN) in T1, T2, T3, T4, and P (the adjusted mean conditional phenotypic value (LN|HN) across four trials) were used to detect QTL by inclusive composite interval mapping (ICIM) performed with IciMapping 4.1 (Li et al., 2007; freely available from http://www.isbreeding.net/). The walking speed chosen for all QTL was 1.0 cM, and the P-value inclusion threshold was 0.001. An LOD score of 3.0 was used as the threshold to declare the presence of a putative QTL (Zhang et al., 2017). Furthermore, a QTL with an LOD value >5.0 and a phenotypic variance contribution >10% (on average) was defined as a major QTL; a QTL with an LOD value >3.0 but <5.0 and a phenotypic variance contribution <10% (on average) was defined as a moderate QTL; a QTL detected only under HN or LN conditions was defined as an HN- or LN-specific QTL (Fan et al., 2015; Zhang et al., 2017).
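The classification rules above can be written as a small Python sketch. It assumes that "on average" applies to both the LOD value and the phenotypic variance explained (PVE) across the datasets in which a QTL was significant, and it labels QTL that meet neither definition as "unclassified", since the text does not name that case. The function names and the example values are hypothetical.

```python
from statistics import mean

def classify_qtl(lods, pves):
    """Classify a QTL from its LOD and PVE (%) values across datasets.

    Assumption: both thresholds are applied to the averages.
    """
    avg_lod, avg_pve = mean(lods), mean(pves)
    if avg_lod > 5.0 and avg_pve > 10.0:
        return "major"
    if 3.0 < avg_lod < 5.0 and avg_pve < 10.0:
        return "moderate"
    return "unclassified"  # case not defined in the text

def nitrogen_specificity(datasets):
    """Label a QTL by the N treatments of the datasets in which it was detected.

    Dataset names follow the text (e.g., T1LN, T3HN, PLN, PHN).
    """
    treatments = {d[-2:] for d in datasets}
    if treatments == {"LN"}:
        return "LN-specific"
    if treatments == {"HN"}:
        return "HN-specific"
    return "non-specific"

# Hypothetical QTL records
print(classify_qtl([7.4, 20.4], [13.1, 28.4]))   # major
print(nitrogen_specificity(["T1LN", "PLN"]))     # LN-specific
```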

Moreover, in addition to the single-environment QTL detection based on ICIM, multi-environment QTL analysis was conducted using GenStat 19.0 across all environments to verify the traditional QTL identified in individual environments and to evaluate the QTL × environment interaction effects (Payne et al., 2013). The best variance-covariance model was selected based on the Schwarz Information Criterion for phenotype data from a set of multi-environment experiments (Schwarz, 1978; Malosetti et al., 2013). The QTL and QTL × environment interaction effects were determined by testing the significance of environment-specific deviations from the main environmental effects using a Wald test (Verbeke and Molenberghs, 2000). The degree of phenotypic variation explained by an individual QTL was calculated as described by Asfaw et al. (2012). Finally, a QTL that could be detected by both single-environment and multi-environment QTL detection was defined as a stable QTL, considering its stability across different detection models. For a given trait, QTL and loci with overlapping CIs (LOD ≥2) were assumed to be identical (Zhang et al., 2017). Among different traits, QTL sharing flanking markers were considered to be a "cosegregation" QTL region (Cui et al., 2016).
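The overlap rule used to declare stable QTL can be sketched as an interval comparison. The record layout (trait, chromosome, and confidence interval in cM) and all positions below are hypothetical and serve only to illustrate the matching logic; they are not values from the study.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def stable_qtl(single_env, multi_env):
    """Return names of single-environment QTL whose CI overlaps a
    multi-environment locus for the same trait on the same chromosome."""
    stable = []
    for q in single_env:
        for m in multi_env:
            if (q["trait"] == m["trait"] and q["chrom"] == m["chrom"]
                    and intervals_overlap(q["ci"], m["ci"])):
                stable.append(q["name"])
                break
    return stable

# Hypothetical positions (cM)
single = [
    {"name": "qSn-7A.2", "trait": "SN", "chrom": "7A", "ci": (120.0, 124.5)},
    {"name": "qSn-2B", "trait": "SN", "chrom": "2B", "ci": (30.0, 33.0)},
]
multi = [{"trait": "SN", "chrom": "7A", "ci": (123.0, 127.0)}]
print(stable_qtl(single, multi))  # ['qSn-7A.2']
```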

The reported spike-related genes/loci and their genetic positions were obtained from the literature; these included Photoperiod-1 (Ppd-A1, Ppd-B1, and Ppd-D1) (Boden et al., 2015), the vernalization responsive genes (Vrn-1A, Vrn-1B, Vrn-1D, Vrn2, and Vrn3) (Yan et al., 2004, 2006; Dubcovsky et al., 2006), the earliness per se locus Eps-Am1 (Lewis et al., 2008), the spike-compacting related locus (C) (Johnson et al., 2008), the floral organ development related gene TaANT (Zhang et al., 2014), the semi-dwarfing genes Rht-B1 and Rht8 (Ellis et al., 2007; Zhang et al., 2013b, 2017), and the wheat orthologous genes corresponding to Six-rowed spike 1 (Vrs1) (Komatsuda and Tanno, 2004), CONSTANS, and MOC1 (Campoli et al., 2012). The Chinese Spring genome assembly from the International Wheat Genome Sequencing Consortium (IWGSC) Reference Sequence v1.0 was used as the reference genome. The available sequences of these reported spike-related genes/loci were retrieved from GenBank through the NCBI website (http://www.ncbi.nlm.nih.gov/) and used as queries in a BLAST search on the IWGSC website (https://urgi.versailles.inra.fr/blast\_iwgsc/blast.php) to obtain their physical positions. By comparing their positions with the highlighted regions in this study, the possible candidate genes were preliminarily screened.

### RESULTS

### Phenotypic Data and Correlation Analyses

The genetic variation of the six investigated traits in the KJ-RILs is shown in **Table S2**, which presents the ANOVA results for the phenotypic data. ANOVA showed that genotype, environment and genotype × environment had significant effects on HD, SL, SN, SC, FSN, and SSN. KN9204 had an earlier HD, shorter SL, fewer SN and FSN, more SSN, and tighter SC than J411 (**Table S3**). In the KJ-RIL population, the six traits exhibited approximately continuous variation in each treatment in four trials. Transgressive segregation was observed in both directions in this population (**Table S3**), indicating that alleles with positive effects were contributed by both parents. Additionally, the absolute values of skewness and kurtosis were mostly <1 (**Table S3**), indicating that the phenotypic data were approximately normally distributed in this population. Heritability ranged from 50.30 to 77.06% (**Table S3**). A significant positive correlation was observed between SN and the other investigated traits (**Table S4**). SL was significantly negatively correlated with SC, whereas it was positively correlated with the other examined traits (**Table S4**). FSN was significantly negatively correlated with SSN but positively correlated with the other investigated traits (**Table S4**). The highest correlation coefficient was observed between SL and SN (0.888) (**Table S4**).

### Traditional QTL Analysis Under Different Nitrogen Treatments

For single-environment QTL detection, a total of 157 QTL for the six spike characteristics (**Table S5**) were detected by IciMapping 4.1. They were distributed on all 21 chromosomes, explained 1.55 to 26.26% of the phenotypic variation, and had LOD values of 3.01–22.22. Among these 157 QTL, 41 could be detected in different trials, and 11 QTL were major QTL (**Table S5**).

Additionally, for multi-environment QTL detection, 30 loci were identified using GenStat 19.0 (**Table S6**). Twenty-eight loci showed a significant interaction effect with environment (**Table S6**). The CIs of 28 loci overlapped with the corresponding single-environment QTL listed in **Table 1**. Thus, these 28 QTL were considered stable QTL, and 78.57% of them (22 QTL) could be detected in different trials based on ICIM (**Table 1**). It is notable that all 11 major QTL could be repeatedly detected by GenStat 19.0 and were stable QTL (**Table 1**, **Tables S5**, **S6**).

### The Traditional QTL for HD

Twenty-five traditional QTL for HD were identified by single-environment QTL detection, and eight and 12 of them were LN- and HN-specific QTL, respectively (**Table S5**). qHd-1B.1, qHd-2D.1, and qHd-4B.1 could be repeatedly detected by multi-environment QTL detection (**Table S6**) and thus were stable QTL (**Table 1**). Additionally, qHd-2D.1 and qHd-4B.1 were major and stable QTL (defined as major stable QTL) detected in nine datasets (all except T1LN), explaining 8.58–27.01% and 3.36–22.29% of the HD variation, respectively. KN9204-derived alleles advanced HD at qHd-2D.1 but delayed HD at qHd-4B.1 (**Table S5**).

### The Traditional QTL for SL

A total of 26 QTL associated with SL were detected based on ICIM; thirteen and six of them were LN- and HN-specific QTL, respectively (**Table S5**). qSl-2D.1, qSl-5A.3, and qSl-7A.2 were three stable QTL, as verified by GenStat software (**Table 1**, **Table S6**). The major QTL on 2D (qSl-2D.1) was significant in T1LN, T1HN, T3HN, and PHN, explaining 7.99–16.58% of the phenotypic variation. The other major QTL, on 5A (qSl-5A.3), was detected in ten datasets and explained 13.09–30.43% of the SL variation. The additive effects of the two major QTL showed that the negative alleles (shortening SL) originated from the parent with the shorter SL, i.e., KN9204 (**Table S5**).

### The Traditional QTL for SN

For single-environment QTL detection, thirty-one QTL were identified, and among them, nine and 14 were LN- and HN-specific QTL, respectively (**Table S5**). Six QTL (qSn-1A.1, qSn-4A, qSn-5A.2, qSn-7A.2, qSn-7B.2, and qSn-7D) showed overlap of their CIs with the loci detected by GenStat software (**Table S6**) and were stable QTL (**Table 1**). The only major QTL, on 7A (qSn-7A.2), was observed under both LN and HN treatments in all trials; it had LOD values of 4.99–39.13 and explained 9.80–43.37% of the SN variation, with the KN9204-derived allele decreasing SN (**Table S5**).

### The Traditional QTL for SC

Twenty-one QTL for SC were detected in individual environments based on ICIM method, eleven and five of which were LN- and HN-specific QTL, respectively (**Table S5**). Five stable QTL (qSc-2B.1, qSc-2D, qSc-5A.2, qSc-7A, and qSc-7B) were significant by multi-environment QTL detection (**Table 1**, **Table S6**). Among them, qSc-5A.2 and qSc-7A were two major QTL explaining 13.12–28.38% and 4.87–17.22% of the SC variation, with the LOD value of 7.37–20.40 and 3.67–15.04, respectively, in all datasets. KN9204 conferred an effect for an increased SC at the former locus but a decreased SC at the latter one. Additionally, a moderate but stable QTL (defined as moderate stable QTL) on 2B (qSc-2B.1) could be repeatedly detected in nine datasets, except for T2LN (**Table S5**).

The positive value of additive effect indicates that the KN9204 allele increases the corresponding traits. The negative value indicates that the J411 allele increases the corresponding traits. <sup>a</sup>The QTL in bold are the major QTL. The underlined QTL are the QTL which could be detected in different trials by IciMapping 4.1.

### The Traditional QTL for FSN

Twenty-eight QTL for FSN were mapped by single-environment QTL detection. Twelve and eight of them were LN- and HN-specific QTL, respectively (**Table S5**). Six stable QTL, including qFsn-1B.1, qFsn-3A.1, qFsn-4A, qFsn-4B.2, qFsn-7A.3, and qFsn-7D, were repeatedly identified by multi-environment QTL detection (**Table 1**, **Table S6**). qFsn-4B.2 was a major stable QTL, which had LOD values of 5.13–18.36 and explained 6.30–24.43% of the phenotypic variation in T1LN, T1HN, T3LN, T4LN, T4HN, PLN, and PHN. qFsn-7A.3 was the other major stable QTL, expressed in nine datasets (all except T4HN), with LOD values of 4.61–36.38 and PVE of 9.11–36.41%. The additive effect of qFsn-4B.2 showed that the superior allele originated from KN9204, while the superior allele of qFsn-7A.3 came from J411. Additionally, qFsn-3A.1 was found to be a moderate stable QTL and was expressed in seven datasets (all except T2LN, T3LN, and PLN) (**Table S5**).

### The Traditional QTL for SSN

In single-environment QTL detection, twenty-six QTL for SSN were identified, nine of which were LN-specific and nine HN-specific QTL (**Table S5**). qSsn-1B.1, qSsn-4B.3, qSsn-5D.1, qSsn-6B, and qSsn-7D.2 were five stable QTL, as verified by GenStat software (**Table 1**, **Table S6**). Two major stable QTL (qSsn-4B.3 and qSsn-5D.1) were significant in five and ten datasets, respectively. qSsn-4B.3 and qSsn-5D.1 explained 6.80–11.38% and 9.14–17.01% of the phenotypic variation, with LOD values of 5.25–9.70 and 6.16–12.03, respectively. A moderate stable QTL expressed in all ten datasets was detected on 7D (qSsn-7D.2) (**Table S5**).

### Conditional QTL Analysis With Respect to LN-Stress-Inducible QTL

By comparing the additive effects of the traditional QTL detection (**Table S5**) and conditional QTL detection based on the trait values of LN conditioned on those of HN (LN|HN) (**Table S7**), the effects of LN-stress on QTL expression for spike characteristics could be evaluated (Xu et al., 2014; Cui et al., 2015). A total of 54 loci were significant in the conditional QTL analysis using IciMapping 4.1 (**Table S7**). They explained 3.17–14.70% of the phenotypic variation and showed LOD values of 3.00–8.02. Only five loci associated with SN, SC, and FSN had main effects (average LOD > 5 and average PVE > 10%), and four loci for SN, SC, and FSN were repeatedly detected in different trials (**Table S7**). Among the conditional loci, 21 were also identified in traditional QTL analysis (**Table S5**) and are listed in **Table 2**, while the other 33 were newly detected (**Table S7**).

By comparing the additive effects from traditional analysis and conditional analysis based on trait values of LN conditioned on those of HN, the effects of N deficiency on the QTL expression of related traits could be evaluated. For example, if a locus detected when conditioned on HN has an effect similar to, or greatly different from, that of its traditional QTL, the QTL is considered to be completely or partially contributed by LN stress, respectively, whereas if a traditional QTL cannot be detected again when conditioned on HN, the QTL is considered to be mainly controlled by N supplementation. In detail, one, four, two, five, five, and four traditional QTL for HD, SL, SN, SC, FSN, and SSN, respectively, could also be detected by conditional QTL analysis (**Table 2**, **Tables S5**, **S7**), and fourteen QTL could also be detected by GenStat software (**Table S6**). Compared with the additive effects from traditional QTL mapping, three and eighteen of the corresponding conditional loci showed similar and greatly different effect values, respectively, implying that their expression was completely or partially affected by LN-stress. Among them, three loci (corresponding to qSn-1A.1, qFsn-1B.1, and qFsn-7D) were detected only under LN treatment by traditional QTL analysis and showed a significant interaction with environment, and all three loci were found to have similar additive effects using both conditional and traditional QTL mapping (**Table 2**); thus, they were considered to be completely LN-stress-induced QTL. Furthermore, qFsn-1B.1 and qFsn-7D were stable QTL (**Tables 1**, **2**).
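The comparison between conditional and traditional additive effects can be sketched as a simple rule. The 10% tolerance follows the criterion given in the footnote of Table 2 (a change of <10% in the absolute effect marks a completely LN-stress induced QTL); the function name, the returned labels, and the example effect values are assumptions for illustration.

```python
def ln_stress_contribution(traditional_effect, conditional_effect, tol=0.10):
    """Compare a conditional (LN|HN) additive effect with its traditional one.

    conditional_effect is None when the locus is not detected after
    conditioning on HN. The relative change is computed on absolute values.
    """
    if conditional_effect is None:
        return "not detected when conditioned on HN"
    change = (abs(abs(conditional_effect) - abs(traditional_effect))
              / abs(traditional_effect))
    if change < tol:
        return "completely LN-stress induced"
    return "partially LN-stress induced"

# Hypothetical additive effects
print(ln_stress_contribution(0.50, 0.52))  # completely LN-stress induced
print(ln_stress_contribution(0.50, 0.30))  # partially LN-stress induced
```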

### Stable QTL Regions for Spike Characteristics

Based on the stable QTL detected in this study (**Table 1**), seven genomic regions containing 19 stable QTL for different traits were highlighted (**Table 3**; **Figure 1**). These clustered stable QTL shared confidence intervals and thus were indicative of potential pleiotropic effects on the corresponding traits. Among them, R7D harbored KN9204-derived alleles that simultaneously increased the corresponding traits, while four regions (R2D, R4A, R7A, and R7B) harbored J411-derived alleles that increased the corresponding traits. Four regions (R2D, R4B, R5A, and R7A) contained major stable QTL (**Table 3**; **Figure 1**).

The approximate physical position of each region was obtained by using the flanking bin markers as BLAST queries. By comparing these positions with those of the reported genes/QTL controlling spike characteristics, the putative corresponding genes and coincident QTL were screened. So far, the candidate region of the spike-compacting locus C (Johnson et al., 2008) was found to cover R2D (**Table 3**; **Figure 1**). Furthermore, in R7B, the wheat ortholog of the barley gene CONSTANS (Campoli et al., 2012) was identified (**Table 3**; **Figure 1**). R4B, harboring two major stable QTL (qHd-4B.1 and qFsn-4B.2) and a moderate stable QTL (qSsn-4B.3), covered the semi-dwarfing gene Rht-B1 (**Table 3**; **Figure 1**), indicating that Rht-B1 might have a pleiotropic effect on the spike characteristics.

### Validation of the Consequences of Rht-B1

To preliminarily confirm the pleiotropic effect of Rht-B1 on spike characteristics, the KJ-RILs were grouped by genotyping with the diagnostic marker for the Rht-B1 alleles (Rht-B1a and Rht-B1b) (Zhang et al., 2017). Of the 188 KJ-RILs, 87 and 95 RILs carried the allele of the shorter parent KN9204 (Rht-B1b) and of the taller parent J411 (Rht-B1a), respectively (**Table 4**). The average phenotypic values under both the LN and HN treatments were used to identify differences in the corresponding traits between Rht-B1b and Rht-B1a. The results indicated that, regardless of N treatment, Rht-B1b was associated with a significant reduction in PH and SSN and a remarkable increase in KN, HD, SN, and FSN, but no significant effect was detected on SL and SC (**Table 4**), which coincided with the QTL mapping results (**Table 3**, **Table S5**).
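A comparison of this kind can be illustrated with a two-sample test on the trait means of the two allele groups. The text does not state which test was applied, so the use of Welch's t statistic below is an assumption, and the trait values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard errors of the two group means.
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical plant heights (cm) for lines carrying each allele.
ph_rht_b1b = [70, 72, 68, 71]
ph_rht_b1a = [80, 82, 79, 83]
t_stat, dof = welch_t(ph_rht_b1b, ph_rht_b1a)
print(round(t_stat, 2), round(dof, 2))
```

A negative statistic here means the first group (Rht-B1b in this toy example) has the lower mean; a p-value would be read from the t distribution with the returned degrees of freedom.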

### DISCUSSION

## QTL for Spike Characteristics Show Sensitivity to Nitrogen Supply

N is an important environmental factor determining spike and yield formation. Spike development was positively correlated with the N fertilizer application, which could optimize the kernel number per spike with increased spikelet number, spike N


TABLE 2 | The loci for spike characteristics significant using both traditional and conditional QTL mapping.

A positive value of the additive effect indicates that the KN9204 allele increases the corresponding trait. A negative value indicates that the J411 allele increases the corresponding trait. <sup>a</sup>The underlined loci could be repeatedly detected in different trials by conditional QTL analysis; a locus in bold has a main effect. <sup>b</sup>The underlined QTL were detected in different datasets based on the ICIM method. \* Indicates a completely LN-stress-induced QTL, in which the absolute value of the additive effect is reduced or increased by <10% compared with the corresponding traditional QTL (Fan et al., 2015).


<sup>a</sup>The QTL in bold are the major QTL. The underlined QTL are the QTL which could be repeatedly detected in different trials by IciMapping 4.1. The number in the parentheses indicate the sum of single-environment datasets in which the corresponding QTL are significant. The "+" in the parentheses indicates that KN9204 allele increases the corresponding traits. The "–" in the parenthesis indicates that J411 allele increases the corresponding traits.

represent the coincident QTL in previous reports (Reference 1 to Reference 11), and among them, three QTL in bold (qKnps-4A, qPh-4B, and qKn-4B) were detected using the same population of this study (KJ-RILs). On the right of each chromosomal region: the blue and green rectangles represent the QTL with positive alleles from KN9204 and J411, respectively; the letter C after parentheses represents the expression of QTL induced by N deficiency, and among them, C\* represents the QTL (qFsn-7D) is a completely LN-stress induced QTL. Reference 1: (Nishijima et al., 2017); Reference 2: (Cuthbert et al., 2008); Reference 3: (Cui et al., 2017); Reference 4: (Heidari et al., 2011); Reference 5: (Zhang et al., 2017); Reference 6: (Fan et al., 2015); Reference 7: (Zhai et al., 2016); Reference 8: (Xu et al., 2014).


TABLE 4 | Validation of the putative pleiotropy of Rht-B1b on spike characteristics and kernel number.

The significant difference noted indicates that significant differences were detected between two genotypes containing Rht-B1b and Rht-B1a, respectively, under the LN and HN treatments.

\* Indicates significance at the level of 0.05.

\*\*Indicates significance at the level of 0.01.

\*\*\*Indicates significance at the level of 0.001.

content accumulation and spikelet fertility and, ultimately, affect the yield (Corke and Atsmon, 1988; Demotes-Mainard et al., 1999). Thus, exploring the genetic basis of spike formation under different N conditions could uncover useful nitrogen-response loci for breeding for high nitrogen-use efficiency and high yield. The QTL identified under a specific nitrogen treatment are probably involved in the adaptation to nitrogen fertilizer management (Laperche et al., 2007). In this study, of the 157 detected QTL, 73.89% (62 LN-specific QTL and 54 HN-specific QTL) were N-supply-specific QTL (**Table S5**), suggesting that the genetic basis of the spike characteristics was sensitive to N treatment and could thus provide an appropriate and timely response when adapting to variable N fertilizer levels in the field, possibly by activating different genetic regulatory networks. However, only 9 N-supply-specific QTL (6 LN-specific QTL and 3 HN-specific QTL) were stably significant in different trials, and none of them explained >10% of the phenotypic variation (**Table S5**), indicating that the response to N supply might be synergistically controlled by multiple moderately inducible genes. Therefore, considering this common sensitivity, pyramiding multiple loci, especially the stable loci, might be an efficient coping strategy for specific N supply conditions.
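The proportion of N-supply-specific QTL quoted above follows directly from the reported counts:

$$\frac{62 + 54}{157} = \frac{116}{157} \approx 0.7389 = 73.89\%$$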

In addition to the N-supply-specific QTLs, eleven major stable QTLs could be identified under both the LN and HN treatments (**Table S5**). This result confirmed that the expression of these QTLs was stable and less susceptible to nitrogen supply, suggesting that their closely linked markers are of value in selecting and breeding for optimal spike characteristics, regardless of nitrogen constraints.

### LN-Stress-Induced Loci for Spike Characteristics Were Efficiently Identified When Conditioned on HN

A boost in crop productivity has been achieved in recent decades through the global use of nitrogen fertilizers. However, the consequent nitrogen pollution has prompted modern breeders to develop LN-tolerant cultivars adapted to environmentally friendly, low-input agricultural systems. The interactive relationship between N application and spike development was demonstrated by QTL mapping in previous studies (Xu et al., 2014; Deng et al., 2017). However, few studies have specifically examined the influence of LN stress on the expression of QTL for spike characteristics. Such LN-stress-induced QTL might be more valuable for improving the ability to maintain the desired spike type under LN-input agricultural practices. In this study, to further unravel loci sensitive to N deficiency and specify their responsiveness to LN stress, conditional QTL analysis was conducted and 54 LN-stress-induced loci were detected (**Table S7**). Of them, 33 loci (61.11%) could only be detected when conditioned on HN, indicating that most LN-stress-induced loci were suppressed by N application (**Table S7**), which is consistent with previous conclusions (Zhang et al., 2013a; Cui et al., 2015). The different responsiveness of these LN-induced loci was also evaluated by comparing their additive effects between traditional and conditional QTL mapping (Xu et al., 2014). As a result, three completely LN-stress-induced QTL for SN and FSN (qSn-1A.1, qFsn-1B.1, and qFsn-7D) (**Table 2**) were considered critical for LN tolerance. Pyramiding the elite alleles of these three QTL might be an optimal approach in wheat molecular breeding programs to achieve the desired spikelet number and spikelet fertility under N deficiency. To verify this deduction, SN, FSN, and KN were further investigated by pyramiding effect analysis. The results (**Table S8**) demonstrated that SN and FSN were significantly higher in the Type 1 genotype (which contained all three elite alleles) than in the Type 8 genotype (which contained none of the elite alleles) under both LN and HN conditions and that KN was also increased in the favorable genotype (Type 1). Interestingly, all differences between HN and LN (HN-LN) in SN, FSN, and KN were remarkably decreased by pyramiding the three elite alleles (Type 1), indicating that these three completely LN-induced QTL provided a certain genetic support for maintaining yield potential, and thus their linked markers could be used in breeding LN-tolerant wheat. In addition, qSn-7A.2, a phosphorus (P)-contributed QTL detected by Xu et al. (2014), was found to be partially induced by N deficiency in different trials in this study (**Table 2**), suggesting that the antagonistic effect of N and P on its expression could be further dissected.
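Three biallelic loci give 2<sup>3</sup> = 8 possible allele combinations, which is why the genotypes run from Type 1 to Type 8. The sketch below enumerates them. Only Type 1 (all three elite alleles) and Type 8 (none) are defined in the text; the numbering of the intermediate types here is an assumption made for illustration.

```python
from itertools import product

LOCI = ("qSn-1A.1", "qFsn-1B.1", "qFsn-7D")


def genotype_types(loci=LOCI):
    """Enumerate all elite/non-elite combinations across the given loci.

    Returns {type_number: {locus: carries_elite_allele}}. Combinations are
    ordered by decreasing number of elite alleles, so Type 1 carries all of
    them and the last type carries none. The order of the intermediate
    types is an illustrative assumption.
    """
    combos = sorted(product((True, False), repeat=len(loci)),
                    key=lambda combo: -sum(combo))
    return {i + 1: dict(zip(loci, combo)) for i, combo in enumerate(combos)}


types = genotype_types()
for number, alleles in types.items():
    carried = [locus for locus, elite in alleles.items() if elite]
    print(f"Type {number}: {', '.join(carried) if carried else 'no elite allele'}")
```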

### Comparison of the Present Findings With Previous Studies

The optimization of multiple spike characteristics can efficiently enhance the integrated sink capacity and yield potential. Nineteen stable QTL were clustered into seven pleiotropic genomic regions (**Table 3**; **Figure 1**). According to the common PCR markers and the physical positions of the SNP markers, five regions on chromosomes 2DL, 4BS, 5AL, 7AL, and 7BL coincided with previously reported QTL. For example, the CI of R2D was distal to Ppd-D1 and Rht8, two genes on chromosome 2D that are well known to control HD and SC (Boden et al., 2015; Kowalski et al., 2016). However, the wheat ortholog of HvVrs1, which determines spike morphology (Komatsuda and Tanno, 2004), was located near R2D (around 85.13 cM). In addition, the CI of the C locus, which was previously mapped to the interval Xgwm484-Xgwm539 by Johnson et al. (2008), also covered R2D in this map. Thus, C and Vrs1 are more likely than Ppd-D1 and Rht8 to contribute to qSc-2D and qSsn-2D in R2D. Considering that the major stable QTL qHd-2D.2 was also expressed in different genetic backgrounds (Li et al., 2002; Cuthbert et al., 2008), R2D, which affects not only spike morphology but also heading date, was deduced to harbor pleiotropic or clustered genes, probably involving C and Vrs1, which to some extent supports the significant correlation between HD and most other spike characteristics in this study (**Table S4**).

In R5A (**Table 3**, **Figure 1**), the J411-derived alleles of the major QTL qSl-5A.3 and its colocalized major QTL for SC (qSc-5A.2) increased SL but decreased SC with remarkable stability in both the KN9204/J411 and Y8679/J411 RIL populations (Zhai et al., 2016), possibly providing a stable genetic basis that accounts for the significant negative correlation between SL and SC (**Table S4**). Moreover, the CI of R5A was far from Vrn-A1, in accordance with the mapping results of Zhai et al. (2016).

qSn-7A.2 and qFsn-7A.3 in R7A (**Table 3**; **Figure 1**) were frequently reported to be associated with SN (Xu et al., 2014; Zhai et al., 2016; Giunta et al., 2018) and FSN (Liu et al., 2014; Zhai et al., 2016). However, the robust stable QTL controlling SC (qSc-7A) in R7A is presented for the first time in this study (**Table 1**). Spikelet compactness is a signature trait distinguishing club wheat from common wheat. The genes related to the clubbed head are useful for dissecting the relationship between spike morphology and sink capacity (Jantasuriyarat et al., 2004; Johnson et al., 2008). The discovery of stable QTL in R7A with strong and pleiotropic effects on SC, SN, and FSN might be valuable for facilitating spike morphology optimization in the breeding process. Because of the allopolyploid nature of common wheat, many major genes have homoeologues in the syntenic regions of the same homoeologous group (Khlestkina et al., 2008), such as the dwarfing genes Rht-B1 and Rht-D1, located on 4BS and 4DS, respectively (Börner et al., 1997). In this study, sixty-five genes in R7A were found to have homoeologous genes in R7B, whose CI lies near the ortholog of barley CONSTANS (**Table S9**; **Figures 1**, **2**), consistent with the previous study (Giunta et al., 2018). This collinearity was possibly responsible for the pairwise major homoeologous genes for spike characteristics on chromosomes 7A and 7B. Whether CONSTANS is the candidate gene underlying R7B and R7A needs further investigation.

Because no common markers or coincident QTL were found in R4A and R7D, these regions were regarded as having novel QTL controlling spike characteristics in this study (**Table 3**; **Figure 1**). In R4A, a major stable QTL for KN was previously detected by Cui et al. (2017) using the KJ-RIL population. J411

FIGURE 2 | The syntenic region between R7A and R7B. The number in parentheses following the SNP marker (e.g., AX-110483616) indicates the genetic position (cM); the number in parentheses following the predicted gene (e.g., TraesCS7A01G473100) indicates the physical position (Mb); the red shadow indicates the range of the syntenic region between R7A and R7B.

conferred the increased effect on HD, SN, FSN, and KN in R4A, indicating that the corresponding genes in this region might improve KN through modifying spike characteristics; thus, this pleiotropic region was important for high-yield breeding.

### Putative Pleiotropic Effect of Rht-B1 on Spike Characteristics, Plant Type, and Kernel Formation

Rht-B1 has commonly been reported to map to QTL for yield parameters and grain quality (McCartney et al., 2005; Zhang et al., 2013b; Cui et al., 2016). However, except for Heidari et al. (2011), who located major QTL for spikelets/spike and SC close to Rht-B1, few reports have specified its contribution to spike characteristics. Previously, using the same KJ-RIL population, an expected major stable QTL for PH (Zhang et al., 2017) and a major stable QTL for KN (Fan et al., 2015) were found to be linked with the Rht-B1 locus. This study provides further preliminary confirmation of the putative pleiotropic effect of Rht-B1 on spike characteristics (**Table 4**), consistent with the QTL mapping results (**Table 3**). Rht-B1b conferred decreased PH and SSN and contributed to increased HD, SN, FSN, and KN, effects that were little affected by N application. This result indicated that Rht-B1 might affect yield potential by controlling plant type as well as sink capacity. However, Rht-B1 was found to have a strong correlation with TKW but little association with KN in the Kauz/Westonia DH population studied by Zhang et al. (2013b). The inconsistency might have resulted from the use of different mapping populations and therefore deserves further investigation.

### CONCLUSION

In this study, 157 and 54 spike-related loci were identified by traditional and conditional QTL mapping, respectively, based on ICIM. Among them, qSn-1A.1, qFsn-1B.1, and qFsn-7D were QTL completely induced by LN stress, and their positive pyramiding effect on increasing KN was verified. Seven genomic regions were highlighted because they harbored stable QTL that could be detected by different detection models. Among them, R2D, R4B, R5A, and R7A harbored major stable QTL. R4A and R7D might contain novel QTL for spike characteristics. R7A and R7B show synteny in their candidate regions, implying that homoeologous genes for spike characteristics possibly exist in R7A and R7B. Additionally, R2D, R4B, and R7B were found to be associated with C, Rht-B1, and the wheat ortholog of CONSTANS, respectively. Rht-B1b was validated to contribute to a significant reduction in PH and SSN and to an increase in KN, HD, SN, and FSN, which explained the observed pleiotropy of R4B well. These LN-input sensitive loci and highlighted regions for spike-related traits could be helpful for improving wheat sink capacity and yield potential.

## AUTHOR CONTRIBUTIONS

XF, FC, and JL designed the research. XF and FC conducted genotyping of the KJ-RIL population. XF, FC, JJ, WZ, JL, DM, YT, TW, and JL conducted phenotyping of the KJ-RIL population. XF analyzed data and wrote the paper. TW and JL had primary responsibility for final content. All authors read and approved the final manuscript.

### ACKNOWLEDGMENTS

This research was supported by grants from the National Natural Science Foundation of China (31601809), the Science and Technology Service Network Initiative of the Chinese Academy of Sciences (KFJ-STS-ZDTP-024), Hebei Provincial Science and Technology Research and Development Project (16226320D), and China Agriculture Research System (CARS-03). Special thanks to Dr. Hui Gao for providing all rainfall and temperature data in Shijiazhuang and to all the people who helped collect the data or manage the fields.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00187/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Fan, Cui, Ji, Zhang, Zhao, Liu, Meng, Tong, Wang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Multiple Near-Isogenic Lines Targeting a QTL Hotspot of Drought Tolerance Showed Contrasting Performance Under Post-anthesis Water Stress

#### Md Sultan Mia<sup>1,2,3</sup>, Hui Liu<sup>1,2</sup>, Xingyi Wang<sup>1,2</sup> and Guijun Yan<sup>1,2</sup>\*

<sup>1</sup> School of Agriculture and Environment, Faculty of Science, The University of Western Australia, Perth, WA, Australia, <sup>2</sup> The UWA Institute of Agriculture, The University of Western Australia, Perth, WA, Australia, <sup>3</sup> Plant Breeding Division, Bangladesh Agricultural Research Institute, Gazipur, Bangladesh

#### Edited by:

Dragan Perovic, Julius Kühn-Institut, Germany

#### Reviewed by:

Ildikó Karsai, Centre for Agricultural Research (MTA), Hungary

Yerlan Turuspekov, Institute of Biology and Plant Biotechnology, Kazakhstan

> \*Correspondence: Guijun Yan guijun.yan@uwa.edu.au

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 22 August 2018 Accepted: 19 February 2019 Published: 08 March 2019

#### Citation:

Mia MS, Liu H, Wang X and Yan G (2019) Multiple Near-Isogenic Lines Targeting a QTL Hotspot of Drought Tolerance Showed Contrasting Performance Under Post-anthesis Water Stress. Front. Plant Sci. 10:271. doi: 10.3389/fpls.2019.00271

The complex quantitative nature of drought-related traits is a major constraint to breeding tolerant wheat varieties. Pairs of near-isogenic lines (NILs) with a common genetic background but differing at a particular locus could turn quantitative traits into a Mendelian factor, facilitating our understanding of genotype and phenotype interactions. In this study, we report our fast-track development and evaluation of NILs from C306 × Dharwar Dry targeting a wheat 4BS QTL hotspot in C306, which confers drought tolerance, following the heterogeneous inbreed family (HIF) analysis coupled with an immature embryo culture-based fast generation technique. Molecular marker screening and phenotyping for grain yield and related traits under post-anthesis water stress (WS) confirmed four isoline pairs, viz., qDSI.4B.1-2, qDSI.4B.1-3, qDSI.4B.1-6, and qDSI.4B.1-8. There were significant contrasts in responses between the NILs with the C306 QTL (+NILs) and the NILs without the C306 QTL (−NILs). Among the four confirmed NIL pairs, mean grain yield per plant of the +NILs and −NILs differed significantly under WS conditions, ranging from 9.61 to 10.81 g and from 6.30 to 7.56 g, respectively, whereas a similar grain yield was recorded for the +NILs and −NILs under well-watered conditions. Isolines of +NIL and −NIL pairs showed similar chlorophyll content (SPAD), assimilation rate (A), and transpiration rate (Tr) at the beginning of the stress. However, the +NILs showed significantly higher SPAD (12%), A (66%), stomatal conductance (75%), and Tr (97%) than the −NILs on the seventh day of stress. Quantitative RT-PCR analysis targeting the MYB transcription factor gene Triticum aestivum MYB 82 (TaMYB82), located within this genomic region and retrieved from the wheat reference genome TGACv1, also revealed differential expression in +NILs and −NILs under stress. These results confirmed that the NILs can be invaluable resources for fine mapping of this QTL, and also for cloning and functional characterization of the gene(s) responsible for drought tolerance in wheat.

Keywords: drought tolerance, near-isogenic lines, quantitative trait loci, bread wheat, gene expression

## INTRODUCTION

Global wheat production is often hampered by biotic and abiotic stresses such as heat, drought, salinity, insects, and diseases. Among these stresses, drought is by far the most detrimental, limiting the yield potential of wheat, particularly in rain-fed and limited-irrigation environments (Akpinar et al., 2013; Budak et al., 2013). The effects of drought can be detected to varying degrees during different growth periods, including early establishment (Ayalew et al., 2015, 2016), pre-anthesis (Onyemaobi et al., 2017), and post-anthesis grain filling (Mia et al., 2017). Understanding the underlying mechanisms of tolerance and identifying candidate genes and proteins will help to breed stress-resilient genotypes (Budak et al., 2015).

Although drought tolerance is a complex quantitative trait, a number of drought tolerance QTLs in wheat have been reported in various previous studies, most of which have been identified through grain yield and related component measurements under limited water conditions (Quarrie et al., 2006; Maccaferri et al., 2008; Mathews et al., 2008). However, large genomic intervals associated with those QTLs make them unsuitable for direct use in a breeding program. Characterizing those QTLs and identifying the causal genes, despite large genomic regions of interest, is quite a formidable task. This could be solved by developing near-isogenic lines (NILs) with different flanking markers of the respective QTL.

Near-isogenic lines have otherwise identical genetic backgrounds except at one or a few genetic loci and have been used intensively for detailed mapping and characterization of individual loci. Multiple pairs of isolines can offer a common genetic background to assess the phenotype conditioned by a genomic locus. Traditionally, NILs are developed through backcross introgression method (Kaeppler et al., 1993; Monforte and Tanksley, 2000; Babu et al., 2017). Alternatively, they can be generated following a selfing and selection scheme called heterogeneous inbreed family (HIF) analysis, which utilizes molecular markers linked to QTL of interest to identify heterozygous individuals, from which NILs can be extracted that are isogenic, but differ for certain genomic locations (Afanador et al., 1994; Tuinstra et al., 1997). Despite their usefulness for genetic and physiological studies of QTL, the considerable amount of time and effort needed to develop NILs have limited their use (Yan et al., 2017). However, a recently developed accelerated breeding technique for wheat called fast generation cycling system (FGCS) offers a suitable solution to this problem (Yan et al., 2017). Molecular marker assisted development of NILs following HIF analysis alone (Barrero et al., 2015), or coupled with FGCS, has been reported for some major QTLs in wheat (Ma et al., 2011; Wang et al., 2018) and barley (Habib et al., 2015). However, no such study has been reported for drought-tolerance related QTLs.

In this study, we utilize FGCS for fast track development of NILs targeting a QTL hotspot conferring drought tolerance following HIF analysis. Subsequently, those putative isolines were phenotyped for grain yield and related traits to characterize and confirm the NIL pairs. Several physiological parameters and relative gene expression were also evaluated for further confirmation and to explore the underlying mechanism of water stress (WS) tolerance during post-anthesis period in wheat.

## MATERIALS AND METHODS

### Plant Materials

Two bread wheat varieties of spring type growth habit, C306 and Dharwar Dry, were crossed to produce hybrids and subsequent segregating generations for this study. Several previous studies reported C306 (RGN/CSK3//2∗C591/3/C217/N14//C281) as a drought tolerant variety containing major-effect drought yield QTL (Aggarwal and Sinha, 1987; Sharma and Kaur, 2008; Kadam and Chuan, 2016). During crossing, C306 served as female parent, and Dharwar Dry was used as the male parent. NILs were developed from this cross targeting a QTL hotspot of 12 cM interval in C306 background by the HIF method (Tuinstra et al., 1997), coupled with immature embryo culture-based FGCS (Zheng et al., 2013; Yan et al., 2017). The procedure is summarized in **Figure 1**.

### Heterogeneous Inbreed Families (HIF) Method

Heterogeneous inbreed family analysis as described in Tuinstra et al. (1997) was followed for identifying NILs that differ for selected markers linked to a QTL of interest. In short, segregating generations from the biparental hybrids were advanced by the single-seed descent method until F<sub>4</sub>, where heterozygous plants were identified using the linked marker and selfed to produce seeds for the next generation. For this study, marker gwm368 linked to the QTL qDSI.4B.1 (Kadam et al., 2012) was used to identify heterozygous individuals for the targeted QTL in advancing generations. Six to eight plants derived from each of the heterozygous plants were used for the next round of selection and genotyped with the linked marker, and only a single heterozygous plant from each progeny line was selected and selfed. This process of selecting heterozygous individuals and selfing was repeated from F<sub>4</sub> to F<sub>7</sub>. In F<sub>7</sub>, two isolines, homozygous but having different parental alleles, were isolated from the individual heterozygous plant progenies. These isolines served as putative NIL pairs for further phenotypic and physiological characterization in F<sub>8</sub> (**Figure 1**).
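The progeny sizes used in this scheme can be put in context with a short probability sketch. It assumes simple single-locus Mendelian segregation (1 : 2 : 1) among the selfed progeny of a heterozygote and error-free marker calls; both are simplifying assumptions, and the function names are illustrative.

```python
def p_at_least_one_het(n_plants: int) -> float:
    """Probability that at least one of n selfed progeny is heterozygous.

    Each progeny of a selfed heterozygote is heterozygous with probability 1/2.
    """
    return 1 - 0.5 ** n_plants


def p_both_homozygotes(n_plants: int) -> float:
    """Probability that n selfed progeny include both homozygous classes.

    Inclusion-exclusion: 1 - P(no "+ +") - P(no "- -") + P(all heterozygous).
    """
    return 1 - 2 * 0.75 ** n_plants + 0.5 ** n_plants


for n in (6, 8):
    print(n, round(p_at_least_one_het(n), 4), round(p_both_homozygotes(n), 4))
```

Under these assumptions, six to eight plants almost always include a heterozygote to carry forward, whereas recovering both homozygous classes from one family in the final generation is less certain.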

## Embryo Culture-Based Fast Generation Technique

For rapid generation advancement, an embryo culture-based FGCS as described in Yan et al. (2017) was followed from F<sub>3</sub> to F<sub>7</sub> (**Figures 1**, **2**). In each generation, about 12–14 days after anthesis (DAA), young embryos were harvested from sterilized developing grains under aseptic conditions and cultured on the suggested medium. Cultured embryos in petri plates were kept in the dark in a specially designed plant growth chamber to germinate and were then transferred into a 22 °C constant-temperature room with a 16 h light period (fluorescent lamps) for rooting. When the roots of the young seedlings were about 2.0 cm long, they were transferred into 30-well Kwikpot trays (Rite-Gro Kwikpots, GardenCity Plastics) containing growing media and kept in plant growth chambers until the soft dough stage (Zadoks growth scale Z85), when grains were ready for the next cycle of embryo culture (**Figures 2C,D**).

### Genotyping of the Segregating Populations and NILs

FIGURE 2 | (A) QTL hotspot in wheat chromosome 4BS and the marker (circled in green) used for selection, adapted from Kadam et al. (2012). (B) Selection of different progeny types using molecular marker. (C) Culture of young embryos on petri-plates with sterile media. (D) Young seedlings from embryo culture growing in plant growth chamber. (E) NIL pair at anthesis in glasshouse. (F) NIL pair at physiological maturity. (G) Hundred seeds of a NIL pair.

Genomic DNA was isolated from leaf tissues collected from seedlings at the two- to three-leaf stage following the CTAB method with necessary modifications (Dreisigacker et al., 2017). RNA contamination was removed from the extracted DNA by treatment with RNase A. The quality of the treated DNA was assessed with a NanoDrop 2000 spectrophotometer (ND-2000, Thermo Fisher Scientific, Inc., United States) and the concentration was adjusted as required. Integrity of the DNA was also checked by 1% agarose gel electrophoresis. Polymerase chain reaction (PCR) amplification was performed in an Eppendorf Mastercycler with 15 µl reaction volumes for each sample containing 50 ng template DNA, 200 nM of each primer, 1.5 µl 10× PCR buffer, 2 mM MgCl<sub>2</sub>, 0.2 mM dNTPs, and 1 unit Taq DNA polymerase (Fisher Biotec), with initial denaturation at 94 °C for 3 min, followed by 40 cycles of 94 °C for 30 s, annealing at 58 °C for 30 s, and 72 °C for 30 s, with a final extension at 72 °C for 5 min. PCR products were electrophoresed in 2.5% agarose gel, stained with ethidium bromide, and visualized with a UV trans-illuminator. All plants were genotyped with the gwm368 marker and grouped according to the genotypes "+ +," "+−," and "− −," where "+" represents the allele of C306 type and "−" represents the allele of Dharwar Dry type. The DNA profile from each marker was scored in each generation and only heterozygous plants ("+−") were selected for the next cycle, except for the last cycle (F<sub>7</sub>), when homozygous progeny ("+ +" and "− −") from each of those heterozygous plants were selected as NIL pairs (**Figure 2**).
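The cycling program described above can be written down as data, which makes it easy to check the total programmed hold time. The variable and function names below are illustrative; ramping between temperatures is ignored, so the actual run time on a thermocycler would be longer.

```python
# Each step is (label, temperature in degrees C, hold time in seconds),
# taken from the cycling conditions given in the text.
INITIAL = [("initial denaturation", 94, 180)]
CYCLE = [("denaturation", 94, 30), ("annealing", 58, 30), ("extension", 72, 30)]
FINAL = [("final extension", 72, 300)]
N_CYCLES = 40


def total_hold_seconds(initial, cycle, final, n_cycles):
    """Sum the programmed hold times, excluding temperature ramping."""
    per_cycle = sum(seconds for _, _, seconds in cycle)
    fixed = sum(seconds for _, _, seconds in initial + final)
    return fixed + n_cycles * per_cycle


total = total_hold_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
print(f"{total} s = {total / 60:.0f} min of programmed holds")
```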

### Phenotyping of the NILs (F8)

Near-isogenic lines were phenotyped following the procedure described in Mia et al. (2017) in a temperature-controlled and naturally lit glasshouse at The University of Western Australia, Crawley, Western Australia (31°59′ S, 115°49′ E) in 50 cm × 9 cm cylindrical pots containing 2.5 kg of soil media (5:2:3 compost: peat: sand, pH∼6.0). The experiment was arranged in a completely randomized block design with three replicates. Field (pot) capacity of the soil media was initially measured in three free-draining pots, each containing 2.5 kg of air-dry soil media, by inundating the pots with water and allowing them to drain for 48 h. Three random samples from each pot were taken, and their weights were measured using a balance before and after oven-drying to calculate the per cent water content of the media at field capacity using the following formula:

$$\%\,\text{soil water content} = \frac{FW - DW}{DW}$$

where FW = fresh weight and DW = dry weight of the samples. Pots were watered and maintained at around 80% field capacity (FC) by weighing and manual watering on alternate days until anthesis, Zadoks growth scale Z60 for cereals (Zadoks et al., 1974). At anthesis, two water treatments were implemented. Soil moisture in the well-watered (WW) treatment was maintained at about 80% FC by daily weighing and watering until plants reached physiological maturity (Z91). In contrast, plants in the WS treatment were weighed but not watered for a period of 7 days from anthesis, with watering resuming on 7 DAA as per the WW treatment and maintained until physiological maturity. At physiological maturity, plants were harvested manually and separated into shoots and roots. Grains were collected and dried at 35 °C for 72 h and the rest of the shoots were oven-dried at 65 °C until constant weight. Roots from individual pots were washed thoroughly and then oven-dried at 65 °C until they reached a constant weight. Data on plant height at maturity, days to anthesis, days to maturity, spikelet number per spike, fertile tillers (having spikes with grain) per plant, shoot biomass, root biomass, harvest index, hundred-grain weight, and grain yield per plant were recorded.
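The gravimetric calculation and the weighing target can be sketched as follows. Several points are assumptions made for illustration: the ratio is multiplied by 100 to express it as a per cent, the 2.5 kg of air-dry media is treated as the dry mass, plant fresh weight is ignored, and the example weights and the pot tare are hypothetical.

```python
def water_content_pct(fresh_weight: float, dry_weight: float) -> float:
    """Gravimetric water content (FW - DW) / DW, expressed as a per cent."""
    return (fresh_weight - dry_weight) / dry_weight * 100


def target_pot_weight(dry_soil_kg: float, fc_water_pct: float,
                      fraction_of_fc: float = 0.80,
                      pot_tare_kg: float = 0.0) -> float:
    """Pot weight (kg) to water up to for a given fraction of field capacity.

    dry_soil_kg: mass of soil media in the pot (treated as dry mass here).
    fc_water_pct: water content of the media at field capacity, in per cent.
    pot_tare_kg: weight of the empty pot (hypothetical; not given in the text).
    """
    water_at_fc_kg = dry_soil_kg * fc_water_pct / 100
    return pot_tare_kg + dry_soil_kg + fraction_of_fc * water_at_fc_kg


# Hypothetical sample weights (g) before and after oven-drying.
fc_pct = water_content_pct(130.0, 100.0)
print(fc_pct)
print(round(target_pot_weight(2.5, fc_pct), 3))
```

With these example numbers, a pot would be watered back up to about 3.1 kg at each weighing to hold it near 80% FC.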

### Physiological Traits

The confirmed NIL pairs were also characterized for physiological traits under both stressed and stress-free conditions. Chlorophyll content of the isolines was measured using a handheld portable chlorophyll meter (SPAD-502Plus; Konica Minolta, Osaka) just before treatment imposition and at 4 and 7 days after stress (DAS) treatment. Leaf gas exchange (net photosynthesis rate, transpiration rate, and stomatal conductance) was measured at the same time points using a portable photosynthesis system (LI-6400, LI-COR Inc., Lincoln, NE, United States) with an LED light source on the leaf chamber. In the LI-COR cuvette, the CO<sub>2</sub> concentration was set to 400 µmol s<sup>−1</sup> and the LED light intensity to 1500 µmol m<sup>−2</sup> s<sup>−1</sup>.

### RNA Extraction and Quantitative Reverse Transcriptase Polymerase Chain Reaction (qRT-PCR) Analysis

Flag leaf tissues from the NILs were snap frozen in liquid nitrogen and stored at −80°C for later use. For RNA extraction, leaf tissues from isolines with C306 background were bulked in one sample and those with Dharwar Dry background were bulked in another. There were three independent biological replicates, each with three technical replicates, for both stressed and stress-free conditions. Total RNA was extracted using the RNeasy® Plant mini kit (Qiagen) with DNase digestion to eliminate genomic DNA contamination. Total RNA was assayed qualitatively and quantitatively by Nanodrop 2000, denaturing gel electrophoresis, and LabChip® GX Touch capillary electrophoresis (PerkinElmer). The cDNA was synthesized using a SensiFast cDNA Synthesis Kit (Bioline) following the manufacturer's protocol. Quantitative real-time PCR (qRT-PCR) was carried out with an ABI 7500 Fast system using the SensiFast SYBR kit (Bioline). For gene expression analysis, the Triticum aestivum MYB 82 (TaMYB82) gene, repeatedly found in this genomic region, was selected as the target gene (Kadam et al., 2012; Liu et al., 2015). Primers (Forward 5′-TCGTCGGGTTCGTTCACATC-3′, Reverse 5′-GGTCGACGTGGAAAAGACCA-3′) were developed from the highly conserved exonic region of the MYB transcription factor gene, retrieved from chromosome 4BS of wheat reference genome TGACv1, using Geneious software version 11.1.2 (Kearse et al., 2012). The wheat actin gene (Forward 5′-CTCCCTCACAACAACCGC-3′, Reverse 5′-TACCAGAACTTCCATACCAAC-3′) was used as the endogenous control (Yu et al., 2017). Amplification was performed in a 20 µl final reaction mix containing 100 ng cDNA, 10 µl of 2X SensiFast SYBR Lo-ROX mix, and 0.8 µl of each primer (10 µM) with the following protocol: 95°C for 2 min (1 cycle), then 95°C for 5 s and 62°C for 30 s (40 cycles), followed by melt curve analysis and concluding with a 4°C hold.
Relative gene expression was determined by the comparative Ct method (Livak and Schmittgen, 2001; Schmittgen and Livak, 2008).
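
For readers unfamiliar with the comparative Ct calculation, a minimal Python sketch is given here. It assumes roughly 100% amplification efficiency for both target and reference genes, as the 2<sup>−ΔΔCt</sup> approach requires; the function names and all Ct values are hypothetical and are not data from this study.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_delta_ct, calibrator_delta_ct):
    """Relative expression by the comparative Ct method: 2 ** -(ddCt)."""
    return 2.0 ** -(sample_delta_ct - calibrator_delta_ct)


# Hypothetical technical replicates (Ct values) for one biological replicate.
sample = delta_ct([24.0, 24.5, 23.5], [18.0, 18.25, 17.75])      # target vs actin
calibrator = delta_ct([26.0, 26.5, 25.5], [18.0, 18.0, 18.0])    # target vs actin

relative_expression = fold_change(sample, calibrator)
```

A value above 1 indicates higher expression in the sample than in the calibrator, and a value below 1 indicates lower expression.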

### Statistical Analysis

Statistical analysis was performed using GenStat software, 17th edition (Payne et al., 2014). Student's t-test was used to compare means. Graphs were produced using R 3.5.1 with the R packages "ggplot2" and "gridExtra" (R Core Team, 2013; Wickham, 2016).
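
The analyses in this study were run in GenStat. Purely as an illustration of the kind of two-sample comparison involved, the sketch below computes a pooled-variance t statistic in Python; the function name and the yield values are hypothetical, and the pooled-variance form is an assumption about the test variant.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical grain yields (g per plant) for one isoline pair under stress.
plus_nil = [9.4, 9.6, 9.8]
minus_nil = [6.3, 6.5, 6.7]
t_value, df = pooled_t_statistic(plus_nil, minus_nil)
```

With 4 degrees of freedom, the two-tailed critical value at the 5% level is 2.776, so an absolute t value above that threshold would be declared significant.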

## RESULTS

### Development of the Near-Isogenic Lines

Twenty-one F<sub>1</sub> seeds were generated from the cross between C306 and Dharwar Dry. Hybridity of the F<sub>1</sub> seeds was confirmed using the flanking marker gwm368. A total of 310 seeds from the best performing hybrid were advanced from F<sub>2</sub> to F<sub>4</sub> following the single seed descent method. A total of 275 F<sub>4</sub> plants were screened with the QTL-flanking SSR marker gwm368, and 21 were found to be heterozygous. Six to eight seeds from each of those 21 plants were grown and screened with the marker to select only heterozygotes until F<sub>7</sub>, where two homozygous plants, one with the + allele and another with the − allele, were selected from the same progeny line. At the end of F<sub>7</sub>, 14 putative isoline pairs were recovered, and 10 pairs similar in flowering time and morphology were selected for phenotyping at F<sub>8</sub>.


<sup>∗</sup>ns = non-significant at P ≤ 0.05; s = significant at P ≤ 0.05; + = isoline with C306 background, − = isoline with Dharwar Dry background; pairs in bold are confirmed NILs. The significant difference was calculated using Student's t-test.

### Putative Isolines Vary for Grain Yield and Grain Weight Under Post-anthesis Water Stress

Grain yield and grain weight of the 10 putative NIL pairs under WW and post-anthesis WS conditions are given in **Table 1**. In general, post-anthesis WS caused a significant reduction in both grain weight and grain yield, though there was a sharp contrast in plant responses between the NIL with C306 background (termed +NIL) and the corresponding NIL with Dharwar Dry background (−NIL) of the same pair. In the WW treatment, the grain yield per plant and 100-grain weight of the NILs ranged from 10.56 to 15.52 g and from 4.45 to 5.60 g, respectively. By contrast, under the stressed condition, they ranged from 4.57 to 10.81 g and from 3.19 to 5.13 g, respectively. On average, post-anthesis WS caused a 42.57% reduction in grain yield and a 12.51% reduction in grain weight.

No significant difference was observed between the isolines of most NIL pairs under the WW condition. However, in the stress treatment, significant differences between the isolines were observed in most cases. We were particularly interested in the isoline pairs that showed similar responses under the stress-free condition but differed significantly under the stressed condition. Out of the 10 putative NIL pairs, only four, viz., qDSI.4B.1-2, qDSI.4B.1-3, qDSI.4B.1-6, and qDSI.4B.1-8, showed such a significant difference between the isolines for both grain yield and grain weight and were considered confirmed NIL pairs. For example, the +NIL and −NIL of qDSI.4B.1-3 had mean grain yields per plant of 13.56 and 13.50 g, respectively, under the WW condition, whereas under the stressed condition, the mean grain yield per plant of the −NIL was 6.49 g, about 32% less than that of the corresponding +NIL (9.61 g). Similarly, the +NIL produced 38% higher grain weight than the corresponding −NIL (3.72 g) under stress.
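
The percentage quoted for qDSI.4B.1-3 can be reproduced from the reported means. The short Python sketch below is only a worked check of that arithmetic; the function name is hypothetical, and the +NIL is taken as the reference because the text expresses the −NIL yield relative to it.

```python
def percent_lower(reference, value):
    """How much lower `value` is than `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100.0


# Mean grain yield per plant (g) for qDSI.4B.1-3, as reported in the text.
ww_difference = percent_lower(13.56, 13.50)      # well-watered: +NIL vs -NIL
stress_difference = percent_lower(9.61, 6.49)    # water-stressed: +NIL vs -NIL
```

The well-watered difference is under 1%, whereas the stressed difference is roughly 32%, which matches the contrast described for this pair.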

Differences between the isoline pairs of qDSI.4B.1-4 and qDSI.4B.1-9 were significant only for either grain weight or grain yield and therefore were not considered as true NIL pairs. In most cases, NILs with C306 background out-yielded the corresponding NIL with Dharwar Dry background. However, isoline qDSI.4B.1-10(−), which has an allele from Dharwar Dry, outperformed the corresponding isoline with C306 allele under both stressed and stress-free conditions.

### Other Phenological Traits of the Confirmed NILs

Near-isogenic line pairs commenced anthesis at around 68 days after sowing (**Figure 3**). However, under the stressed treatment, they reached physiological maturity at around 103 days after sowing, about 1 week earlier than under the WW treatment (110 days after sowing). NIL pairs of qDSI.4B.1-6 in the WW treatment and those of qDSI.4B.1-3 in the stressed treatment differed significantly for days to maturity. A significant difference in plant height was observed only between the NIL pairs of qDSI.4B.1-3 and qDSI.4B.1-2 in the WW treatment and stressed treatment, respectively. In both cases, the +NIL was taller than the corresponding −NIL. NIL pairs did not differ significantly in effective tiller number or spikelet number across conditions.

In both stressed and non-stressed conditions, isolines qDSI.4B.1-2(+) and qDSI.4B.1-8(+), each of which has an allele from C306, showed higher shoot biomass than their corresponding isolines qDSI.4B.1-2(−) and qDSI.4B.1-8(−), respectively (**Figure 3**). By contrast, root biomass and harvest index were significantly lower in isolines with Dharwar Dry background in almost all of the four NIL pairs under the stressed treatment. For example, under the stressed condition, root biomass and harvest index of the −NIL of qDSI.4B.1-3 were about 40 and 27% lower, respectively, than those of the corresponding +NIL.

the confirmed NILs under post-anthesis water stress. Data points are mean ± SD (n = 12). Triangles and circles represent data points for NIL with C306 background (+NIL), and NIL with Dharwar Dry background (−NIL), respectively.

### Physiological Traits of the Confirmed NILs

Results indicated that isolines of the NIL pairs responded variably in terms of physiological parameters when post-anthesis WS was applied (**Figure 4**). The magnitude of reduction varied between isolines of the NIL pairs as the stress period progressed. Just prior to the beginning of the stress period (0 DAS), both isolines had similar chlorophyll content, assimilation rate (A), stomatal conductance (g<sub>s</sub>), and transpiration (T<sub>r</sub>). However, with the progression of the stress period, both +NIL and −NIL showed a significant decrease in A, g<sub>s</sub>, SPAD units, and T<sub>r</sub>. Overall, the +NIL maintained comparatively higher A, chlorophyll content, T<sub>r</sub>, and g<sub>s</sub> than the −NIL during the stress period.

### Differential Expression of TaMYB82 Gene Between the Isolines

Relative expression of the TaMYB82 gene was significantly different between +NIL and −NIL under post-anthesis WS at both time points (4 and 7 DAS; **Figure 5**). However, under the WW condition (0 DAS), no significant difference in TaMYB82 expression was observed between +NIL and −NIL. At 4 DAS, there was about a threefold difference in the relative expression level of TaMYB82 between them. This difference was reduced but still significant at 7 DAS.

## DISCUSSION

In this study, we reported the development of 10 putative pairs of NILs targeting a major locus for drought tolerance. Molecular screening and phenotyping of those NILs confirmed four true NIL pairs which showed differential responses under well-watered and water-stressed conditions, as hypothesized. Among those, NILs having alleles from the C306 background showed improved performance in terms of grain yield and grain weight. This is likely because the nearby marker used for screening was tightly linked with the QTL identified for drought tolerance in parent C306, which may harbor some key genes responsible for grain yield and grain weight under post-anthesis stress (Kadam et al., 2012). Several previous studies also reported that this genomic region is a rich hub for grain yield and related traits in spring wheat (Marza et al., 2006; Yang et al., 2007; Mathews et al., 2008; Pinto et al., 2010; Cabral et al., 2018).

Our study also showed that the NILs carrying the allele from the C306 background performed better in terms of physiological traits under post-anthesis WS. NILs with C306 background maintained higher photosynthesis, chlorophyll content, stomatal conductance, and transpiration rate than the corresponding NILs with Dharwar Dry background. One possible explanation might be the higher chlorophyll content and root biomass of the +NILs compared to the −NILs. Comparatively higher root biomass provides a greater chance of accessing available soil moisture under stress (Wasson et al., 2012; Maeght et al., 2013). Kumar et al. (2012) reported several QTLs for photosynthesis and chlorophyll content in the C306 background. Moreover, several previous studies indicated a positive correlation between chlorophyll content and gas exchange parameters (Chen et al., 2015; Wang et al., 2016). This might be another possible explanation for the higher grain yield and grain weight in the tolerant NILs under stress.

Gene expression analysis of NILs revealed that TaMYB82 was markedly downregulated in the better performing NILs with C306 background when compared with the expression pattern of the NILs with Dharwar Dry background. This suggests that TaMYB82 acted as a negative regulator in response to post-anthesis WS. MYB transcription factors acting as negative regulators under drought have also been reported by Cui et al. (2013) and Jaradat et al. (2013). Furthermore, Zhao et al. (2017) indicated the role of TaMYB82 in an ABA-dependent signal transduction pathway in response to abiotic stress. This supports the involvement of MYB transcription factors in drought response mechanisms in wheat (Baldoni et al., 2015).

Although we identified four confirmed NIL pairs based on phenotypic and genotypic evaluation, six pairs of NILs were not in agreement with the marker-trait association. A possible explanation is recombination between the targeted QTL and the nearby marker used for screening (Peleman et al., 2005). For example, in qDSI.4B.1-10, the isoline with Dharwar Dry background showed improved performance with respect to grain yield and grain weight compared to its counterpart, the NIL with the C306 background, under both WW and WS conditions. Moreover, despite having different genetic backgrounds, the isolines of NIL pairs qDSI.4B.1-1, qDSI.4B.1-5, and qDSI.4B.1-7 did not differ significantly in terms of grain yield and grain weight across conditions. Wang et al. (2018) also reported such phenomena in wheat during NIL production following HIF. A higher resolution genetic map of this QTL locus, saturated with more nearby markers, could have helped resolve this challenge. Yadav et al. (2011) described the utilization of multiple QTL-NILs for validation of a drought tolerance QTL and how they were used for developing and mapping new gene-based markers in that genomic region. Therefore, fine mapping of this QTL using the confirmed NIL pairs reported in this study will provide an excellent opportunity to identify more closely linked markers, which will ensure higher selection efficiency (Gupta et al., 2017).

Transcriptomic analysis of multiple NILs enabled Barrero et al. (2015) to identify the candidate genes for a targeted QTL in wheat. Habib et al. (2018) also identified two key genes for a major dormancy QTL in wheat using a similar multiple QTL-NIL RNA sequencing approach. Following these examples, next-generation transcriptome sequencing of the confirmed NILs under contrasting water regimes can be pursued as a future direction of the current study in order to identify the candidate genes in qDSI.4B.

Development of isolines for cereal crops requires a substantial investment of time, regardless of whether the traditional backcrossing or the HIF method is followed. However, for this study, we utilized a rapid breeding technique which dramatically shortened the life cycles of segregating generations, as reported in Zheng et al. (2013). Some drawbacks of this technique include dissecting young embryos individually and culturing them in vitro, which requires a considerable amount of effort and labor. Additionally, a sterile environment must be maintained during embryo culture. However, a recent discovery on shortening the life cycle of wheat by utilizing a longer photoperiod (22 h) with specialized LED lights might allow researchers to avoid immature embryo culture (Watson et al., 2018). Another limitation was the use of a single marker during marker-assisted selection. Of the two flanking markers (barc20 and gwm368) of the targeted QTL, only gwm368 was found to be polymorphic between the two parents. The other flanking marker, barc20, was not polymorphic, while the next closest neighboring marker (wmc125) was too far away (nearly 25 cM) from the targeted QTL. Hence, only gwm368 was used in the current study. Habib et al. (2015) and Wang et al. (2018) also successfully used a single marker to develop NILs in wheat and barley, respectively, following the HIF method.


## CONCLUSION

In summary, the present study confirmed the importance of the 4BS QTL from the C306 background in post-anthesis drought tolerance. The confirmed NILs identified in this study are a valuable resource for future fine mapping of this QTL and for cloning and functional characterization of the gene(s) responsible for post-anthesis drought tolerance.

### AUTHOR CONTRIBUTIONS

MM, HL, and GY conceived and designed the study. MM carried out the experimental procedures in plant growth chambers and the greenhouse. MM and XW performed immature embryo culture and molecular marker-assisted selection. MM collected all relevant data and performed the analysis with occasional help from XW. MM prepared the manuscript with input from XW, HL, and GY. All the authors reviewed the manuscript and approved the submission.

### ACKNOWLEDGMENTS

MM acknowledges the Endeavour Postgraduate Scholarship from the Australian Government for sponsoring his Ph.D. study. The authors express their heartfelt thanks to Ms. Pratima Gurung for her cordial help during data collection. The authors also acknowledge the University of Western Australia for funding the operational cost of the research.


population designed to minimize confounding agronomic effects. Theor. Appl. Genet. 121, 1001–1021. doi: 10.1007/s00122-010-1351-4


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Mia, Liu, Wang and Yan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Genetic Characterization of Resistance to Pyrenophora teres f. teres in the International Barley Differential Canadian Lake Shore

Eric Dinglasan<sup>1</sup> , Lee Hickey<sup>1</sup> \*, Laura Ziems<sup>1</sup> , Ryan Fowler<sup>1</sup> , Anna Anisimova<sup>2</sup> , Olga Baranova<sup>2</sup> , Nina Lashina<sup>2</sup> and Olga Afanasenko<sup>2</sup>

<sup>1</sup> Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia, <sup>2</sup> All-Russian Institute of Plant Protection, Saint Petersburg, Russia

### Edited by:

Hikmet Budak, Montana State University, United States

#### Reviewed by:

Concetta Lotti, University of Foggia, Italy Martin Mascher, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany

> \*Correspondence: Lee Hickey l.hickey@uq.edu.au

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 21 November 2018 Accepted: 28 February 2019 Published: 25 March 2019

#### Citation:

Dinglasan E, Hickey L, Ziems L, Fowler R, Anisimova A, Baranova O, Lashina N and Afanasenko O (2019) Genetic Characterization of Resistance to Pyrenophora teres f. teres in the International Barley Differential Canadian Lake Shore. Front. Plant Sci. 10:326. doi: 10.3389/fpls.2019.00326

Genetic resistance to net form of net blotch in the international barley differential Canadian Lake Shore (CLS) was characterized and mapped. A doubled haploid (DH) population generated from a cross between CLS and susceptible cultivar Harrington was evaluated at the seedling stage using eight diverse Pyrenophora teres f. teres (Ptt) isolates and at the adult stage in the field using natural inoculum. To effectively map the CLS resistance, comparative marker frequency analysis (MFA) was performed using 8,762 polymorphic DArTseq markers, where 'resistant' and 'susceptible' groups each comprised 40 DH lines displaying the most extreme phenotypes. Five DArTseq markers were consistently detected in eight disease assays; the region they define was designated qPttCLS and deemed to harbor the locus underpinning CLS resistance. Four of these markers were positioned on the barley DArTseq physical map and span a region between 398203862 and 435526243 bp, which was found to contain several genes involved in important plant functions such as disease response and signaling pathways. While MFA only detected the 3H region, genetic analyses based on segregation patterns were inconsistent, suggesting complex inheritance or variation in phenotypic expression of qPttCLS, particularly in the field. This study represents progress toward connecting Ptt pathotype surveys with the corresponding resistance genes in barley differentials. The markers associated with qPttCLS are useful for marker-assisted selection in breeding programs.

Keywords: net form net blotch, barley, QTL, genetic resistance, marker-assisted selection

## INTRODUCTION

Net form of net blotch (NFNB), caused by the ascomycete fungus Pyrenophora teres f. teres (Ptt), is considered one of the most widespread and destructive diseases of barley (Hordeum vulgare L.) crops worldwide. For susceptible cultivars, NFNB can result in yield losses of up to 40% (Steffenson et al., 1991; Murray and Brennan, 2010), and under extreme epidemic conditions may cause even higher losses, up to 70% (Wallwork et al., 2016). The most effective means of control remains the use of resistant cultivars.

Resistance of barley to Ptt has been documented as both qualitative and quantitative (Lai et al., 2007), suggesting a gene-for-gene interaction and complex genetic interactions, respectively. In both cases, the resistance response depends on the isolate of the pathogen.

Ptt is a highly diverse pathogen (Khan and Tekauz, 1982; Steffenson and Webster, 1992; Robinson and Jalli, 1997; Afanasenko, 2001; Serenius, 2006; Liu et al., 2012). However, the measure of virulence diversity is limited by the number of differential genotypes used in the tests (Wallwork et al., 2016). The barley cultivar Canadian Lake Shore (CLS) has been used in numerous studies examining the virulence of local Ptt isolates in different barley growing regions around the world (Gray, 1966; Gacek, 1985; Steffenson, 1988; Steffenson and Webster, 1992; Afanasenko et al., 1995; Minarikova, 1995; Minarikova and Polisenska, 1999; Gupta and Loughman, 2001). Based on results from these studies, CLS is a good genotype for discriminating Ptt isolates. For this reason, CLS was included in the international set of barley differentials, established to standardize the characterization of Ptt populations globally (Afanasenko et al., 2009). However, the genetic control of resistance in CLS is unknown. Such knowledge is essential to connect outcomes from pathogen virulence studies to genomic regions conferring resistance and/or susceptibility in the host. Such information will empower breeders to assemble effective host resistance in new barley cultivars.

In this study we performed genetic characterization of CLS resistance at both seedling and adult growth stages using diverse isolates sourced from Russia, Belarus, Germany, Canada, and South Africa. We examined a doubled haploid (DH) population derived from a cross between CLS and susceptible cultivar Harrington. Marker frequency analysis (MFA) was performed based on lines representing 'resistant' and 'susceptible' classes, in order to effectively map the CLS resistance and identify DNA markers useful for marker-assisted selection (MAS).

## MATERIALS AND METHODS

### Plant Materials

A total of 101 anther-culture-derived DH lines were developed from F1 plants of the cross between spring barley cultivars CLS and Harrington at the All Russian Research Institute for Plant Protection, Saint Petersburg, Russia. Anthers were cultured according to the method established by Manniner (1997). Notably, both of these cultivars are included in the international set of barley differentials: CLS is a resistant cultivar with good differential ability and Harrington is a susceptible check (**Table 1**) (Afanasenko et al., 2009).

### Pathogen Materials

Eight Ptt isolates sourced from different origins were used for screening the DH population and parents in this study (**Table 1**). Isolates were obtained from infected barley leaves collected between 2011 and 2017. Leaves were surface sterilized with 3% CuSO<sub>4</sub> for 1 min and then double rinsed in sterile distilled water. Isolates were propagated on Czapek's modified medium containing the following: 0.5 g/L KH<sub>2</sub>PO<sub>4</sub>, 0.5 g/L MgSO<sub>4</sub>, 0.5 g/L KCl, 1.2 g/L urea, 20 g/L lactose, and 20 g/L agar. The Petri dishes were incubated for 10 days at 20 ± 2°C under constant illumination with a daylight lamp (3000 lux). Single conidia were transferred to the same medium and incubated under the same conditions for 10–12 days. Based on preliminary screening, all isolates were avirulent to CLS.

TABLE 1 | Details for the Pyrenophora teres f. teres isolates used in this study.

<sup>∗</sup>Cultivar on which the isolate was collected; NA – cultivar information not available.

### Fungal Preparation and Inoculation of Seedlings

The cultures were flooded with distilled water containing 0.01% TWEEN® 20 and conidia were dislodged with a sterile spatula. The spore suspension was filtered through gauze. The spore concentration of the suspension was determined by means of a hemocytometer and adjusted to 5,000 conidia per ml for inoculation.

Resistance of DH lines was assayed using the detached leaf method (Afanasenko et al., 1995). Barley seedlings were grown on cotton soaked in water in enameled trays for 8–10 days at 20–22°C with alternating 12 h periods of light (3000 lux) and dark. Primary leaves were excised, and 4–5 cm long segments were placed in enamel trays (27 cm × 33 cm) on filter paper moistened with sterile water containing 0.004% benzimidazole. For each DH line, leaf segments from two seedlings in four repetitions (eight seedlings in total) were placed on the filter paper in different trays. Resistant and susceptible parents were placed in each tray. Inoculation was performed by spraying the spore suspension at a rate of 1 ml per 20 leaf segments. The trays were covered with glass plates and returned to the same light and temperature conditions as used for growing the seedlings.

### Scoring Seedling Infection Response

Five days post-inoculation, the infection response (IR) was recorded using the 10-point scale of Tekauz (1985). For analysis of segregation patterns, lines displaying an average IR of ≤4.9 were considered resistant, and ≥5 were considered susceptible.
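
The classification rule and the subsequent segregation test can be sketched in Python as follows. This is an illustrative sketch only: the function names and the line counts are hypothetical, and average IR values falling between 4.9 and 5.0, which the stated rule does not cover, are treated here as susceptible by assumption. The 3.84 threshold is the 5% critical value of χ<sup>2</sup> at 1 df.

```python
CHI2_CRITICAL_1DF = 3.84  # 5% critical value at 1 degree of freedom


def classify(mean_ir):
    """Resistant if average IR <= 4.9, otherwise susceptible (assumption for 4.9-5.0)."""
    return "R" if mean_ir <= 4.9 else "S"


def chi_square_for_ratio(observed, expected_ratio):
    """Chi-squared goodness-of-fit statistic for observed counts against a ratio."""
    total = sum(observed)
    ratio_total = sum(expected_ratio)
    statistic = 0.0
    for count, part in zip(observed, expected_ratio):
        expected = total * part / ratio_total
        statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts of resistant and susceptible DH lines (R, S).
fits_one_to_one = chi_square_for_ratio((52, 49), (1, 1)) < CHI2_CRITICAL_1DF
fits_one_to_three = chi_square_for_ratio((25, 76), (1, 3)) < CHI2_CRITICAL_1DF
```

A statistic below the critical value means the observed counts do not deviate significantly from the tested ratio, for example 1:1 for a single gene in a DH population.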

### Adult Plant Screening in the Field

The DH lines and two parents Harrington and CLS were screened in 2015 at the adult plant stage in the field, located at the State Cultivars Screening Nursery "Volosovo," Leningrad Region, Russia. Ptt is endemic at this location and net blotch epidemics are observed each year. Lines were planted in 1 m rows (15–20 seeds) in a randomized block design with two replications per line. Parents were planted at the beginning and end of every 10 rows. In order to promote disease development two rows of the highly susceptible cultivar Carlsberg were planted around the experimental plots. No artificial inoculum and no herbicides were applied. NFNB reaction was scored at the growth stage of GS75 (6th to 8th August 2015) using a 1–9 scale, where 1 = very resistant, and 9 = very susceptible.

### Genotyping and Comparative Marker Frequency Analysis

Genomic DNA was extracted for a subset of 94 DH lines and the two parents using the protocol recommended by Diversity Arrays Technology Pty Ltd. (DArT<sup>1</sup> ). The samples were genotyped using the Barley GBS 1.0 platform, which returned 8,762 polymorphic silico DArTseq markers.

Marker data were subjected to a quantitative allele frequency analysis technique, known as comparative MFA (Ziems et al., 2017), to identify quantitative trait loci (QTL) associated with Ptt resistance. The frequency of alleles contributed by CLS, conferring resistance (R), was compared with the frequency of alleles contributed by Harrington, conferring susceptibility (S), in the segregating DH progeny. A discriminant value reflecting the difference in allele frequency between the two classes was obtained for each marker according to Wenzl et al. (2006, 2007). This approach can effectively identify genetic loci influencing a trait of interest without the need to generate a linkage map (Wenzl et al., 2007). Each phenotypic class (R or S) comprised 40 DH lines that displayed the most extreme phenotypes in each disease assay. Each marker was subjected to a simple Chi-squared test to detect significant deviation between the expected and observed allele frequencies. A marker was considered associated when its discriminant value exceeded 0.4 and P < 0.001, that is, when there was a 0.1% probability of detecting such an allele frequency difference by chance.
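
To make the marker-level decision rule concrete, a minimal Python sketch is given here. It is not the published MFA implementation: the discriminant value is assumed to be the simple difference in CLS-allele frequency between the R and S classes, the Chi-squared test is assumed to be a 2 × 2 contingency test with 1 df, missing calls are assumed to have been removed beforehand, and the function name and allele calls are hypothetical.

```python
from math import erfc, sqrt


def marker_association(r_calls, s_calls, d_threshold=0.4, p_threshold=0.001):
    """Score one marker; calls are 1 (CLS allele) or 0 (Harrington allele).

    Returns (discriminant, chi2, p_value, associated).
    """
    n_r, n_s = len(r_calls), len(s_calls)
    r_cls, s_cls = sum(r_calls), sum(s_calls)
    discriminant = r_cls / n_r - s_cls / n_s

    rows = [n_r, n_s]
    cols = [r_cls + s_cls, (n_r - r_cls) + (n_s - s_cls)]
    if 0 in cols:  # marker does not segregate within the two classes
        return discriminant, 0.0, 1.0, False

    table = [[r_cls, n_r - r_cls], [s_cls, n_s - s_cls]]
    total = n_r + n_s
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-squared variate with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    associated = discriminant > d_threshold and p_value < p_threshold
    return discriminant, chi2, p_value, associated


# Hypothetical marker: 35 of 40 resistant lines and 6 of 40 susceptible
# lines carry the CLS allele.
linked = marker_association([1] * 35 + [0] * 5, [1] * 6 + [0] * 34)
unlinked = marker_association([1] * 20 + [0] * 20, [1] * 20 + [0] * 20)
```

Applying such a function to every marker and then plotting the associated markers by map position is one way to visualize the region where allele frequencies diverge between the two classes.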

The genomic intervals containing associated DArTseq markers were displayed on the barley DArTseq consensus map and positioned onto the barley DArTseq physical map using Pretzel (Keeble-Gagnère et al., 2019).

The genomic interval of interest harboring the locus associated with Ptt response was searched for the presence of genes through EnsemblPlants database using the barley genome assembly Hordeum vulgare (IBSC\_v2) of the International Barley Genome Sequencing Consortium<sup>2</sup> .

## RESULTS

### Seedling Response to Ptt Isolates

The resistant parent CLS displayed a low IR across all disease assays (ranging 1.0–4.4), and the susceptible parent Harrington displayed a high IR in all assays (ranging 7.0–10.0; **Figures 1**, **2**, **Table 2** and **Supplementary Table 1**). The segregation ratio for four of the eight isolates (i.e., Len7, Ps31, Vol13, and G5) appeared consistent with a single gene inheritance model (i.e., 1:1, **Table 2**). However, Mendelian analyses based on division of the progeny into susceptible and resistant classes could not explain the segregation pattern observed for isolates Bel1, Pr11, Can11, and SA7 (**Table 2**).

The correlation between IRs observed for different Ptt isolates in the DH progeny varied from 0.26 to 0.78 (**Table 3**). The highest degree of correspondence (r = 0.78) was found between IR to isolates from the Far East of Russia (Pr11) and Pskov Region (Ps31), and the lowest (r = 0.26) was observed between IR to isolates from Belarus (Bel1) and Germany (G5) (**Table 3**).

### Adult Plant Stage Screening in the Field

In the field experiment, NFNB severity on the susceptible cultivar Carlsberg reached 50–60% leaf area infected at the time of assessment. The IR of the susceptible parent Harrington was also high (7.5 on 1–9 scale). As expected, the IR of the resistant parent CLS was low (3.0) (**Supplementary Table 1**).

When evaluated at the adult plant stage in the field, the CLS/Harrington DH population displayed a bi-modal distribution of resistance to Ptt (**Figure 1**). Unlike results from the seedling assays, segregation of resistance did not fit a single gene model, but instead fitted to a 1:3 (R:S) ratio, suggesting a two complementary gene inheritance model (**Table 2**). Interestingly, only one DH line displayed a lower IR than the resistant parent CLS. High correlations (0.71–0.77) between adult plant reaction in the field and seedling reaction were observed for five isolates: Len7, Ps31, Pr11, Can11, and SA7 (**Table 3**). The lowest correlation between adult plant reaction and seedling reaction was found for the isolate from Germany (G5, 0.40).

<sup>1</sup>www.diversityarrays.com

<sup>2</sup>https://plants.ensembl.org/Hordeum\_vulgare/Info/Index?db=core

FIGURE 1 | Frequency distribution of infection responses (IRs) in the CLS/Harrington DH population evaluated using eight Pyrenophora teres f. teres isolates at the seedling stage (Len7, Bel1, Ps31, Pr11, Vol13, G5, Can11, and SA7) and natural inoculum at the adult stage in the field. Box plots show upper and lower quartile where horizontal line represents median IR and overlaid is the raw data points; split violin plots represent the density estimates related to the distribution of CLS/Harrington DH population in each assay. Symbols colored in red indicate mean IR displayed by parental genotypes (– Harrington; ×– CLS) in each assay.

TABLE 2 | The infection response (IR) of doubled haploid (DH) progeny and parents to different Pyrenophora teres f. teres isolates.


The results from Chi-squared (χ<sup>2</sup>) analyses are also presented.

<sup>∗</sup>Natural inoculum evaluated at the adult growth stage.

Res = resistant; Sus = susceptible.

Critical χ<sup>2</sup> value at P = 0.05 with 1 df = 3.84.

TABLE 3 | Correlation (r) between infection responses observed for different Pyrenophora teres f. teres isolates in the CLS/Harrington doubled haploid population.


<sup>∗</sup>Natural inoculum evaluated at the adult growth stage.

TABLE 4 | Summary of resistant (R) and susceptible (S) classes used for Marker Frequency Analysis for the six Pyrenophora teres f. teres assays.


<sup>∗</sup>Natural inoculum evaluated at the adult growth stage.

### Genomic Regions Associated With Resistance

The mean IR for 'resistant' and 'susceptible' classes was clearly differentiated across the six disease assays (**Table 4**). Comparative MFA identified 251 DArTseq markers associated with resistance to all isolates except G5. These markers span a region of 36.26–76.56 cM on chromosome 3H of the barley consensus genetic map (**Supplementary Table 2**). In the seedling assays, marker 4016922 (44.74 cM) was the most significant (P = 1.80E-18) and had the highest discriminant value (D = 0.74), followed by 3270940 (50.50 cM; P = 2.40E-16, D = 0.61); these markers were associated with seedling response to isolates Vol13 and Ps31, respectively. In the field, marker 3268587 was the most significant (P = 2.30E-17) and had the highest discriminant value (D = 0.67). Alleles contributing resistance were donated by the resistant parent CLS. Five DArT-seq markers were consistently detected in eight disease assays, positioned within a smaller window of 0.5 cM ranging from 51.27 to 51.77 cM (**Figure 3** and **Table 5**). We designated this QTL region qPttCLS.

From these five markers, four (3255462, 3257991, 3272635, and 4190028) were positioned onto the barley physical map spanning an interval region between 398203862 and 435526175 bp. This region was found to harbor 179 genes (**Supplementary Table 3**) of which 27 genes with annotated functions are involved in plant disease response, cell death, and signaling pathways (**Table 6**).

### DISCUSSION

In this study we identified a major genomic region (qPttCLS) conferring resistance to Ptt in the international barley differential CLS. Although segregation for resistance was consistent with a single gene when the DH population was evaluated at the seedling stage using some isolates, this model did not fit all seedling datasets, and the segregation observed at the adult stage in the field suggested a two complementary gene model. Thus, while we have identified a key genomic region conferring resistance to Ptt in CLS, stable expression could be more complex and may involve additional genetic factors, particularly at the adult growth stage.

The high degree of variation of Ptt is highlighted in numerous studies using both virulence markers (Tekauz, 1990; Steffenson and Webster, 1992; Afanasenko et al., 2009; Jalli, 2010) and molecular markers (Peever and Milgroom, 1994; Rau et al., 2003; Serenius, 2006). CLS is one of nine genotypes included in the international set of barley differentials, which is used to characterize populations and virulence phenotypes of Ptt (Afanasenko et al., 2009). This core set of genotypes is essential to track changes in the pathogen population in the different barley growing regions around the world. A major objective is to identify the loci underpinning resistance in all differential genotypes. This insight will identify differentials that carry identical genes or haplotypes conferring resistance, and results from pathogen surveys can then be linked directly to characterized resistance genes in the host.

Mode and Schaller (1958) first reported that CLS carries resistance to Ptt and, based on their inheritance studies, CLS appeared to contribute two major resistance genes (Pt2 and Pt3). However, more recent studies examining CLS in populations derived from crosses to the susceptible cultivars Pirkka and Nadja observed segregation patterns consistent with a single gene conferring resistance (unpublished data). Although results from genetic analyses based on segregation patterns were variable in this study, the comparative MFA performed for each assay identified a single major genomic region on chromosome 3H. Further, correlations between all disease assays were high, except for G5. This seems to point toward variability in the level of resistance conferred by the qPttCLS locus. This may explain the distorted segregation pattern observed for isolates Bel1, Pr11, Can11, and SA7, and the ineffectiveness of qPttCLS against isolate G5. After all, segregation analyses are based on a threshold applied to 'resistance' based on the observed IRs. This could also be the case for the adult assessment in the field, where a two complementary gene model was found to be significant based on Chi-squared analysis. While this suggests an additional gene may be involved, variation in expression of qPttCLS could be influenced by environmental cues in the field, which could lead to low numbers of DH lines actually displaying high levels of resistance. Another plausible reason for this variation in IR could be the presence of isolate-specific minor resistance genes in CLS. A number of previous mapping studies have reported isolate-specific QTL (Ho et al., 1996; Grewal et al., 2012; König et al., 2014; Afanasenko et al., 2015).

interval region spanning 398203862–435526243 bp onto the physical map.

TABLE 5 | Genetic position and discriminant values for the subset of DArT-seq markers that were consistently detected in eight marker frequency analyses performed in this study.


All five markers were significantly associated with resistance to P. teres on Chromosome 3 (P < 0.001) and mark the 0.5 cM region containing qPttCLS donated by Canadian Lake Shore.

<sup>∗</sup>Natural inoculum evaluated at the adult growth stage.

TABLE 6 | List of 27 genes defined by the four (4) DArTseq markers (qPttCLS) positioned onto the barley physical map on the interval region 398203862–435526243 bp.


The corresponding description and functions are based on the database search on EnsemblPlants using the barley genome assembly Hordeum vulgare (IBSC\_v2) of the International Barley Genome Sequencing Consortium. Only the genes at the start and end position of marker interval, and genes with annotated description were searched for biological, cellular, and molecular functions.

Several studies have detected QTL for resistance to Ptt on chromosome 3H (Graner et al., 1996; Steffenson et al., 1996; Richter et al., 1998; Cakir et al., 2003; Raman et al., 2003; Yun et al., 2005; Manninen et al., 2006; Grewal et al., 2008; Gupta et al., 2010; Tenhola-Roininen et al., 2011; König et al., 2013, 2014; Afanasenko et al., 2015). Notably, the qPttCLS region of interest spans 51.27–51.77 cM on 3H, and QTL previously mapped to this chromosome can be compared using the consensus map provided by Aghnoum et al. (2010). For instance, Yun et al. (2005) reported Rpt-3H-4 on the short arm of chromosome 3H (57.0–66.6 cM) via analysis of the OUH602/Harrington RIL population. Steffenson et al. (1996) reported a QTL in the Steptoe/Morex population (28.7–36.6 cM on 3H), which overlaps with the QTLUHs-3H-2 region conferring seedling resistance to NFNB (34.0–38.0 cM) identified in detached leaf tests (König et al., 2014). Further, a QTL contributing adult plant resistance, QTLUH-3H (45–51 cM), was reported in the DH population Uschi/HHOR3073 (König et al., 2013). While these previously reported QTL map in close proximity to, and in some cases overlap with, the region containing qPttCLS, allelism testing is required to determine precisely whether the resistance gene is unique or common to these other sources. This future work will involve crossing CLS to these other sources carrying resistance genes mapped to 3H and testing for segregation of resistance in the progeny.

Aligning the DArTseq markers to the barley physical map allowed further analysis of annotated genes. The qPttCLS region harbored genes that are involved in important plant biological, cellular, and molecular functions such as plant disease response, cell death, and signaling pathways. Interestingly, many genes in this region remain uncharacterized, and their further characterization could potentially identify novel genes. The DArT-seq markers reported in this study will be useful for MAS targeting qPttCLS to develop barley cultivars resistant to NFNB.

### DATA AVAILABILITY


The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

### AUTHOR CONTRIBUTIONS

ED and LZ analyzed the datasets and wrote the manuscript. LH coordinated data analyses and revised the manuscript. RF revised the data and contributed to manuscript writing. AA, OB, and NL performed the disease screens, DNA extraction, and lab work required for this study. OA designed the experiments and contributed to writing of the manuscript.

### REFERENCES


### FUNDING

This research was supported by the Russian Fund of Basic Research (14-04-00400).

### ACKNOWLEDGMENTS

The authors are grateful to Dr. Gabriel Keeble-Gagnère (Agriculture Victoria) who assisted with aligning the DArTseq markers onto the barley reference genome.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00326/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Dinglasan, Hickey, Ziems, Fowler, Anisimova, Baranova, Lashina and Afanasenko. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Harnessing Novel Diversity From Landraces to Improve an Elite Barley Variety

Arantxa Monteagudo<sup>1</sup>, Ana M. Casas<sup>1</sup>, Carlos P. Cantalapiedra<sup>1†</sup>, Bruno Contreras-Moreira<sup>1,2†</sup>, María Pilar Gracia<sup>1</sup> and Ernesto Igartua<sup>1</sup>\*

<sup>1</sup> Aula Dei Experimental Station (EEAD-CSIC), Zaragoza, Spain, <sup>2</sup> Fundación ARAID, Zaragoza, Spain

Edited by: Dragan Perovic, Julius Kühn-Institut, Germany

#### Reviewed by:

Ravi Koppolu, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany

Manoj Prasad, National Institute of Plant Genome Research (NIPGR), India

> \*Correspondence: Ernesto Igartua igartua@eead.csic.es

#### †Present Address:

Carlos P. Cantalapiedra, Centro de Biotecnología y Genómica de Plantas UPM–INIA (CBGP), Pozuelo de Alarcón, Spain

Bruno Contreras-Moreira, The European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 21 November 2018 Accepted: 22 March 2019 Published: 11 April 2019

#### Citation:

Monteagudo A, Casas AM, Cantalapiedra CP, Contreras-Moreira B, Gracia MP and Igartua E (2019) Harnessing Novel Diversity From Landraces to Improve an Elite Barley Variety. Front. Plant Sci. 10:434. doi: 10.3389/fpls.2019.00434

The Spanish Barley Core Collection (SBCC) is a source of genetic variability of potential interest for breeding, particularly for adaptation to Mediterranean environments. Two backcross populations (BC<sub>2</sub>F<sub>5</sub>) were developed using the elite cultivar Cierzo as the recurrent parent. The donor parents, namely SBCC042 and SBCC073, were selected from the SBCC lines due to their outstanding yield in drought environments. Flowering time, yield and drought-related traits were evaluated in two field trials in Zaragoza (Spain) during the 2014–15 and 2015–16 seasons and validated in the 2017–18 season. Two hundred sixty-four lines of each population were genotyped with the Barley Illumina iSelect 50k SNP chip. Genetic maps for each population were generated. The map for SBCC042 × Cierzo contains 12,893 SNPs distributed in 9 linkage groups. The map for SBCC073 × Cierzo includes 12,026 SNPs in 7 linkage groups. Both populations shared two QTL hotspots. There are QTLs for flowering time, thousand-kernel weight (TKW), and hectoliter weight on a segment of 23 Mb at ∼515 Mb on chromosome 1H, which encompasses the HvFT3 gene. In both populations, flowering was accelerated by the landrace allele, which also increased the TKW. In the same region, better soil coverage was contributed by SBCC042 but coincident with a lower hectoliter weight. The second large hotspot was on chromosome 6H and contained QTLs with wide intervals for grain yield, plant height and TKW. Landrace alleles contributed to increased plant height and TKW and reduced grain yield. Only SBCC042 contributed favorable alleles for "green area," with three significant QTLs that increased ground coverage after winter, which might be exploited as an adaptive trait of this landrace. Some genes of interest found in or very close to the peaks of the QTLs are highlighted. Strategies to deploy the QTLs found for breeding and pre-breeding are proposed.

Keywords: barley, landrace, QTL, adaptation, 50k

### INTRODUCTION

Increasing crop yields is the main breeding target for cereals. This goal will become increasingly challenging in areas where the occurrence of limiting factors is expected to rise due to climate change, such as Mediterranean Europe. Due to their adaptability to a wide range of conditions, barley landraces are recognized as an important genetic resource with which to search for tolerance to biotic and abiotic stresses (Dawson et al., 2015). However, landrace potential has not been fully realized in modern breeding (Fischbeck, 2003; Langridge and Waugh, 2019). This fact was confirmed for wheat in a thorough study of a worldwide landrace collection with high throughput genotyping platforms (Winfield et al., 2018), revealing a substantial amount of novel genetic diversity in the landraces, which is either not captured in current breeding programs or lost due to previous selection pressures. The high diversity found in barley genetic resources predicts a similar situation for this crop (for instance, IBSC, 2012; Russell et al., 2016), or even a more diverse one, as the barley genome has been enriched by protracted gene flow between the cultivated and wild species for thousands of years (Poets et al., 2015).

Barley landraces are valuable resources for breeding in the Mediterranean region (Ceccarelli et al., 1998; Comadran et al., 2009), given their long history of selection under stressful conditions. For instance, landraces outyielded modern varieties in different studies carried out in Syria (Ceccarelli, 1996) and in Spain (Yahiaoui et al., 2014) when grown in harsh to moderate stress conditions. The current study focuses on the Spanish Barley Core Collection (SBCC), which is a powerful tool with which to study and apply the adaptive potential of Spanish landraces to Mediterranean conditions (Igartua et al., 1998). The SBCC accessions contain unique alleles compared to the barley genotypes used in mainstream barley breeding in Europe, particularly in the six-rowed barley pool (Yahiaoui et al., 2008). The accessions also carry adaptations to biotic stresses (Silvar et al., 2010) and to environmental conditions that may be useful in a climate change scenario (Casas et al., 2011). On the negative side, SBCC landraces tend to be tall plants, with late flowering and a risk of lodging (Yahiaoui et al., 2014). In this last study, the landrace-derived lines SBCC073 and SBCC042 were among the top 5% highest-yielding lines in field trials with average yields below 3 t ha<sup>−1</sup>.

In the past, one of the reasons for the limited use of landraces to introduce new genetic variation into breeding programs was linkage drag. Currently, the advent of new platforms of molecular markers provides a solution to overcome this problem (Muñoz-Amatriaín et al., 2014; Russell et al., 2016; Milner et al., 2019), and the use of these platforms is becoming routine in crop breeding programs (Trevaskis, 2018). The new 50k Illumina Infinium iSelect SNP genotyping array (Bayer et al., 2017) will facilitate precise access to the genomic diversity of the landraces and its efficient use in breeding programs.

The most important trait in agriculture is yield, but it is a complex breeding trait due to its low heritability, pleiotropic effects, and susceptibility to genotype-by-environment (G × E) interaction. A large G × E component has hampered breeding progress in the Mediterranean region in the past (Muñoz et al., 1998), and this situation is expected to only intensify in the near future, given the predictions of climate models (Trnka et al., 2011). The strategy for improving crop yield requires selection of its best genetic component, through the contribution of well-known individual or combined alleles. Some of the combinations that lead to large yield improvements in crops are associated with plant height and flowering time (Cockram et al., 2007; Nadolska-Orczyk et al., 2017). Short cereals exhibited improved grain production and a lower risk of lodging. An optimized flowering time allows plants to benefit from rainfall at early stages and better grain-filling conditions. These features are usually fine-tuned for optimum performance of elite germplasm in each region. A sensible strategy for plant breeding would be to introgress good adaptive features from landraces into elite germplasm developed locally. Judicious selection of parents could lead to candidate cultivars in a rapid manner.

One possible advantage of landraces over modern cultivars in Mediterranean environments is their enhanced early growth vigor. This trait was found in Mediterranean landraces from areas with mild winters (Van Oosterom and Acevedo, 1992) and was identified as one of the factors associated with increased yield under drought (Turner and Nicholas, 1998). A survey of barley varieties obtained over 100 years of breeding in Nordic countries revealed an overall decrease in early vigor (root and shoot). This decrease was explained by the introduction of semidwarf genes, which increased the harvest index and reduced lodging, and by adaptation to agriculture with high fertilizer application (Bertholdsson and Kolodinska-Brantestam, 2009). Early vigor is positively correlated with grain yield and drought tolerance in cereals (Ludwig and Asseng, 2010; Pang et al., 2014). Earlier studies (Van Oosterom and Acevedo, 1992) found that high early vigor was related to good yield in Mediterranean environments, but only in landraces from areas with mild winters, whereas the opposite was true for landraces from areas with colder winters. A delicate balance between water availability, early development and cold tolerance must be achieved to optimize grain yield.

The objectives of this study are: i) to find QTLs for agronomic traits in elite-by-landrace crosses; ii) to evaluate the feasibility of improving an elite cultivar with introgressions from two local landraces, that have shown high performance under low productivity conditions; and iii) to assess the usefulness of RGB imaging during early growth, and its relation to grain yield. The positive alleles contributed by landraces could be directly used in breeding to improve elite cultivars. Positive alleles contributed by the elite cultivar will indicate genomic regions of landraces that could be targeted by pre-breeding programs to improve key landrace features.

### MATERIALS AND METHODS

### Plant Materials

Two barley BC<sub>2</sub>F<sub>5</sub> populations were developed from crosses between Cierzo and two Spanish landrace-derived inbred lines from the SBCC (Igartua et al., 1998) (((landrace × Cierzo) × Cierzo) × Cierzo). The recurrent parent, Cierzo, is an elite six-rowed barley cultivar derived from the cross Orria × Plaisant and selected in Spain. The parent is high-yielding, with an intermediate growth habit and good malting quality, although it is relatively less productive in arid zones.<sup>1</sup> The donor parents were SBCC042 and SBCC073, both of which are six-rowed and high-yielding in low-productivity trials and have an intermediate growth habit (Yahiaoui et al., 2014). After two backcrosses, 270 BC<sub>2</sub>F<sub>2</sub> lines of each population were selfed for three generations, up to BC<sub>2</sub>F<sub>5</sub>. After a few losses, we derived 264 BC<sub>2</sub>F<sub>5</sub> lines of each advanced cross. The trial was sown following a type II augmented design (Lin and Poushinsky, 1985). The lines were tested in two field trials carried out in the province of Zaragoza in northeastern Spain (41°51′ N, 0°39′ W) in the 2014–2015 and 2015–2016 seasons. The trials were sown in autumn (November 10th, 2014, and November 12th, 2015). The cultivars Cierzo and Orria were used as main checks, with a replicate in each incomplete block of 12 plots, for a total of 28 replicates per check and population. The secondary checks were the donor parent and Plaisant cultivar, randomly repeated in 8 blocks. The plots consisted of four rows that were 3.0 m long × 0.8 m wide. Climatic data were provided by the Spanish Meteorology State Agency (AEMET) and were gathered from a station in the same location as the trials (Zuera) (**Figure S1**).

**Abbreviations:** FD, flowering date; GA, green area; GGA, greener area; GY, grain yield; HW, hectoliter (or test) weight; PH, plant height; QTL, quantitative trait locus; SBCC, Spanish Barley Core Collection; SNP, single nucleotide polymorphism; SPAD, chlorophyll content measured with soil plant analysis development (SPAD) chlorophyll meter; TKW, thousand-kernel weight.

A sample of 96 lines of the population SBCC073 × Cierzo was trialed again in the field in the 2017–2018 season at the same location, using the same plot size, two replicates, and a randomized complete block design. These lines were selected for homogeneous flowering dates; therefore, the earliest and latest lines were culled. The aboveground biomass of plants in two contiguous rows (25 cm per row, 50 cm in total, 0.10 m<sup>2</sup>) of a representative zone of the plot was hand-harvested at ground level, and used to estimate yield components and the harvest index. This trial was used to validate the QTLs found in the previous seasons.

### Phenotyping

Plots were scored for grain yield (GY), plant height (PH), flowering date (FD), thousand-kernel weight (TKW), hectoliter weight (HW), crop cover as green area (GA) or greener area (GGA) and SPAD (chlorophyll content measured with soil plant analysis development) (**Table S1**). The plots were combine-harvested, and grain yield was converted to kg ha<sup>−1</sup>, taking into account the harvested area per plot. Plant height was measured in cm from the soil to the base of the spike at maturity in one representative plant per plot. Flowering time was recorded as the number of days from January 1st until the date when 50% of the stems of each plot displayed 2-cm protruding awns (stage 49, Zadoks scale, Zadoks et al., 1974). Hectoliter weight (kg hl<sup>−1</sup>) was estimated with a grain analyzer model GAC-II (Dickey-John, USA) by measuring the weight of a constant volume. Thousand-kernel weight (g) was calculated from the weight of a 1000-grain sample. Green and greener areas were measured with zenithal pictures of each plot and analyzed using Breedpix software (Casadesús and Villegas, 2014). These indexes are estimates of the ground cover at the end of the vegetative period and of the early vigor of the lines, according to crop development on the dates when the photos were taken (March 13th, 2015 and February 17th, 2016). A single digital picture was taken with a Nikon Coolpix B700 camera held 145–150 cm above the ground and in front of the sun to avoid shading. Each picture contained the four rows of a single plot, focusing on the center of the plot. The zoom was set to an 8 mm focal length with a semiautomatic aperture, prioritizing the shutter speed, which was adjusted to 1/125 s. These parameters help avoid problems caused by wind and hand movements. SPAD color measurements were taken during May 10–11th, 2016. Ten measurements per plot were taken from the flag leaves of 10 randomly chosen plants (2–3 per row) with a SPAD chlorophyll meter (SPAD-502, Minolta, Japan).
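The conversion of plot grain weight to kg ha<sup>−1</sup> can be illustrated with a short Python sketch. It assumes, for illustration only, that the harvested area equals the full plot area given under Plant Materials (3.0 m × 0.8 m); the plot weight in the example is hypothetical.

```python
PLOT_LENGTH_M = 3.0  # plot length stated under Plant Materials
PLOT_WIDTH_M = 0.8   # plot width (four rows)


def yield_kg_per_ha(grain_g, harvested_area_m2=PLOT_LENGTH_M * PLOT_WIDTH_M):
    """Convert grain weight per plot (g) to kg per hectare.

    Assumes the harvested area equals the whole plot unless another
    area is passed explicitly.
    """
    return (grain_g / 1000.0) * (10_000.0 / harvested_area_m2)


# Hypothetical plot weight, for illustration only: 720 g on 2.4 m^2
# corresponds to roughly 3000 kg per ha (about 3 t per ha).
print(round(yield_kg_per_ha(720.0)))
```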

Soil variation was observed in the two dimensions of the trial. Therefore, to minimize error due to autocorrelation among adjacent plots, raw data were spatially corrected in both directions using a moving average correction approach in R (R Core Team, 2014) with the mvngGrAd R package (Technow, 2011), as in Nice et al. (2017). This procedure performs a correction similar to the augmented design, optimizing the grid size and shape used for adjustment in two dimensions and searching the moving average grid by minimizing the variance in the primary and secondary checks. To validate the procedure, two calculations were performed: (1) the correlation between the 2 years and (2) Pearson's coefficient of variation (CV) between the testers for each year. If the correlation between the adjusted values was greater than the correlation between the observed values and the adjusted values' CV was lower than the observed values' CV, we considered the data to be well-adjusted.
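The idea of a moving average correction can be sketched in Python as below. This is a simplified illustration and not the mvngGrAd implementation: it assumes a full rectangular neighbourhood around each plot, subtracts the deviation of the neighbourhood mean from the field mean with a fixed weight of 1, and uses hypothetical yield values. The function and parameter names are introduced here for illustration only.

```python
def moving_average_adjust(field, half_width=1, half_height=1):
    """Adjust each plot by how far its neighbourhood mean deviates from the field mean.

    `field` is a list of rows of plot values (at least two plots in total).
    The window is a rectangle of +/- half_width columns and +/- half_height
    rows around the plot, truncated at the field edges; the plot itself is
    excluded from its own neighbourhood.
    """
    n_rows, n_cols = len(field), len(field[0])
    grand_mean = sum(map(sum, field)) / (n_rows * n_cols)
    adjusted = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            neighbours = [
                field[i][j]
                for i in range(max(0, r - half_height), min(n_rows, r + half_height + 1))
                for j in range(max(0, c - half_width), min(n_cols, c + half_width + 1))
                if (i, j) != (r, c)
            ]
            local_mean = sum(neighbours) / len(neighbours)
            row.append(field[r][c] - (local_mean - grand_mean))
        adjusted.append(row)
    return adjusted


# Hypothetical field with a left-to-right fertility gradient (t per ha).
field = [
    [3.0, 3.2, 3.4, 3.6],
    [3.1, 3.3, 3.5, 3.7],
]
for row in moving_average_adjust(field):
    print([round(value, 2) for value in row])
```

Plots sitting in a locally productive patch are adjusted downward and plots in a poor patch upward, which is the behaviour the validation criteria in the text (higher between-year correlation, lower CV among checks) are meant to confirm.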

Principal component analyses (PCAs) were performed with the function PCA in the R package FactoMineR 1.41 (Lê et al., 2008). Correlation analyses were performed using the R package corrplot 0.84 (Wei and Simko, 2017).

### Genotypic Analysis and Map Construction

Genomic DNA was obtained from one leaf per genotype of 10-day-old plants using a NucleoSpin® Plant II kit (Macherey-Nagel, Germany). DNA concentration was quantified using a Nanodrop 2000 (Thermo Scientific, USA).

<sup>1</sup>http://www.genvce.org/variedades/cebada/invierno/cierzo/ (accessed February 11, 2019).

A total of twelve 48-well plates of the Barley 50k iSelect SNP Array (Bayer et al., 2017) were processed by the CEGEN service, Centro Nacional de Investigaciones Oncológicas (CNIO, Spain). This chip scores 44,040 SNPs. The parents, an artificial F<sub>1</sub> (DNA mixture of the two parents), and the cultivar Morex were included in each plate as controls. SNP alleles were called using GenomeStudio Genotyping Module v2.0.2 (Illumina, USA). Calling was manually curated as recommended by Bayer et al. (2017). At any single marker, the average theoretical homozygosity should be 98.44%. Therefore, true heterozygotes should appear at a frequency of 1.56%. Given the type of population, the expected segregation was 7:1 (7 Cierzo alleles per allele from SBCC042 or SBCC073). Markers with too many missing data (call frequency < 0.7) or an excess of heterozygotes (Het\_Excess\_Freq > −0.6) were filtered out, as were monomorphic markers. The resulting data were loaded into Flapjack (Milne et al., 2010) for visual inspection of graphical genotypes. Segregating SNPs were grouped into linkage groups with a LOD score of 8 with JoinMap 4.0 (Van Ooijen, 2006). Cosegregating SNPs were excluded to increase computing efficiency. The maximum likelihood algorithm was used to estimate the best order of markers within each linkage group. The final map order was computed in the R/qtl package (Arends et al., 2010). The distance between the markers was calculated based on Kosambi's mapping function using the Viterbi algorithm with the function quickEst in ASMap (Taylor and Butler, 2016). The resulting maps were matched to the position in the reference genome (Mascher et al., 2017) using compareOrder, aiming to maintain high LOD scores while keeping the map length to a minimum. Segregation distortion was calculated with the geno.table function (R/qtl). Then, cosegregating SNPs were included again in the dataset.
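The expected homozygosity of 98.44% and the 7:1 segregation quoted above follow from simple halving arguments, sketched here in Python. The sketch assumes no selection and counts four selfing generations from BC<sub>2</sub>F<sub>1</sub> to BC<sub>2</sub>F<sub>5</sub>; the function names are introduced for illustration only.

```python
from fractions import Fraction


def expected_heterozygosity(backcrosses, selfings):
    """Expected fraction of heterozygous loci among loci polymorphic between parents.

    The F1 is heterozygous at every such locus; each backcross to the
    recurrent parent and each selfing generation halves that fraction.
    """
    return Fraction(1, 2) ** (backcrosses + selfings)


def donor_allele_frequency(backcrosses):
    """Expected donor allele frequency; selfing without selection leaves it unchanged."""
    return Fraction(1, 2) ** (backcrosses + 1)


het = expected_heterozygosity(backcrosses=2, selfings=4)  # BC2F1 -> BC2F5
donor = donor_allele_frequency(backcrosses=2)

print(f"heterozygosity: {float(het):.2%}")       # 1.56%
print(f"homozygosity:   {float(1 - het):.2%}")   # 98.44%
print(f"recurrent:donor alleles = {(1 - donor) / donor}:1")  # 7:1
```

The same donor allele frequency of 1/8 corresponds to an expected recurrent-parent genome share of 7/8, or 87.5%.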

### QTL Analysis

QTL analyses were performed in GenStat 18 (Payne et al., 2011) using the single-trait linkage analysis procedure (single or multiple environments, as appropriate). Genetic predictors were estimated every 2 cM. In the first step, simple interval mapping (SIM) was run. Then, the detected QTL were used as cofactors in the second step, composite interval mapping (CIM). Further CIM rounds were run until a stable solution was found. The Li and Ji (2005) method was used to estimate the significance threshold of −log<sub>10</sub>(P) to declare putative QTLs with an overall significance level of 0.05. The minimum distance between cofactors was established as 15 cM, and the minimum distance between QTLs was set to 10 cM. In the last step, the final QTL model was built. Confidence intervals (95% Bayes credible intervals) for the QTLs were calculated in R/qtl using the function bayesint. Genomic regions surrounding the QTL were explored, taking into account the genes between the markers cosegregating at each peak, according to positions reported by BARLEYMAP (Cantalapiedra et al., 2015); then, all genes within the confidence intervals were retrieved. Possible interactions between QTLs were examined with pairwise analyses of variance in the marker peaks for each trait. Venn diagrams were computed and plotted with R, using the package "VennDiagram" (Chen and Boutros, 2011). MapChart (Voorrips, 2002) was used to represent genetic maps and QTL positions. Physical maps combining information obtained for both populations were constructed with the R package "ggplot2" (Wickham, 2016).
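The logic of the Li and Ji (2005) threshold can be sketched in Python as follows. This is an illustration of the published formula under stated assumptions, not GenStat's implementation: the eigenvalues of the marker correlation matrix are taken as given (the values below are hypothetical), and the per-test threshold is obtained with a Šidák-type correction from the effective number of independent tests.

```python
import math


def effective_tests(eigenvalues):
    """Effective number of independent tests from correlation-matrix eigenvalues.

    Each eigenvalue contributes 1 if its absolute value is >= 1,
    plus the fractional part of that absolute value.
    """
    m_eff = 0.0
    for value in eigenvalues:
        x = abs(value)
        m_eff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return m_eff


def threshold_minus_log10_p(m_eff, genome_wide_alpha=0.05):
    """Per-test significance threshold, expressed as -log10(P)."""
    per_test_alpha = 1.0 - (1.0 - genome_wide_alpha) ** (1.0 / m_eff)
    return -math.log10(per_test_alpha)


# Hypothetical eigenvalues for four correlated markers (they sum to 4).
eigenvalues = [2.5, 1.0, 0.5, 0.0]
m_eff = effective_tests(eigenvalues)
print(f"effective tests: {m_eff}")
print(f"-log10(P) threshold: {threshold_minus_log10_p(m_eff):.2f}")
```

Because tightly linked markers yield a few large eigenvalues, the effective number of tests is smaller than the number of markers, and the threshold is less stringent than a plain Bonferroni correction over all markers.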

### RESULTS

Spatial corrections were based on finding the optimum surrounding grid per trait, as in Nice et al. (2017), which resulted in 4 plots on each side in the horizontal direction and 2 plots per side in the vertical direction. Each plot was adjusted according to a grid of 24 neighboring plots (8 in the plot's row and 5 and 3 in the row above and below) and a minimum of 9 plots at the corners of the experiment.

The three seasons were different regarding yield levels. The first season, 2015, was a rather typical year, with terminal drought and yields of ∼3 t ha−<sup>1</sup> . In the second season, however, the yields ranged between 5 and 6 t ha−<sup>1</sup> for most accessions, remarkably high production for the test site. The validation season was similar to 2015–16, although with lower overall yields, despite abundant precipitation in spring (**Figure S1**). As expected, the populations' averages were, in general, intermediate between the parents, but much closer to the elite parent, as it contributed 87.5% of the genome of each population (**Tables 1**, **2**). On average, the landrace parents yielded 20% less than Cierzo, the elite parent, whereas the populations yielded only 4% less. There were remarkable disparities among years for these differences. In the less productive year, 2015, the yield of both landrace parents and the population averages were more distant from the yield of the elite parent than in 2016 (**Figure 1**). There was a large difference between Cierzo and the landrace parents in PH (15–22 cm), TKW (1.5–9 g) and HW (7–8 kg hl−<sup>1</sup> ). For these traits, the population means fell between the parental values but closer to that of the elite parent. All parents exhibited similar flowering dates, with a maximum of 3 days between extremes in any year. For SBCC042 × Cierzo, the flowering dates for the parents and the population mean were very similar. For SBCC073 × Cierzo, the landrace parent flowered slightly later than Cierzo. The population mean was closer to the value of the landrace parent and even outside the interparental range in 2016. Regarding ground cover during vegetative growth, the parents exhibited a contrasting pattern among years. Both landraces displayed values lower than that of Cierzo in 2015 and higher than that in Cierzo in 2016 (although at an earlier stage), with all but one population at intermediate positions. 
For SBCC042 × Cierzo, the population means were always within the interparental range. In contrast, in SBCC073 × Cierzo, some traits exhibited population means outside the interparental range. This discrepancy cannot be explained by a purely additive model and points to epistatic interactions whose effects were altered by the cross. Such a pattern occurred for the SPAD score and the GA in 2016. In these cases, the population mean was significantly different from the values of both parents. Interestingly, the average GY of the population in 2016 was also higher than that of both parents, although not significantly. The population means were based on measurements from 200 plots, while the elite parent mean was the average from 28 plots and the landrace parent mean was from 8 plots per year. Therefore, the estimates of these values were quite robust.

Cierzo exhibited agronomic advantages in terms of GY, PH and HW. We expected the landraces and their populations to perform better than the elite parent in the drier, less productive year, but we found the opposite pattern. We must highlight that Cierzo, a successful cultivar in recent years, was bred in Spain, and the location used for the current experiment was among those used to select this cultivar. Therefore, we tested some of the best possible landraces against one of the best modern cultivars for the region. The landraces and the elite line presented remarkable differences in GY, PH, TKW and HW. Cierzo outyielded the landraces in both years, having a greater HW and a shorter stature (**Tables 1**, **2**). The landraces, however, presented a much higher TKW than the elite parent (**Tables 1**, **2**). Differences between genotypes were significant for GY, PH, and FD in both populations, as shown by the ANOVA results (**Table S2**), which were calculated with the residuals provided by the replicated checks. Both populations showed normal distributions and transgressive segregation for most traits, with some exceptions, such as HW, whose distribution was nearly bimodal (**Figures S2, S3**). Among the ground cover traits, namely, GA and GGA, both parents showed similar ground cover in March 2015, and SBCC042 showed more ground cover than Cierzo in February 2016 (**Figure S3a**; **Table 1**). Few differences were found between SBCC073 and Cierzo in either trait or in SPAD measurements (**Figure S3b**; **Table 2**).

#The difference between means followed by the same letter is not significantly larger than the LSD (P < 0.05).

GA, green area; GGA, greener area; TKW, thousand-kernel weight; HW, hectoliter weight.

### Correlations Between Traits

Negative correlations were observed between GY and FD; in other words, later flowering was associated with decreased yield (**Figures S4**–**S8**). Water stress during grain filling is a common occurrence in Mediterranean climates; therefore, this correlation was not unexpected. We observed positive correlations between GY and PH in most cases (SBCC073 × Cierzo in 2015 was the exception). Overall, these two traits showed moderate and positive correlations with ground cover in both populations and years, particularly in SBCC042 × Cierzo. The positive correlation between PH and GY is problematic from a plant breeding point of view. Spanish barley landraces are usually taller than modern cultivars, and one of the reasons for the replacement of the former was their susceptibility to lodging in current high-input agriculture (Yahiaoui et al., 2014). The most straightforward way to convert the landraces into modern feed cultivars would be to reduce their height without incurring a yield penalty. This goal is probably unachievable with these biparental crosses. Height reduction through marker-assisted selection, and possibly even through genomic selection, would not be enough to achieve the same height as that of the elite parent, given the antagonistic correlation between yield and height, which is at least partially caused by the QTL in 6H shared by the two populations.

other averages and LSDs were calculated relative to Cierzo.

Both populations demonstrated a significant contribution of GY to the first dimension of the biplots based on correlations (**Figure S9**), which was almost orthogonal with PH in 2015. In contrast, both parameters were strongly correlated in 2016, possibly because of the good conditions experienced in that year. Flowering contributed substantially to the second dimension in 2015 and was responsible for the major differences in dimension 1 in 2016, but with a negative sign. This result can also be explained by the different climatic conditions experienced in the two years. While 2015 was a normal-to-low year in terms of spring rainfall and temperatures, 2016 was an excellent year with a very humid and mild spring. In all cases, ground cover at early stages significantly contributed to the first dimension. GA at early stages (2016) or advanced vegetative growth (2015) was related to early flowering and taller plants. HW and SPAD (only recorded in 2016) were highly correlated with late flowering that year, and a higher TKW was obtained in lines with more ground cover at early stages in SBCC073 × Cierzo, with early flowering in both cases.

### Genetic Map

Among the 44,040 markers, 12,893 and 12,026 SNPs were polymorphic and of high quality in SBCC042 × Cierzo and SBCC073 × Cierzo, covering total distances of 1,080.6 and 1,115.8 cM, respectively (**Table S3** and **Supplementary Data 1**). Landrace alleles provided good coverage of the whole genome (graphical genotypes, **Figure S10**). After quality filtering, 206 and 241 lines were kept for the QTL analysis, for SBCC042 × Cierzo and SBCC073 × Cierzo, respectively. In all instances in which we refer to the two populations, SBCC042 × Cierzo will be reported first and SBCC073 × Cierzo last. In these populations, 987 and 875 markers corresponded to unique genetic positions (one marker in one position among all the cosegregating markers) in the two maps, respectively. Nine and seven linkage groups were identified in the populations. In SBCC042 × Cierzo, the 9 linkage groups represented 5 complete chromosomes (1H, 3H, 4H, 5H, and 7H) and 2 fragmented ones (2 groups each for 2H and 6H). One linkage group per chromosome was found in SBCC073 × Cierzo. The number of markers per chromosome ranged between 1,290 (1H) and 2,275 (5H) and between 1,273 (1H) and 2,087 (5H) in each population (**Table S3**). The average spacing between markers for each map was 1.0 and 1.3 cM, with maximum spacings of 18.2 and 15.1 cM (**Table S3**). In these BC2F5 populations, the expected allelic frequencies of the elite and landrace parents were ∼87.5:12.5%. The actual genotypic frequencies were 83.7 AA: 2.1 AB: 14.2 BB and 86.2 AA: 1.7 AB: 12.1 BB (**Figure S11**). The populations shared 8,036 polymorphic markers (**Figure S12a**), which were mainly distributed in the distal and interstitial regions of the chromosomes (**Figure S12b**), except on chromosome 7H, where they also exhibited proximal and centromeric positions.
Among the rest of the markers (4,857 in SBCC042 × Cierzo and 3,990 in SBCC073 × Cierzo), chromosome 2H showed specific markers for the SBCC073 × Cierzo population, and the distal part of the short arm of 5H showed a concentration of markers for SBCC042 × Cierzo. This last region showed some segregation distortion, with overrepresentation of Cierzo alleles in SBCC042 × Cierzo (**Figure S11**).
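The expected ∼87.5:12.5 allelic ratio, and the genotype frequencies it implies, can be reproduced with a few lines of arithmetic. The sketch below assumes no selection and no segregation distortion, and takes BC2F5 to mean two backcrosses to the elite parent followed by four generations of selfing from the BC2F1; the function names are hypothetical. It also converts the observed genotype percentages reported above into elite-allele frequencies.

```python
from fractions import Fraction


def expected_genotypes(n_backcrosses, n_selfings):
    """Expected (AA, AB, BB) frequencies at one locus; A is the recurrent (elite) allele."""
    # The F1 is fully heterozygous; each backcross halves the heterozygote frequency.
    het_bc = Fraction(1, 2) ** n_backcrosses
    # Each selfing generation halves the remaining heterozygosity.
    het = het_bc * Fraction(1, 2) ** n_selfings
    # Heterozygosity lost during selfing is split equally between AA and BB.
    bb = (het_bc - het) / 2
    aa = 1 - het - bb
    return aa, het, bb


def elite_allele_frequency(aa, ab, bb):
    """Frequency of the A allele from genotype frequencies or percentages."""
    return (aa + ab / 2) / (aa + ab + bb)


aa, ab, bb = expected_genotypes(n_backcrosses=2, n_selfings=4)
expected_elite = elite_allele_frequency(aa, ab, bb)           # Fraction(7, 8), i.e., 87.5%
observed_042 = elite_allele_frequency(83.7, 2.1, 14.2)        # about 0.8475
observed_073 = elite_allele_frequency(86.2, 1.7, 12.1)        # about 0.8705
```

Under these assumptions, about 86.7% AA, 1.6% AB and 11.7% BB genotypes are expected, so the observed heterozygote percentages (2.1 and 1.7) are close to expectation, and the elite-allele frequency is somewhat below 87.5% in SBCC042 × Cierzo.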

### QTL Analysis

We detected 21 significant QTLs in SBCC042 × Cierzo and 23 in SBCC073 × Cierzo (**Tables 3**, **4**). Trait-increasing alleles for GY (**Figure 2**) were only contributed by the elite parent, whereas those for PH (**Figure 2**) and TKW (**Figure S13**) were mainly, but not only, contributed by the landraces. There were many FD QTLs (**Figure 2**), with trait-increasing alleles contributed by the two parents in both crosses. QTLs for ground cover traits were detected only in SBCC042 × Cierzo, with trait-increasing alleles contributed mainly by the landrace (**Figure S14**). The QTL confidence intervals were cross-referenced to the barley reference sequence and common markers were projected based on other published studies when possible (**Figure 3**). We identified 5 chromosomal regions with QTLs for one or more phenotypic traits shared across populations and/or years (**Figures 3**, **S15, S16**):


In addition to these QTLs, we found an FD QTL on the short arm of 7H in both populations. However, only the QTL in SBCC073 × Cierzo seemed to be near the well-known HvFT1 gene. Flowering was accelerated by the elite cultivar allele in both populations. For early vigor (measured as GA and GGA), only SBCC042 contributed trait-increasing alleles (GA\_Feb and GGA\_Feb in 2016) with three significant QTLs (1H, 75.5 cM; 5H, 89.5 cM; and 6Hb, 43.0 cM). These findings indicate an increase in ground coverage after winter, which might be exploited as an adaptive mechanism of this Spanish landrace.

Interactions between QTLs were tested for all possible pairs of markers representing QTL peaks for each trait. We did not expect to find highly significant interactions, as this kind of population presents highly imbalanced genotypic frequencies. We considered only interactions with at least 5 lines in each of the four possible genotypic classes (considering only homozygotes at each pair of genes). There were two significant interactions, one for hectoliter weight, i.e., QHW.73 × C.1.1 × QHW.73 × C.4.2 (P = 0.030, **Figure S17a**), and one for plant height, i.e., QPH.73 × C.4.1 × QPH.73 × C.7.4 (P = 0.011, **Figure S17b**).

### Validation of SBCC073 × Cierzo QTLs

The field trial carried out with 96 lines of this population confirmed the GY QTL, 3 of the 4 PH QTLs and 3 of the 5 TKW QTLs. None of the FD QTLs were confirmed. This result was expected, as this subpopulation was chosen based on its homogeneous FD, and the extremes of the FD distribution were discarded. The estimation of yield components and the harvest index allowed investigation of the causes underlying the grain yield QTL found on 6H. Biomass production did not differ between the alleles (83.72 vs. 83.84 g), nor did the number of tillers and spikes per unit area. Differences in GY resulted from a large divergence in the harvest index, which was in turn mainly caused by the significantly larger number of grains (14%) exhibited by lines with the Cierzo allele (**Table 5**).

### DISCUSSION

### Can We Identify Candidate Loci Underlying QTL Using the 50k Chip and Two Large Populations?

Identifying candidate genes with the sizes and types of populations and density of markers used here is possible only for loci with very large phenotypic effects and little genotype-by-environment interaction, and we did not find any such loci. However, there is a lot to gain from the use of the 50k chip and the current reference barley genome (Mascher et al., 2017) from the point of view of barley breeding. Good marker coverage combined with a large population size narrowed down the confidence intervals for many QTLs, reducing the list of potential candidates (**Supplementary Data 2**). These shortlists are reliable resources for further research because there is high confidence in the physical position of the genes. QTL flanking markers from the literature were cross-referenced with ours, as the use of a reference genome helps to confirm or reject commonalities among findings. In some cases, we were able to shed new light on candidates proposed in previous studies.

The use of the reference genome has some small caveats, such as the presence of unexpected duplications in the genome, which complicate map construction. For instance, locus HvFT1 appears on both chromosomes 3H (HORVU3Hr1G087100) and 7H (HORVU7Hr1G024610). We placed HvFT1 only on 7H in our maps based on a linkage map (BOPA2\_12\_30893) and the abundant literature that reports the locus being located on this chromosome instead of 3H. Some duplications may be real, but geneticists should be aware of this fact and apply their expertise and previous knowledge in each case.

Below, we report insights on the QTLs found, i.e., matches with previous studies, and information on genes of interest in the QTL regions in chromosome order. We do not intend to declare candidate genes. Rather, we combine the information with possible biological meaning that we found after cross-referencing functional and positional information, aiming to pose meaningful questions:


TABLE 3 | … intervals for all QTLs found in the two barley BC2F5 populations in 2015 and 2016.

… by environment interaction (QTL × E) it is indicated. Confidence intervals are given in genetic and physical positions. #Confidence interval in the physical map (Mascher et al., 2017) based on Bayes credible intervals.

TABLE 4 | Percentage of variance explained and additive effect (effect of replacing an allele from parent 1, Cierzo, with an allele from parent 2, landrace) of the QTLs found in the two barley BC2F5 populations in 2015 and 2016.


\*, \*\*, \*\*\* Significant effect at P < 0.05, P < 0.01, and P < 0.001, respectively (calculated only for traits measured in two trials); ns, non-significant effect.

The flowering date QTLs found at ∼30–40 cM on chromosome 1H in both populations (QFD.73 × C.1.1 and QFD.42 × C.1.1) have overlapping confidence intervals. However, the intervals are large, and the peaks fall on different arms for the two populations (1HS and 1HL, respectively). The peak marker for the first QTL falls within a gene annotated as MYB DOMAIN PROTEIN 87 and is close to a flowering date QTL described in Wonneberger et al. (2017). For SBCC073 × Cierzo, the interval contains a CYTOCHROME P450 SUPERFAMILY PROTEIN (HORVU1Hr1G009110), HORVU1Hr1G010780 (SENSITIVITY TO RED LIGHT REDUCED PROTEIN, SRR1), which is polymorphic in both populations, and HORVU1Hr1G011800, which is HvTOE1, an ortholog of TaTOE-B1 (Zikhali et al., 2017). This last gene is an AP2-LIKE ETHYLENE-RESPONSIVE TRANSCRIPTION FACTOR, located at 36.02 cM, coincident with a QTL for time to awn tipping reported by Alqudah et al. (2014) that was polymorphic only in the second population. TOE proteins in Arabidopsis "convey a photoperiodic signal to antagonize CONSTANS and regulate flowering time" (Zhang et al., 2015).

The QTL hotspot at 503–526 Mb on 1H can actually be split into two spots based on the overlap of confidence intervals and the underlying genes. The first spot, to the left of the region, harbors the QTLs for FD and TKW in both populations and the GA QTL in SBCC042 × Cierzo. The second spot, to the right, encompasses the HW QTL for both populations. The QTLs for FD and TKW are located between 509 and 521 Mb. Both landraces have an active HvFT3 allele, which Cierzo lacks. This gene has a large effect on FD in Mediterranean conditions (Boyd et al., 2003; Cuesta-Marcos et al., 2008; Tondelli et al., 2014) and could be responsible for the detected QTLs with earlier landrace alleles. However, the confidence interval in SBCC042 × Cierzo is slightly shifted to the right of HvFT3. Another gene with an effect on FD in plants is HORVU1Hr1G076800 (DOF ZINC FINGER PROTEIN 2), located at 515.7 Mb, 75 cM in this population. Genes of this family of transcription factors repress CONSTANS in Arabidopsis, delaying flowering (Fornara et al., 2009). The effect on TKW (increased by landrace alleles) could be pleiotropic or could be due to another gene. Other studies revealed QTLs for TKW (Haseneyer et al., 2010; Locatelli et al., 2013) and arabinoxylan content (Hassan et al., 2017) in this same region. None of these studies reported any association with earliness; thus, the authors discarded eam8 (ELF3, Faure et al., 2012; Zakhrabekova et al., 2012) as a candidate. Pauli et al. (2014) found a novel QTL for test weight in this region, which they reported as different from the QTL in the region of PpdH2. However, the projection of their confidence interval onto the physical map indicates that the QTL found by these authors corresponds to the region we identified near HvFT3. Even in such a small region, several genes could be candidates for this effect.
The genes HEXOKINASE 1 and TREHALOSE PHOSPHATE SYNTHASE are located in the confidence intervals (511 and 514 Mb) and were found to play a role in repressing and/or redirecting sucrose utilization in barley caryopses during heat stress exposure (Mangelsen et al., 2011). In fact, markers for this last gene are polymorphic in both populations.

The bimodal distribution of phenotypic frequencies for HW (**Figure S3**) hints at the possibility of a major gene segregating for this trait in both populations for QHW.42 × C.1.1 and QHW.73 × C.1.1. These QTLs, between 519 and 526 Mb, present larger values for the Cierzo allele in both populations. Studies crossing wild and cultivated barley have found QTLs in the same region. Nice et al. (2017) found an HW (also named test weight) QTL, and Sharma et al. (2018) found a TKW QTL, with the wild allele contributing lower values. Studies with cultivated barley also found QTLs for HW in the same area (Rode et al., 2012; Wang et al., 2012; Mansour et al., 2014). Interestingly, all these QTLs collocate with the threshability locus thresh-1, identified in another wild-by-cultivated cross (Schmalenbach et al., 2011), although we can discard the candidates they proposed because the candidates fall outside our confidence intervals.

QFD.73 × C.2.3 lies just 7 Mb from HvFT4, although this gene is not represented in the 50k chip. Two more genes of the CYTOCHROME P450 SUPERFAMILY (HORVU2Hr1G025160 and HORVU2Hr1G025480) are inside the confidence interval. Farther down chromosome 2H, the QFD.42 × C.2b.3 peak is just 1 Mb away from HvARF9, an AUXIN RESPONSE FACTOR.

At QPH.42 × C.3.1, the Cierzo allele contributes increased plant height. QTLs for plant height in this region were previously reported by Haseneyer et al. (2010) and by Rode et al. (2012), although marker comparison confirmed colocation with our QTL only for the second study.

QFD.42 × C.3.4 falls in the same region as a flowering time QTL found in the population Arta × Keel (Rollins et al., 2013), in which the Arta allele conferred lateness, as did SBCC042. Interestingly, Arta is a Mediterranean landrace from the Middle East. Lakew et al. (2013) also detected a flowering time QTL in the same region. We can discard the denso/sdw1 region, at ∼630 Mb, as responsible for our QTL. Our confidence interval (682–698 Mb) actually includes the region of the earliness gene eam10 (HvLUX, 692 Mb, Campoli et al., 2013).

At the plant height QTL QPH.73 × C.4.1, the Cierzo allele also increases plant height. Maurer et al. (2016) found a PH QTL, with a wild barley allele contributing increased height, and Mansour et al. (2014) found another PH QTL in a population derived from the cross between the parents of Cierzo. At the QTL peak, we found an ACETYL ESTERASE, corresponding to a gibberellin (GA) receptor (GID1L2-8, GA-Insensitive Dwarf 1, Hill et al., 2018).

The possibly common FD QTL found in both populations on 4H is located close to VrnH2. However, all parents have an active VrnH2 allele (all three ZCCT-H genes); therefore, it is unlikely to be the causal gene. Fisk et al. (2013) and Rollins et al. (2013) found an FD QTL in this location, but in both cases, the QTLs were consistent with a VrnH2 effect. Nevertheless, a possible causal effect of VrnH2 cannot be ruled out without further evidence. Another gene present in the vicinity of this QTL is HORVU4Hr1G088850, which codes for a PROTEIN CHAPERONE-LIKE PROTEIN OF POR1 that is essential for chloroplast development (Lee et al., 2013).

The peak of QPH.73 × C.5.2 falls exactly on HORVU5Hr1G0000010, SUCROSE TRANSPORTER 4. Wonneberger et al. (2017) reported a PH QTL in the same region.

QTKW.73 × C.5.4 (with SBCC073 contributing a larger grain weight) is in the same region as QTLs reported by Pauli et al. (2014) for grain plumpness, and by Mohammadi et al. (2015) for TKW.

The QTL hotspot on 5H between 561 and 577 Mb holds QTLs for FD in both populations and for PH and GA in SBCC042 × Cierzo, in all cases with landrace trait-increasing alleles. It is interesting that TWO-COMPONENT RESPONSE-REGULATOR-LIKE PRR95, a circadian clock gene (Campoli et al., 2012; Calixto et al., 2015) is close to the peak for the FD QTL. Although the gene is located slightly outside the confidence interval in SBCC042 × Cierzo and thus should not be highlighted as a possible candidate, we found some inconsistencies between our maps and the reference genome in the region and thus cannot discard HvPRR95.

In SBCC042 × Cierzo, the landrace allele also contributes a larger GA, which could be related to the more vigorous early shoot growth exhibited by SBCC042 compared to that in either Cierzo or SBCC073 (**Table 1**). Ceccarelli et al. (1991) reported that early vigor leads to adaptation to marginal environments. Early vigor could also be related to differential responses to frost damage, although we think this relationship is unlikely in our case, as no frost damage was detected in the trials.

From an agronomic point of view, the most important QTL hotspot found in our study is the one on 6H. As mentioned before, this region is very wide. It spans both chromosome arms, although all QTL peaks lie on the short arm. The marker on the peak for QPH.42 × C.6b.3 cosegregates with HORVU6Hr1G020330, which corresponds to AUXIN RESPONSE FACTOR 19 (Tombuloglu, 2018), and there is another auxin response gene very close to the peak in the first population, HORVU6Hr1G021040, which corresponds to AUXIN SIGNALING F-BOX 3. The finding of this QTL in both populations could be due to the excessive plant height of landraces used for modern breeding (Yahiaoui et al., 2014). Mansour et al. (2014) also found a QTL for PH in the same region (325 Mb) in the population Orria × Plaisant, with the Orria allele reducing plant height. Other studies have found PH QTLs on 6H with overlapping regions in very different germplasm sets, such as Alqudah et al. (2016) in a GWAS study, Maurer et al. (2016) in a wild-by-cultivated cross, and Teulat et al. (2001) and von Korff et al. (2008) in Mediterranean barleys. Baum et al. (2003) also found a PH QTL on 6H. It is not possible to conclude that all these studies identified one or several underlying genes, but our findings on this 6H region confirm its key role in barley breeding.

Regarding grain yield and yield components at the 6H hotspot, the QTL for GY in both populations and the TKW QTL in SBCC073 × Cierzo show close peak positions (115–119 Mb), suggesting a unique QTL. Sharma et al. (2018) and Nice et al. (2017) also found GY QTLs in the vicinity (114 Mb and 140 Mb, respectively), with the yield-decreasing allele contributed by wild parents. In contrast, Locatelli et al. (2013) found a QTL for spikes per square meter (423 Mb). The validation trial suggested a major role of the harvest index, through both grain number per spike and TKW. Inside the confidence intervals and very close to the GY peaks (114–117 Mb), there are several interesting genes that are polymorphic in both populations: HORVU6Hr1G029150 (UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 2), which affects shoot architecture in Arabidopsis (Yang et al., 2007); HORVU6Hr1G029160 (PROTEIN PHOSPHATASE 2C FAMILY PROTEIN), a key player in signal transduction (Rodriguez, 1998), which is polymorphic in the first population; HORVU6Hr1G028680, a TWO-COMPONENT RESPONSE REGULATOR ARR12 (Moubayidin et al., 2010); and HORVU6Hr1G028710, a GLUCOSE-6-PHOSPHATE DEHYDROGENASE 2. A locus with a large effect on grain weight on chromosome 6 has been detected in syntenic regions of other cereals. OsGW2 in rice (Song et al., 2007) and its homolog in wheat, TaGW2 in the A genome, are associated with TKW (Su et al., 2011) and grain yield (Simmonds et al., 2014). However, its apparent homolog in barley, HORVU6Hr1G044080, annotated as PROTEIN SIP5 (251 Mb), was included in our GY QTL confidence intervals but was far from the peaks and not located in the TKW one. It seems unlikely that the QTLs detected in our study are caused by the barley GW2 homolog.

\*\*, significant at P < 0.01; ns, non-significant.

There was another TKW QTL on 6H, but only in SBCC073 × Cierzo. The peak falls on the gene HORVU6Hr1G076810, PYRUVATE DEHYDROGENASE E1 COMPONENT SUBUNIT ALPHA, and also includes the gene HORVU6Hr1G076760, ARM REPEAT SUPERFAMILY PROTEIN (Mudgil et al., 2004). Based on position, we can discard coincidence with HvNAM-1 (grain protein content), HORVU6Hr1G019380 (Distelfeld et al., 2008), which is associated with variation in TKW in population HEB-25 (Maurer et al., 2016).

Finally, on 6H, there is a GA QTL in SBCC042 × Cierzo. GA/GGA in SBCC042 × Cierzo is associated with increased green area conferred by the SBCC042 allele. Rollins et al. (2013) found a growth vigor QTL in Arta × Keel, with a trait-increasing allele from the landrace Arta, whose interval did not encompass ours. Ingvordsen et al. (2015) also found a nearby QTL for aboveground biomass. The gene HORVU6Hr1G080340, ETHYLENE-RESPONSIVE TRANSCRIPTION FACTOR 5, is located within our confidence interval.

The two FD QTLs found on the short arm of 7H occur at close but non-overlapping positions. The well-known gene HvFT1, which has a large effect on FD, falls between the two QTLs (40 Mb) and outside their confidence intervals and therefore cannot be considered a candidate. In both populations, Cierzo contributes the early allele. Coincident with QFD.42 × C.7.7, Nice et al. (2017) reported an FD QTL in an AB-NAM population (HORVU7Hr1G022550, at 33 Mb). Fisk et al. (2013) and Alqudah et al. (2014) also found several QTLs related to flowering time or phasal growth duration coincident with this region but, based on the location of their associated markers on the current reference genome, they seem distal to HvFT1. The gene HORVU7Hr1G026840, HISTONE-LYSINE N-METHYLTRANSFERASE E(Z), which participates in histone methylation is located very close to the QFD.73 × C.7.7 peak. An FD QTL in the HEB-25 population, with the later allele coming from the wild barleys, was reported in the same region (Maurer et al., 2015, 2016), and another FD QTL in cultivated barleys was reported by Fisk et al. (2013). The fact that we found two FD QTLs on the short arm of 7H that were close to but apparently different from HvFT1 questions the nature of the genes underlying flowering time QTLs found in a large number of studies in this region.

QPH.73 × C.7.4 and QHW.73 × C.7.3 show distant peaks, but their wide confidence intervals greatly overlap. The landrace allele increases plant height and reduces hectoliter weight. Regarding HW, the test weight QTL found by Nice et al. (2017) is located outside our confidence interval. Previously, PH QTLs were found in Orria × Plaisant (Mansour et al., 2014), and in a GWAS study with Jordanian landraces (Al-Abdallat et al., 2017). The first one falls within our confidence interval. At the peak for PH, we found the genes HORVU7Hr1G056590 (HYDROXYMETHYLGLUTARYL-COA SYNTHASE), HORVU7Hr1G056630 (RHOMBOID-LIKE PROTEIN 3), HORVU7Hr1G056980 (6-PHOSPHOGLUCONOLACTONASE 5) and HORVU7Hr1G057100 (CYTOCHROME P450 SUPERFAMILY PROTEIN), which could be related to the trait. In addition, two genes are present in the region of the QTL peak, namely, BASIC LEUCINE ZIPPER (bZIP) TRANSCRIPTION FACTOR FAMILY PROTEIN HvIRO1 (AB199587.1 HORVU7Hr1G056490) and the gibberellin receptor GID1 (GA-insensitive dwarf phenotype, HORVU7Hr1G057260, not present in the 50k chip). Both are related to the "dwarf" phenotype (Hartweck and Olszewski, 2006; Ogo et al., 2006). The presence of epistasis of this QTL with QPH.73 × C.4.1, which is related to another GA-receptor (GID1L2-8, Hill et al., 2018), leads us to think that the interactions of genes related to the GA pathway are involved in yield-related traits in this population.

At QHW.42 × C.7.2, the Cierzo allele increases HW. The peak falls within the gene HvBRD2/HvDIM, DELTA(24)-STEROL REDUCTASE, HORVU7Hr1G120030, with a putative effect on plant height and tiller formation (Alqudah et al., 2016). The region also coincides with a beta-glucan-content QTL found by Houston et al. (2014).

### Can Spanish Landraces Contribute to Improving Elite Local Cultivars?

Elite cereal cultivars are derived from a relatively narrow germplasm pool and are predominantly well-adapted to high-input agriculture (Newton et al., 2010). This study relies on the assumption that local landraces can contribute adaptive features that may not have been fully incorporated into current elite varieties and that may be particularly relevant as sources of novel "abiotic stress resistance genes or combinations of genes if deployed appropriately" (Newton et al., 2010). In this study, we aimed to detect some genetic factors responsible for the good performance and adaptation of top Spanish barley landraces under Mediterranean conditions, and to transfer them into an elite cultivar.

The two contrasting seasons, in terms of both vegetative development (measured as plant height) and grain yield, provide an opportunity to compare performance under high-yielding conditions (2016) vs. typical stress conditions occurring late in the season (2015).

The elite parent, Cierzo, was used as a yardstick to measure the potential of these two superior landraces to contribute to barley breeding. Unfortunately, we did not detect direct grain yield QTLs with the superior allele coming from the landraces, even in the lower-yielding year. There were QTLs for TKW, HW and FD with trait-increasing alleles from both sides of each cross, indicating that breeding is possible in either direction of these crosses for each of these traits. QTLs for GA (and GGA) were detected only in SBCC042 × Cierzo, although they were moderately and positively correlated with grain yield in both years and in both populations. This finding is consistent with the much higher ground cover shown by SBCC042 than by Cierzo in 2016. Thus, at least under the two contrasting conditions experienced in the two seasons, and for both populations, more profuse development during the vegetative period can be considered a positive feature. This trait is easy to score and to select for, and these results hold promise for further use of this trait in barley breeding. However, the positive correlation between this trait and plant height can be a problem, particularly in SBCC042 × Cierzo, as selection for ground cover could result in indirect selection for taller plants.

The landraces can introduce excessive plant height and low hectoliter weight. We identified some QTLs that could be introgressed together with other desirable alleles to counterbalance these potential negative effects. One of the desirable alleles was the landrace allele at QPH.73 × C.4.1, which reduced plant height. Actually, the height of the elite cultivar Cierzo is acceptable, but is still on the tall side of the range of variation of current cultivars; thus, the cultivar could benefit from height reduction. There are several QTLs with trait-increasing landrace alleles that could be used to increase the kernel weight of Cierzo. Three of the QTLs, namely, QTKW.73 × C.1.1, QTKW.73 × C.4.4, and QTKW.73 × C.6.5, are part of regions with other QTLs with negative contributions from landrace alleles to other agronomic traits, and their use for this purpose is not advisable. However, the landrace allele from QTKW.42 × C.6.2 contributed to a higher kernel weight and was linked to better early ground cover, as it was close to QGA.42 × C.6.3, for which the landrace allele increased early (February) ground cover.

There were 16 and 14 lines in each population with yield values above the elite parent's values plus two LSDs (5% level) in each year. These lines will be further tested in field trials, and some may become candidate cultivars.

### Can Spanish Landraces Be Improved to Compete With Current Elite Cultivars?

The development of cultivars, performed since the 1960s, has led to shorter cultivars with faster growth cycles. Spanish barley landraces are generally too tall for current agriculture and are prone to lodging (Yahiaoui et al., 2014). Landrace lines with reduced height would be interesting materials. They could become cultivars directly or at least could be used as parents in plant breeding programs, due to their reduced genetic load. Therefore, plant height reduction is a sensible breeding and pre-breeding target for landrace plant material.

A good number of plant height QTLs were found, with the trait-increasing allele coming from the landrace side, except in one case. There is ample (mostly additive) variation for reducing plant height in these landraces by more than 15 cm, if the effect of all QTLs is considered. However, this trait showed antagonistic associations with kernel weight in our study. Moderate positive correlations (∼0.4) between plant height and kernel weight were reported in studies involving Mediterranean germplasm (von Korff et al., 2008; Rollins et al., 2013) under stress conditions. Consequently, this possible association should be taken into account when addressing barley breeding for this kind of environment.

In our populations, the positive PH-TKW correlation (i.e., negative association from an agronomic point of view) occurred in one population for the QTL at the beginning of 5H (QPH.73 × C.5.2 and QTKW.73 × C.5.4) and at the 6H QTL hotspot in both populations, meaning that selection for these regions will affect height and kernel weight in agronomically opposite directions. In the last case (6H), however, it is possible to select for decreased height and increased grain yield at the same time. In both populations, this PH QTL has the largest effect. Therefore, selecting for the elite allele in this single region will combine a large height reduction (6–7 cm, depending on the population) with increased yield. A reduction in kernel weight (2.5–3 g, depending on the population) would still produce lines with acceptable kernels, that are much larger than the elite parent's. Since plant height QTLs on the short arm of 6H are consistently detected in crosses involving germplasm from the Mediterranean region, this approach could be relevant for breeding with other germplasm of Mediterranean origin.

### AUTHOR CONTRIBUTIONS

EI, AC, and MG conceived this work. MG and EI developed the populations. MG, EI, and AM planned and carried out the field experiments and data collection. AC and AM performed the laboratory work. CC, BC-M, and AM curated and corrected marker data, corrected allele calls. AM performed the QTL analyses, and AM and EI performed other statistical analyses. AM, AC, and EI drafted the document. All the authors read and approved the manuscript.

### ACKNOWLEDGMENTS

This work was supported by Spanish Ministry of Economy, Industry and Competitiveness grants AGL2013-48756-R, AGL2016-80967-R, RFP2012-00015-00-00, RFP2015-00006-00-00, and RTA2012-00033-C03-02. AM was funded by the Spanish Ministry of Economy, Industry and Competitiveness grant no. BES-2014-069266 (linked to project AGL2013-48756-R).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00434/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Monteagudo, Casas, Cantalapiedra, Contreras-Moreira, Gracia and Igartua. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Uncovering Genomic Regions Associated With 36 Agro-Morphological Traits in Indian Spring Wheat Using GWAS

Sonia Sheoran<sup>1†</sup>, Sarika Jaiswal<sup>2†</sup>, Deepender Kumar<sup>1</sup>, Nishu Raghav<sup>1</sup>, Ruchika Sharma<sup>1</sup>, Sushma Pawar<sup>1</sup>, Surinder Paul<sup>1</sup>, M. A. Iquebal<sup>2</sup>, Akanksha Jaiswar<sup>2</sup>, Pradeep Sharma<sup>1</sup>, Rajender Singh<sup>1</sup>, C. P. Singh<sup>3</sup>, Arun Gupta<sup>1</sup>, Neeraj Kumar<sup>2</sup>, U. B. Angadi<sup>2</sup>, Anil Rai<sup>2</sup>, G. P. Singh<sup>1</sup>, Dinesh Kumar<sup>2\*</sup> and Ratan Tiwari<sup>1\*</sup>

#### Edited by:

Dragan Perovic, Julius Kühn-Institut, Germany

#### Reviewed by:

Marion S. Röder, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany

Awais Rasheed, International Maize and Wheat Improvement Center, Mexico

Šurlan-Momirović, University of Belgrade, Serbia

#### \*Correspondence:

Dinesh Kumar, dinesh.kumar@icar.gov.in

Ratan Tiwari, ratan.tiwari@icar.gov.in

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 20 December 2018 Accepted: 04 April 2019 Published: 25 April 2019

#### Citation:

Sheoran S, Jaiswal S, Kumar D, Raghav N, Sharma R, Pawar S, Paul S, Iquebal MA, Jaiswar A, Sharma P, Singh R, Singh CP, Gupta A, Kumar N, Angadi UB, Rai A, Singh GP, Kumar D and Tiwari R (2019) Uncovering Genomic Regions Associated With 36 Agro-Morphological Traits in Indian Spring Wheat Using GWAS. Front. Plant Sci. 10:527. doi: 10.3389/fpls.2019.00527

<sup>1</sup> ICAR-Indian Institute of Wheat and Barley Research, Karnal, India, <sup>2</sup> ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India, <sup>3</sup> Lokbharti-Sanosara Centre, Bhavnagar, India

Wheat genetic improvement by integration of advanced genomic technologies is one way of improving productivity. To facilitate the breeding of economically important traits in wheat, SNP loci and underlying candidate genes associated with 36 agro-morphological traits were studied in a diverse panel of 404 genotypes. Using the Breeders' 35K Axiom array in a comprehensive genome-wide association study covering 4364.79 cM of the wheat genome and applying a compressed mixed linear model, a total of 146 SNPs (−log<sub>10</sub> P ≥ 4) were found to be associated with 23 of the 36 traits studied, explaining 3.7–47.0% of phenotypic variance. As part of this, a subset of 260 genotypes was characterized phenotypically for six quantitative traits [days to heading (DTH), days to maturity (DTM), plant height (PH), spike length (SL), awn length (Awn\_L), and leaf length (Leaf\_L)] under five environments. Gene annotation identified ∼38 putative candidate genes, which were confirmed using tissue- and stage-specific gene expression data from RNA-Seq. We observed strongly co-localized loci for four traits (glume pubescence, SL, PH, and awn color) on chromosome 1B (24.64 cM), annotated with five putative candidate genes. This study led to the discovery of hitherto unreported loci for some less explored traits (such as leaf sheath wax, awn attitude, and glume pubescence), besides refining the chromosomal regions of known loci associated with the traits. It provides valuable information on the genetic loci and their potential genes underlying traits such as awn characters, which are being considered important contributors toward yield enhancement.

Keywords: 35K Axiom array, agro-morphological, GWAS, SNP, wheat

## INTRODUCTION

Wheat (Triticum aestivum L.) provides one-fifth of the total food calories and a quarter of the protein in the human diet on a daily basis<sup>1</sup>. To meet the increasing food demand of a growing population, breeders have focused on varieties with higher yield and yield stability and increased resistance/tolerance to biotic and abiotic stresses. Approximately 10,000 wheat varieties worldwide<sup>2</sup>, including 448 wheat varieties in India (Gupta et al., 2018), have been notified. Agro-morphological characterization of germplasm is fundamental to providing information for plant breeding programs. QTL mapping methods based on bi-parental mapping populations identify genomic regions with low resolution, whereas genome-wide association studies (GWAS), based on linkage disequilibrium (LD), take diverse genetic backgrounds into consideration to dissect the genetic architecture of complex traits with high resolution. GWAS in wheat has gained importance in the recent past, focusing mainly on yield and yield-related traits (Liu et al., 2014; Liu Y. et al., 2017; Sukumaran et al., 2014, 2018; Arruda et al., 2015; Gao et al., 2015; Maccaferri et al., 2015; Arora et al., 2017).

<sup>1</sup>www.fao.org/faostat/en

<sup>2</sup>www.wheatatlas.org/varieties

Advances in next-generation sequencing technology have provided valuable wheat genomic and plant breeding resources, including high-quality genome data (Brenchley et al., 2012; Jia et al., 2013; International Wheat Genome Sequencing Consortium (IWGSC), 2014; Chapman et al., 2015). Several high-throughput SNP arrays, viz., 9K (Cavanagh et al., 2013), 90K (Wang, 2014), 820K (Winfield et al., 2016), 660K (Cui et al., 2017), 35K (Allen et al., 2017), and TaBW280K (Rimbert et al., 2018), have been developed and utilized in wheat. These SNP arrays have been successfully used for GWAS in European winter and spring wheat (Zanke et al., 2014), CIMMYT spring wheat (Sukumaran et al., 2014), United States elite wheat breeding genotypes (Lin et al., 2016), a panel of CIMCOG (CIMMYT Mexico core germplasm), Kazakhstan, Russian, and European wheat genotypes (Turuspekov et al., 2017), and Chinese bread wheat cultivars (Sun et al., 2017). A substantial number of novel SNP variants have been identified using the 35K SNP Breeders' array in the Watkins collection of landraces for further improvement of modern elite cultivars (Winfield et al., 2018). The Breeders' 35K Axiom array was developed from the 820K SNP array (using a global selection of germplasm including elite cultivars, landraces, and progenitor and ancestral species of wheat) and contains only mapped SNPs that are tailored to be most informative for specific purposes (Wilkinson et al., 2012; Borrill et al., 2015). The 35K SNP array holds promise for detecting large-scale variation in secondary and tertiary gene pools (Rasheed et al., 2017).

Several agro-morphological traits have been studied intensively, and markers have been identified for them (Sukumaran et al., 2018). At the same time, there are certain less explored traits, for instance awn characters, which can be considered alternative targets for improving wheat grain yield through their known functions, including photosynthesis and increased water use efficiency (Rebetzke et al., 2016). The present study includes these characters of future importance, which have not been explored much until now. Moreover, most of the agro-morphological traits undertaken in this study are also used to characterize genotypes for Distinctiveness, Uniformity and Stability (DUS)<sup>3</sup>.

<sup>3</sup>http://www.upov.int/edocs/tgdocs/en/tg003.pdf

Some of the studied traits and the associated markers will be of immense importance in the future for developing input-use-efficient wheat varieties. Amongst them, genomic regions associated with days to heading (DTH) should enable the development of early maturing wheat genotypes that avoid terminal heat stress and allow an intervening legume crop before rice in the ensuing season (Tewolde et al., 2006; Joshi et al., 2007). For GWAS, the size and diversity of the panel play a significant role, as smaller panels (<384 accessions) and large LD blocks identified in association studies may lead to the identification of false positive associations (Turuspekov et al., 2017). Taking this into account, a panel of 404 diverse genotypes comprising indigenous collections, local landraces, released varieties and other improved genotypes was used. The panel also included registered genetic stocks characterized for early maturity, resistance/tolerance to biotic/abiotic stresses, adaptation to different environments, plant architecture, etc. (Kundu et al., 2010). These are the major drivers of trait improvement programs using molecular breeding approaches. Moreover, India has unique climatic variation, which makes its wheat germplasm diversity a gold mine. Genotypes adapted to the different agro-climatic zones of the country are present in the genotypic panel. Therefore, they may be considered representative of the three major spring wheat growing mega-environments, viz., ME 1, ME 4, and ME 5, described by the International Maize and Wheat Improvement Center (CIMMYT) and spanning all 5 continents<sup>4</sup>. This will allow breeders to use the information in developing genotypes specific to different adaptation conditions.

The present GWA study is the first attempt at a large-scale evaluation of the 35K Axiom array in a diverse panel of 404 Indian wheat genotypes. The 35K Breeders' Array was selected for the present work because of its proven efficacy on a panel of 1807 accessions of hexaploid wheat (804 accessions from the Watkins Collection, a collection of wheat landraces made by A. E. Watkins in the 1920s and 1930s, and 1003 modern and elite accessions) from 32 countries (Winfield et al., 2018). The aims of the study were to (i) identify significant marker-trait associations (MTAs) for 36 agro-morphological traits for future breeding and (ii) mine putative candidate genes underlying the corresponding traits of interest. Furthermore, tissue- and growth stage-specific gene expression data were examined to provide support for the detected candidate genes. For complex quantitative traits, the association panel was phenotyped at two locations for 2 years. The outcome of this study could be used to devise effective strategies for the development of new varieties with economically important traits.

### MATERIALS AND METHODS

### Plant Material

A set of 404 bread wheat (T. aestivum L.) genotypes comprising indigenous collections (91), landraces (8), released varieties (134), genetic stocks (43), and improved genotypes (128) was used to constitute a diverse association panel. The diverse lines were selected on the basis of pedigree to reduce spurious marker associations, as this provides a buffer against skewness in terms of environmental effects. Recently, we carried out a trait-based diversity analysis using the Shannon index for 16 of the 36 traits in a collection of 7,000 diverse germplasm lines (unpublished work). From these 7,000 lines, 450 genotypes comprising indigenous collections, landraces, released varieties, improved genotypes and genetic stocks for various traits were selected, and this set was downsized to 404 genotypes after eliminating admixtures, duplicates, etc. Comparing the trait-based diversity of the selected 404 genotypes using the Shannon index endorsed the set as a good representative of the 7,000 germplasm lines, thereby supporting the suitability of the 404 genotypes for the GWA study (**Supplementary Table S1**). Seeds of the 404 genotypes were obtained from the Germplasm Resource Unit, ICAR-IIWBR (Indian Institute of Wheat and Barley Research), Karnal, Haryana, India, which acts as a nodal center for wheat in the country. Detailed information with pedigree for each genotype is given in **Supplementary Table S2**.

<sup>4</sup>http://wheatatlas.org/megaenvironments
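The Shannon index comparison described above can be illustrated with a minimal sketch. It assumes the common form H' = −Σ p<sub>i</sub> ln p<sub>i</sub> with natural logarithms (the text does not state which base or normalization was used), and the trait scores below are hypothetical, not data from the study.

```python
import math
from collections import Counter


def shannon_index(states):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the observed trait states."""
    counts = Counter(states)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical awn-colour scores for a full collection and a selected subset.
collection = ["white"] * 50 + ["brown"] * 30 + ["black"] * 20
subset = ["white"] * 5 + ["brown"] * 3 + ["black"] * 2

h_collection = shannon_index(collection)
h_subset = shannon_index(subset)
print(round(h_collection, 3), round(h_subset, 3))
```

A subset that preserves the class proportions of the full collection gives the same index, which is the sense in which a smaller panel can be called representative for a given trait.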

### Field Trials and Phenotyping

The 404 genotypes were evaluated for 30 qualitative characters at the experimental field of ICAR-IIWBR, Karnal, during the 2016–2017 crop season. A subset of 260 genotypes was phenotyped for six quantitative traits [days to heading (DTH), days to maturity (DTM), spike length (SL), plant height (PH), awn length (Awn\_L), and leaf length (Leaf\_L)] at three locations, viz., Experimental Field, Karnal (29°42′N, 77°02′E) and Seed Farm, Karnal (29.7138°N, 76.9943°E), Haryana, India, and Lokbharti-Sanosara Centre, Bhavnagar (21°46′N, 72°11′E), Gujarat, India, during 2016–2017. Besides these three environments, an additional environment was obtained by phenotyping the subset in the 2017–2018 crop season at ICAR-IIWBR, Karnal. The experiment was conducted in two replications following an alpha lattice design. To minimize variation, every genotype was planted with a four-row dibbling tool, the IIWBR Dibbler (Sharma et al., 2016). The plant-to-plant distance was 10 cm and the row-to-row distance was maintained at 20 cm. This sowing method helped avoid the confounding effects of extraneous errors and improved the precision of phenotyping, leading to moderate to high estimates of heritability (H<sup>2</sup>) and thereby enhancing the probability of identifying minor-effect genes related to complex traits (Sharma et al., 2016).

Thirty-six agro-morphological characters were recorded as per the guidelines laid out by the Protection of Plant Varieties and Farmers' Right Authority (PPV and FRA, 2011)<sup>5</sup>: coleoptile anthocyanin coloration (C\_Col), plant growth habit (PGH), foliage color (Fol\_Col), flag leaf anthocyanin coloration of auricle (Aur\_Col), flag leaf hairs on auricle (pubescence) (Aur\_Pub), flag leaf attitude (Leaf\_Att), ear emergence/days to heading (DTH), flag leaf waxiness of sheath (Wax\_LS), flag leaf waxiness of blade (Wax\_LB), ear waxiness (Wax\_Ear), waxiness of peduncle (Wax\_Ped), flag leaf length (Leaf\_L), flag leaf breadth (width) (Leaf\_Br), PH, ear shape (Ear\_S), ear density (Ear\_D), ear (spike) length (SL), awn presence (Awn\_P), awn length (Awn\_L), awn color (Awn\_Col), awn attitude (Awn\_Att), outer glume pubescence (Glu\_Pub), ear color (Ear\_Col), lower glume shoulder width (Sh\_Wid), lower glume shoulder shape (Shl\_Sh), beak length (Beak\_L), beak shape (Beak\_Sh), spike (peduncle) attitude (Ped\_Att), grain coloration with phenol (Grn\_Ph), grain color (Grn\_Col), grain shape (Grn\_Sh), grain germ width (Germ\_Wid), brush hair length (Brush\_L), seed (grain) size (Grn\_Size), grain hardness (texture) (Grn\_Tex) and DTM. The procedure for recording the data for each trait is summarized in **Supplementary File S1**. Qualitative traits were recorded as binary (presence or absence), ordinal (visual scale of the expression intensity of a characteristic) or nominal (color or shape) (**Supplementary Table S1**). For association analysis, a total of five environments (E1–E5) were considered: E1, average of two replications at ICAR-IIWBR, Karnal (2016); E2, average of two replications at Seed Farm, Karnal (2016); E3, average of the two Karnal locations (2016); E4, average of two replications at ICAR-IIWBR, Karnal (2017); and E5, average of two replications at Bhavnagar, Gujarat (2016).

### Statistical Analysis

Phenotypic data were analyzed using SAS v.9.3 (SAS Institute, 2011<sup>6</sup>). Pearson pairwise correlations among all the traits were calculated with the PROC CORR procedure. Histograms were created in R (R Development Core Team, 2013) using the hist() function. Variance components for the quantitative traits were analyzed using a general linear model to detect the effects of genotype, environment, replication and genotype × environment interaction. All sources of variation were considered random effects. The broad-sense heritability of the traits was estimated by the formula H<sup>2</sup> = V<sub>G</sub>/(V<sub>G</sub> + V<sub>E</sub>), where V<sub>G</sub> and V<sub>E</sub> represent estimates of genetic and environmental variance, respectively.
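As a small worked illustration of the heritability formula, the sketch below computes H<sup>2</sup> as the ratio of genetic variance to the sum of genetic and environmental variance. The function name and the variance values are hypothetical and are not estimates from this study.

```python
def broad_sense_heritability(v_g, v_e):
    """H^2 = V_G / (V_G + V_E) from estimated variance components."""
    if v_g < 0 or v_e < 0:
        raise ValueError("variance components must be non-negative")
    total = v_g + v_e
    if total == 0:
        raise ValueError("total variance is zero")
    return v_g / total


# Hypothetical variance components for a trait such as plant height.
h2_example = broad_sense_heritability(v_g=12.0, v_e=4.0)
print(h2_example)
```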

### SNP Genotyping and Filtering

Genomic DNA was extracted from 15-day-old seedlings according to the CIMMYT Molecular Genetics Manual (Dreisigacker et al., 2013). A Nanodrop 1000 spectrophotometer was used for quantifying DNA at 260 nm absorbance (Biodrop Touch PC+125). The DNA samples were genotyped with the 35K Axiom® Wheat Breeder's Array (Affymetrix UK Ltd., United Kingdom). Quality preprocessing of the 35,143 markers obtained from the 35K chip was done using PLINK software (Purcell et al., 2007). Markers with more than 5% missing values or less than 5% minor allele frequency (MAF), and individuals with more than 15% missing SNP calls, were removed from the dataset. Markers with no known chromosomal position, based on the high-density consensus map generated using five mapping populations<sup>7</sup> (Allen et al., 2017), were also removed. Duplicate markers were further removed with R/QTL software (Broman et al., 2003; Arends et al., 2010).
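The filtering thresholds above can be sketched in plain Python on a toy genotype matrix. This is not PLINK itself: the function names, the order of the steps (individuals first, then markers) and the toy data are assumptions made for illustration, with genotypes coded as allele counts 0, 1, 2 and `None` for a missing call.

```python
def missing_fraction(calls):
    """Fraction of missing (None) genotype calls in a list."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from allele counts coded 0, 1, 2; missing calls are skipped."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq)


def filter_genotypes(rows, max_ind_missing=0.15, max_snp_missing=0.05, min_maf=0.05):
    """Return indices of samples and markers that pass the three thresholds."""
    kept_samples = [i for i, row in enumerate(rows)
                    if missing_fraction(row) <= max_ind_missing]
    kept_markers = []
    for j in range(len(rows[0])):
        column = [rows[i][j] for i in kept_samples]
        if (missing_fraction(column) <= max_snp_missing
                and minor_allele_frequency(column) >= min_maf):
            kept_markers.append(j)
    return kept_samples, kept_markers


# Hypothetical toy data: 20 samples x 10 markers.
rows = []
for i in range(20):
    row = [0 if i < 10 else 2] * 10
    row[6] = 0                      # marker 6 is monomorphic
    row[7] = 1 if i == 0 else 0     # marker 7 carries a rare allele
    if i in (1, 2):
        row[8] = None               # marker 8 has ~10% missing calls
    if i == 19:
        row[0] = row[1] = row[2] = None  # sample 19 has 30% missing calls
    rows.append(row)

kept_samples, kept_markers = filter_genotypes(rows)
print(len(kept_samples), kept_markers)
```

In this toy case one sample and three markers are dropped, each for a different reason, which mirrors the separate counts reported per filter in a real run.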

### Genetic Diversity and Population Structure Analysis

Basic statistics such as genetic diversity (GD) and polymorphism information content (PIC) were evaluated using PowerMarker v3.2.5 (Liu and Muse, 2005). The model-based Bayesian cluster analysis program STRUCTURE v2.3.4 (Hubisz et al., 2009) was used to infer the population structure. A burn-in period of 100,000 iterations followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations was used for K = 2 to K = 7 clusters to identify the optimal number of clusters (K). Three independent runs were generated for each K. The results of the analysis were used as input to the Structure Harvester tool (Earl and VonHoldt, 2011) to predict the best K-value based on the Evanno method (Evanno et al., 2005). Principal component analysis (PCA) and a neighbor-joining (NJ) tree were used to validate population stratification, with the software GAPIT (Lipka et al., 2012) and DARwin v6 (Perrier and Jacquemoud-Collet, 2006), respectively.

<sup>5</sup>http://plantauthority.gov.in/pdf/PVJFebruary2011.pdf

<sup>6</sup>http://www.sas.com

<sup>7</sup>http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/Excel/35K\_array/Supplementary\_file\_3.xlsx
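The Evanno criterion used to pick K can be sketched as follows. This is a simplified version that takes the absolute second-order difference of the mean log-likelihood across runs and divides it by the standard deviation of the runs at that K; the log-likelihood values are hypothetical and the function name is illustrative, not part of STRUCTURE or Structure Harvester.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using run means for L."""
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the smallest and largest K
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result


# Hypothetical ln P(D|K) values: three runs for each K from 2 to 7.
runs = {
    2: [-5000, -5010, -4990],
    3: [-4000, -4005, -3995],
    4: [-3900, -3920, -3880],
    5: [-3850, -3870, -3830],
    6: [-3820, -3850, -3790],
    7: [-3800, -3840, -3760],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

With runs for K = 2 to 7, the statistic is only available for K = 3 to 6, which is one reason the tested range usually extends beyond the expected number of clusters.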

### Linkage Disequilibrium

For linkage disequilibrium analysis, r<sup>2</sup> (squared correlation coefficient) values for all pairs of loci were calculated using the PLINK 1.9 tool<sup>8</sup> (Purcell et al., 2007). The default window size and r<sup>2</sup> cutoff were used for this analysis. Finally, LD was plotted for the three sub-genomes (A, B, and D genomes) on the basis of centiMorgan (cM) distance, using the ggplot2 package in R (Wickham et al., 2009). The percentage of marker pairs below the critical LD (r<sup>2</sup> > 0.02) was also compared among the sub-genomes. Pairwise LD estimates in the regions of interest for significantly associated markers were investigated using Haploview 4.2 (Barrett et al., 2005).
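For readers unfamiliar with the statistic, a minimal sketch of a pairwise r<sup>2</sup> calculation is given below. It assumes unphased genotypes coded as allele counts (0, 1, 2) and computes the squared Pearson correlation between two markers; the function name and genotype vectors are hypothetical.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between the allele counts of two markers."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete genotype pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a monomorphic marker")
    return sxy ** 2 / (sxx * syy)


# Hypothetical allele counts for two markers in six inbred lines.
marker_a = [0, 0, 2, 2, 0, 2]
marker_b = [0, 0, 2, 2, 2, 0]
print(ld_r2(marker_a, marker_a), round(ld_r2(marker_a, marker_b), 3))
```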

### Association Analysis

Association analysis was performed using the compressed mixed linear model (CMLM) implemented in the Genomic Association and Prediction Integrated Tool (GAPIT) in R (Lipka et al., 2012). A K-PC model (Zhao et al., 2007) was used, in which kinship information together with the first three principal components (PCs) were included as covariates for GWAS, which further improves statistical power. The kinship matrix was iteratively calculated using the VanRaden method (VanRaden, 2008). The fit of the model was evaluated on the Q-Q plots generated by the model. A threshold of −log<sub>10</sub> P > 4 (−log<sub>10</sub> P ≥ 4 for quantitative traits) was used to declare significant marker-trait associations. A false discovery rate (FDR) of 10% was used to determine the P-value thresholds.
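The two significance criteria can be illustrated with a short sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method is assumed here; the function names are illustrative, and only the two smallest P-values in the example correspond to values reported in this article, the others being hypothetical.

```python
import math


def neg_log10(p):
    """Transform a P-value to the -log10 scale used in Manhattan plots."""
    return -math.log10(p)


def benjamini_hochberg(p_values, fdr=0.10):
    """Indices of tests declared significant by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # largest rank whose P-value meets its bound
    return sorted(order[:cutoff_rank])


p_values = [2.47e-11, 7.49e-05, 0.03, 0.2, 0.8]
passing_threshold = [i for i, p in enumerate(p_values) if neg_log10(p) > 4]
passing_fdr = benjamini_hochberg(p_values, fdr=0.10)
print(passing_threshold, passing_fdr)
```

With only five tests the FDR rule is more permissive than the fixed threshold; with thousands of markers the per-rank bound becomes much stricter, so the two criteria need not agree.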

### Putative Candidate Gene Analysis and Expression Data

To find candidate genes or putative related proteins in SNP flanking regions, a BLASTx search was conducted for significant MTAs against the recently released genome sequence IWGSC RefSeq v1.0<sup>9</sup>. Each MTA was searched for IWGSC sequence information in Ensembl Plants for T. aestivum<sup>10</sup>. The flanking sequence available for the SNP marker with maximum bases (1,000 bases before and after the SNP) was considered for BLASTx analysis. We also looked at the number of high-confidence genes adjacent to the significant MTAs using RefSeq v1.0. Gene Ontology (GO) annotation of the potential candidate genes was carried out using the Blast2GO pro tool v.3.1.3 (Conesa and Götz, 2008). The expression profiles of all the putative candidate genes associated with the identified SNPs were checked using the wheat RNA-seq expression database of polyploid wheat<sup>11</sup>. This database consists of the transcript profiles of five tissues (grain, leaf, root, spike, and stem), each at 3 different time points (growth stages), and of environmental treatments (Pearce et al., 2015). Gene expression was measured in units of FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Expression profiling was carried out to provide supporting evidence for the candidate genes (tissue and stage of expression).
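As a reminder of what the FPKM unit expresses, the sketch below applies its standard definition: fragments mapped to a transcript, scaled by transcript length in kilobases and by the total number of mapped fragments in millions. The function name and the counts are hypothetical and are not taken from the expression database.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2,000 bp transcript, 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
print(example_fpkm)
```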

### RESULTS

### Phenotypic Variation and Correlation Analysis

The frequency distribution of the phenotypic data of 404 genotypes characterized for 30 traits is given in **Supplementary Figure S1**. The phenotypic variation of six quantitative traits (DTH, DTM, PH, SL, Awn\_L, and Leaf\_L) was recorded in multiple environments. Phenotypic variation of these traits among genotypes was corroborated by the mean, standard deviation, range and coefficient of variation (**Supplementary Table S3**). The mean values of DTH, DTM, PH, SL, Awn\_L, and Leaf\_L varied from 66.72 to 92.0 days, 105.28 to 135.37 days, 94.45 to 113.18 cm, 10.33 to 13.30 cm, 0 to 19.50 cm, and 16.25 to 41.40 cm, respectively. These data revealed extensive variation in the traits of the diverse set, suggesting the suitability of the genotypic panel for association studies. Phenotypic values for each of the six traits were found to be normally distributed (**Supplementary Figure S2**). Analysis of variance (ANOVA) was conducted to test the effects of genotype (G), environment (E) and their interaction (G × E). Significant differences were observed among the genotypes (p < 0.0001), and the effects of environment and of the G × E interaction were also significant, indicating an environmental effect on these traits (**Supplementary Table S4**). Estimates of the correlation coefficients from this combined analysis are shown in **Supplementary Table S5** and in **Figure 1**. A positive correlation was observed for DTH with DTM (0.36), SL (0.18), and PH (0.17), while SL exhibited a negative correlation with PH (−0.17).

### SNP Markers Statistics

Quality preprocessing of the 35,143 markers obtained from the 35K chip was done using PLINK software<sup>12</sup> (Purcell et al., 2007). First, 6,041 monomorphic markers were excluded from the analysis. Of the remaining 29,102 SNP markers, 8,673 SNPs failed the frequency test (MAF < 0.05) and 1,383 markers were removed for failing the missingness test (> 0.05). Only 2 individuals were removed for low genotyping (MIND > 0.2). Further, 4,740 SNPs were excluded for lack of a physical position and 146 for being duplicate markers. After filtering, 402 genotypes with 14,160 SNP markers were used for GWAS. These markers covered a genetic distance of 4364.79 cM, with an average density of 0.3 cM. Marker density was highest for the B genome (1029.6 markers per chromosome), followed by the A (788.9 markers per chromosome) and D genomes (207.8 markers per chromosome). Among the chromosomes, 2B had the highest number of markers (1324), while 4D had the lowest (55) (**Supplementary Table S6**).

<sup>8</sup>http://zzz.bwh.harvard.edu/plink/

<sup>9</sup>http://www.wheatgenome.org/

<sup>10</sup>http://plants.ensembl.org/Triticum\_aestivum

<sup>11</sup>https://wheat.pw.usda.gov/WheatExp/

<sup>12</sup>http://pngu.mgh.harvard.edu/purcell/plink
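The marker counts reported in this subsection can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Marker counts reported in the text, in the order the filters are described.
total_markers = 35143
removed = [
    ("monomorphic", 6041),
    ("MAF < 0.05", 8673),
    ("missingness > 0.05", 1383),
    ("no map position", 4740),
    ("duplicate", 146),
]

remaining = total_markers
trace = []
for label, count in removed:
    remaining -= count
    trace.append((label, remaining))
    print(f"after removing {label}: {remaining}")

# Average spacing between markers over the reported map length.
map_length_cm = 4364.79
average_spacing_cm = map_length_cm / remaining
print(round(average_spacing_cm, 2))
```

The running total passes through 29,102 after the monomorphic markers are removed and ends at 14,160, and the map length divided by the final marker count gives roughly 0.3 cM per marker, in line with the figures quoted above.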

### Population Structure and Linkage Disequilibrium

The mean GD and the PIC for the whole genome were 0.36 and 0.29, respectively. Both GD and PIC of the A genome (0.357 and 0.286) and B genome (0.372 and 0.291) were higher than the D genome (0.345 and 0.276). The number of markers, map length, GD and PIC for each chromosome are shown in **Supplementary Table S6**.
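The relationship between the two diversity statistics can be illustrated with a short sketch. It assumes the usual definitions, gene diversity as 1 − Σ p<sub>i</sub><sup>2</sup> and PIC as gene diversity minus Σ<sub>i&lt;j</sub> 2 p<sub>i</sub><sup>2</sup> p<sub>j</sub><sup>2</sup>; the allele frequencies are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content for a list of allele frequencies."""
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return gene_diversity(freqs) - cross


# Hypothetical biallelic SNP with allele frequencies 0.3 and 0.7.
snp = [0.3, 0.7]
print(round(gene_diversity(snp), 4), round(pic(snp), 4))
```

For a biallelic SNP the two statistics reach their maxima of 0.5 and 0.375 at equal allele frequencies, so PIC is always the smaller of the two, as in the genome-wide means reported here.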

In the present study, the population structure of a diverse panel of 402 wheat genotypes was investigated on the basis of the ΔK method of model-based Bayesian clustering using 14,160 SNP markers. Population structure analysis clearly indicated the existence of three distinct major subpopulations in the bread wheat panel, which was consistent with the results of the PCA and neighbor-joining (NJ) tree analysis (**Figure 2**). Subgroup I, the largest group with 169 accessions, was dominated by recently released varieties and breeding lines adapted to the Northern wheat growing zone of the country and by genetic stocks for biotic resistance (rust and Karnal bunt). DPW621-50 (2011), HD2967 (2013), WH1105 (2013), HD3059 (2013), DBW88 (2014), HD3086 (2014), and DBW90 (2014) are some recently released varieties. The varieties DPW621-50, DBW88, and HD3059 shared a common pedigree (**Supplementary Table S2**). Breeding lines such as HUW675, HUW666, HPW373, and HD3133 and the varieties MP1201, HS507, HS542 and WH1105 had MILAN in their parentage. Subgroup II consisted of 87 accessions, mainly local landraces from the pre-green revolution era. Subgroup III had 146 accessions, predominantly from the warmer region of the country. It also comprised early maturing genotypes (short maturity duration of about 120 days) released for late sowing (from the end of November to mid-December) in different agro-climatic zones, viz., K8962, Raj3765, DBW16, and MP3336, which have HD2160 (a triple dwarf genotype) as a common progenitor in their background (**Supplementary Table S2**). The early Mexican cultivars that paved the way for the green revolution, Sonalika, SONORA64, and Safed Lerma, appeared in this cluster along with derivatives of Sonalika such as UP262, HW2001, and Lok54.

Linkage disequilibrium (LD) decay distance in the selected panel was found highest in the D genome, which decayed at about 5 cM (r<sup>2</sup> = 0.02) as compared to ∼2 cM in the A and B genomes (**Supplementary Figure S3**). Faster LD decay in D genome vis-à-vis A or B genome has been reported earlier in GWAS of wheat (Lopes et al., 2013; Zhang et al., 2013). With an increase in genetic distance, the r<sup>2</sup> values of the A, B, and D genomes decreased gradually. Genome A (62.7%) showed the highest frequency of physically linked locus pairs, followed by the B (58.0%) and D (53.6%) genomes.

### Genome–Wide Association Analysis

In order to detect the most significant marker-trait associations, CMLM was employed to deal with the confounding effect of population structure. This was followed by inspection of Q–Q plots and Manhattan plots for evidence of P-value inflation (**Supplementary Figures S4**, **S5**). Based on the stringent criterion of −log<sub>10</sub> P > 4, we detected 99 significant MTAs with P-values ranging from 7.49 × 10<sup>−5</sup> to 2.47 × 10<sup>−11</sup> for 17 qualitative traits (**Figure 3**, **Table 1**, **Supplementary Figure S4**, and **Supplementary Table S7**), explaining 5.3–33.3% of the phenotypic variation. It is important to note that not every gene is likely to be represented by markers on the 35K SNP array. Therefore, a marker in linkage disequilibrium indicates that it either lies in the causative gene itself or is closely linked to the causative gene. For color-related traits, a total of 22 SNPs were found to be associated with coleoptile color on five chromosomes, i.e., 2A, 4B, 6B, 5B, and 6A. The genomic region on chromosome 6B was represented by eighteen SNPs, mapped within a genetic distance of 62.83–67.99 cM (distance interval of 5.15 cM), which collectively explained 23.9% of the phenotypic variation (**Supplementary Figure S4** and **Supplementary Table S7**). For awn and ear color, significant MTAs were detected on chromosome 1B but at different loci, i.e., at 24.64 cM, accounting for 8.8% of the phenotypic variation, and at 8.24 cM, explaining 10.5% of the phenotypic variation, respectively (**Supplementary Figure S4** and **Supplementary Table S7**).

For waxiness characters, two MTAs were identified for leaf sheath wax on chromosomes 6D (7.48 e-05) and 3A (P < 0.0001), hitherto not reported (**Supplementary Figure S4**), and one MTA for peduncle wax on chromosome 3A (2.87 e-05). MTAs associated with leaf sheath wax and peduncle wax contributed to the trait negatively. For glume related traits, six MTAs were detected for glume pubescence on chromosomes 1A (2), 1B (1), and 2B (3) explaining phenotypic variation ranging from 8.9 to 12.0% with positive effect. For shoulder width and brush length, significant associations were detected on chromosome 2D and 3A, respectively (**Supplementary Figure S4** and **Supplementary Table S7**). For awn related traits, a genomic region associated with awn attitude was represented by three SNP markers (AX-94613491, AX-94519690, and AX-94453668) on chromosome 5A, spanning a region from 59.99 to 70.36 cM. The phenotypic variation explained by these SNPs ranged from 24.2 to 25.0% and all three SNPs showed a positive effect on awn attitude. For awn presence, several markers or regions were identified across the chromosomes (1A, 1B, 1D, 2A, 2B, 3A, 3B, 4A, 4B, 5A, 5B, 6B, 6D, 7A, and 7B) explaining phenotypic variation ranging from 28.7 to 33.9%. A chromosomal region of 12.22 cM (66.99–72.22 cM) on chromosome 5A harbored significant MTAs associated with multiple traits (awn length, auricle color, and awn presence).

For plant growth habit two SNPs were detected on chromosome 1B but at different loci, one at 8.24 cM and other at 38.86 cM indicating the role of two independent loci on chromosome 1B explaining phenotypic variation ranging from 20.0 to 21.2%.

For grain related traits, 9 SNPs on chromosome 2A spanning a 0.71 cM region (124.18–124.89 cM) were found significantly associated with phenol color in the current study, indicating the importance of this region. These SNPs explained 15.1–18.6% phenotypic variation. For grain texture (phenotype scored as hard or soft), a total of two regions were detected; one on chromosome 7A containing three markers (100.09 cM) and the other on chromosome 6B comprising four markers (62.83 cM). Both these MTAs for grain texture contributed negatively to the trait. For germ width, a significant association was detected on chromosome 4A. Only one MTA (AX-94670534) for ear shape was detected on chromosome 7B, explaining 8.3% of phenotypic variance with a negative effect. We did not find any significant MTA for ear density.

For the six quantitative traits (DTH, DTM, PH, Leaf\_L, SL, and Awn\_L), a total of 47 significant SNPs were identified in five environments which explained 5.2–47.3% of phenotypic variation (**Table 2** and **Supplementary Figure S5**). We successfully detected both previously reported genomic regions and novel loci for the traits in wheat (**Tables 1**, **2** and **Supplementary Figures S4**, **S5**). Flowering time or DTH is a crucial trait which affects the adaptation of wheat in its target environment. A total of 5 SNPs for DTH were detected on chromosome 4A, 5A, and 7D with phenotypic contributions ranging from 19.1 to 32.5%. Out of the 5 SNPs associated with DTH, SNP AX-9454244 on chromosome 4A (78.09 cM) and SNP AX-95187165 (89.02 cM) on chromosome 5A showed pleiotropic effect on DTM. For DTM, a locus on chromosome 2A at 179.61 cM has been detected in the three environments (E2, E3, and E5; **Table 2**) explaining the average phenotypic variation of 22.9% suggesting the importance of this region while another locus on chromosome 2A at 83.23 cM was observed for three traits (DTM, PH, and SL). Two MTAs were detected on chromosome 5A (AX-95187165, 89.02 cM and AX-95652310, 72.22 cM), indicating the presence of two independent loci on chromosome 5A with a positive effect on DTM.

For PH, a total of six MTAs were identified, one each on chromosome 1B (24.64 cM), 2A (83.23 cM), 2B (104.59 cM), 7A (29.9 cM), and two on 5D (1.58 cM) considering all the environments (**Table 2**). MTAs significantly associated with SL were mainly distributed on chromosome 1B, 2A, 3A, 3B, 3D, 5A, 7A, and 7B. The phenotypic variation contributed by SNPs ranged from 8.3 to 27.6% for SL. The SNP AX-94517196 (83.23 cM) on chromosome 2A (**Supplementary Figure S5** and **Table 2**) showed 15.6% of phenotypic variation, having a positive effect on the trait. Three MTAs for flag leaf length were detected on chromosome 4A, 5A, and 7B explaining phenotypic variation ranging from 9.7 to 15.7% (**Table 2**). The SNPs AX-95196340 and AX-94406861 on chromosome 7B and 5A, respectively, showed a positive effect on leaf length. For awn length, a total of 8 significant MTAs were identified, mainly distributed on chromosome 1B (1), 3B (1), 4B (2), 6D (2), and 7B (2), which collectively explained 18.0% of the phenotypic variation. Two loci, one at 1.72 cM on chromosome 7B and the other at 17.65 cM on chromosome 6D, shared association with awn length and awn presence.

TABLE 1 | Genome wide significant MTAs identified for the qualitative traits using CMLM.

<sup>a</sup>Chr, chromosome, #Only few markers with highest and lowest P-value are mentioned where more SNP markers were associated with trait (C\_Col, Awn\_P, Grn\_Tex, and Grn\_Ph). The details of all the markers associated with traits are given in Supplementary Table S7.

TABLE 2 | [...] traits in the current and previous study.

9 April 2019 | Volume 10 | Article 527

### Pleiotropy Effect

In the present study, we observed the same SNPs associated with multiple traits, which could be due to pleiotropy or to different causal genes in LD. For instance, SNP AX-94656878 at 83.23 cM (chromosome 2A) explained variation for two traits (PH and DTM) (**Table 2**). Similarly, another locus (SNP AX-94527988) on chromosome 3A at 115.9 cM was found pleiotropic with LS\_Wax and Ped\_Wax (**Table 1**). Also, the two SNPs associated with DTH on chromosome 4A (78.09 cM) and 5A (89.02 cM) were found linked with DTM (**Table 2**). The pleiotropic effects observed in the study were in agreement with the Pearson's correlations observed between the agronomic traits (**Figure 1**).

### Identification of Putative Candidate Genes and Expression Analysis

We identified several putative candidate genes such as storage protein activator (spa), beta-amylase 2 (bmy2), cytochrome P450, shikimate kinase, and b-ZIP transcription factor for the phenotypic variations of the traits (**Table 3**). These putative proteins were highly homologous to those of different species of Triticum or Aegilops. The highest number of putative candidate genes was observed for MTAs associated with SL, encoding a total of five candidate genes [actin-related protein subunit 3 (ARPC3), DIMINUTO, replication protein A, carboxypeptidase D, and basic region/leucine zipper].

To determine the relative expression profile of the identified transcripts in a broad range of tissues from different developmental stages, the published RNA-seq data and the Wheat-Exp web tool for the wheat cultivar Chinese Spring were explored (Choulet et al., 2014; Pearce et al., 2015). The expression profile of the putative candidate genes encoded by significant SNPs is given in **Table 3** and **Supplementary Table S8**. An FPKM value > 5 was considered for the tissue and developmental specific expression check.

### Effect of Favorable Alleles on Agronomic Traits

Early maturing, high yielding wheat genotypes are of immense importance toward increasing the cropping intensity as well as ensuring high input use efficiency, particularly for inputs like water, which are going to be scarce. Therefore the present study dissecting important agronomic traits such as DTH, PH, and SL enables utilization of available diversity by exploiting associated markers. SNP alleles which led to a decrease in DTH and PH and an increase in SL were considered as "favorable alleles" and vice-versa was defined as "unfavorable alleles." **Figure 4** depicted a higher frequency of favorable alleles which led to a decrease in PH and DTH with phenotypic variation of 17.6 and 9.0%, respectively. Similarly, by increasing the number of favorable alleles, SL increased with R<sup>2</sup> of 6.4%. Results of the study showed that favorable alleles exhibited significant positive effects on the phenotypic traits as compared to the unfavorable alleles. This would help in cultivar adaptation and finally to grain yield.
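The relationship between the number of favorable alleles and a trait value can be sketched as a simple linear regression, with R<sup>2</sup> summarizing the share of phenotypic variation accounted for by the allele count. The Python example below is a minimal sketch under stated assumptions: the SNP names, the genotype dictionaries and the plant height values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from this study, and the authors' actual analysis may differ.

```python
def count_favorable(genotype, favorable):
    """Count loci at which a line carries the favorable allele.

    `genotype` and `favorable` map SNP name -> allele (hypothetical names).
    """
    return sum(1 for snp, allele in favorable.items() if genotype.get(snp) == allele)


def linear_fit(x, y):
    """Ordinary least squares fit y = intercept + slope * x, returning
    (slope, intercept, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical example: favorable alleles are those that reduce plant height.
favorable = {"snp1": "A", "snp2": "T"}
line = {"snp1": "A", "snp2": "G"}
n_favorable = count_favorable(line, favorable)

# Hypothetical allele counts and plant heights (cm) for five lines.
allele_counts = [0, 1, 2, 3, 4]
plant_height = [110.0, 104.0, 101.0, 95.0, 90.0]
slope, intercept, r2 = linear_fit(allele_counts, plant_height)
```

A negative slope here corresponds to the "favorable" direction for PH or DTH, whereas for SL a favorable allele would be expected to give a positive slope.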

### Traits Sharing Co-localized Genomic Regions

In the present study, the most promising co-localized genomic region was identified on chromosome 1B at 24.64 cM associated with four traits (Glu\_Pub, Awn\_Col, SL, and PH) and at 26.22 cM with Awn\_L. The genomic co-location of loci (24.64 cM) with four traits implies either a strong physical linkage between genes underlying these important traits, or a pleiotropic effect. Therefore, to dissect the genetic cause of the observed association, LD patterns, candidate genes underlying the region and the transcript profile of the targeted region were investigated (**Figure 5**). The SL and PH at this locus showed greater LD estimates (>0.8) indicating closely dependent biological processes. In contrast, comparatively moderate and low LD was observed with Glu\_Pub and Awn\_Col, respectively, which might be due to low overall recombination vis-à-vis greater recombination frequency with other genomic regions. Notably, this locus harbored three candidate genes for SL. SNP AX-94981940 (−log<sub>10</sub> P = 4.21) was annotated as a replication protein A subunit and its transcript was almost solely abundant in the young spike at Zadoks 32 stage (**Table 3**). Replication protein A has an important role as a single strand DNA binding protein in various DNA metabolic pathways (Aklilu and Culligan, 2016). Similarly, the other SNP encoded a protein carboxypeptidase-D which functions as a positive regulator of grain size in rice (Li et al., 2011). The sequence of SNP AX-94561972 linked with SL was annotated as basic region/leucine zipper protein. The bZIP transcription factor family plays an important role in growth, development, and response to abiotic or biotic stresses (Yin et al., 2017). It is interesting to note that PH shared common significant loci with SL showing high correlation between these traits in concurrence with the previous results (Sukumaran et al., 2014).
SNP AX-94626335 (−log<sub>10</sub> P = 5.10), associated with Glu\_Pub at this locus (24.64 cM), was annotated as metal tolerance protein (MTP), which is known for its potential involvement in providing a sink for trace element storage in wheat grains (Vatansever et al., 2017). Earlier, Echeverry-Solarte et al. (2015) also reported the influence of glume pubescence on SL by identification of a cluster of co-localizing QTL on the same locus for both the traits. Another Glu\_Pub associated SNP, AX-94664731 on chromosome 1B at 24.64 cM, annotated as tetratricopeptide repeat protein SKI3, showed highest expression at the Zadoks 39 growth stage of spike. The TaFlo2-A1 gene, an ortholog of rice Flo2 with four tetratricopeptide motifs, was found associated with thousand grain weight (Sajjad et al., 2017), and F-box proteins containing tetratricopeptide domains are known to regulate plant development and to be abundant during spike development in wheat (Hong et al., 2012). The expression patterns of the putative candidate genes in different organs (**Figure 5**) are consistent with the RNA-seq FPKM expression patterns. The single genomic locus identified for these important related traits needs further studies to fine map and validate the identity of the causal locus.
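Co-localized regions of this kind can be found by grouping MTAs on their chromosome and map position and keeping the loci shared by two or more traits. The Python sketch below does this for a handful of trait and position pairs quoted in the Results; the function name and data layout are assumptions made for illustration, and exact-position matching is a simplification, since a real analysis would usually allow a window defined by local LD.

```python
from collections import defaultdict

# (trait, chromosome, position in cM) as quoted in the text.
MTAS = [
    ("Glu_Pub", "1B", 24.64),
    ("Awn_Col", "1B", 24.64),
    ("SL", "1B", 24.64),
    ("PH", "1B", 24.64),
    ("Awn_L", "1B", 26.22),
    ("DTM", "2A", 83.23),
    ("PH", "2A", 83.23),
    ("SL", "2A", 83.23),
    ("LS_Wax", "3A", 115.9),
    ("Ped_Wax", "3A", 115.9),
]


def shared_loci(mtas, min_traits=2):
    """Group MTAs by (chromosome, position) and return the loci that are
    associated with at least `min_traits` distinct traits."""
    by_locus = defaultdict(set)
    for trait, chrom, pos in mtas:
        by_locus[(chrom, pos)].add(trait)
    return {
        locus: sorted(traits)
        for locus, traits in by_locus.items()
        if len(traits) >= min_traits
    }


shared = shared_loci(MTAS)
for (chrom, pos), traits in sorted(shared.items()):
    print(f"{chrom} @ {pos} cM: {', '.join(traits)}")
```

A shared position alone does not distinguish pleiotropy from tight linkage, which is why the LD patterns and candidate genes at the locus are examined as well.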

### DISCUSSION

The diversity panel selected in this study has high GD (0.363) and PIC (0.29), indicating higher polymorphism than listed in the previous reports (Liu J. et al., 2017; Eltaher et al., 2018). Further, the B genome had higher GD and PIC followed by the A and D genomes, consistent with the previous report (Ain et al., 2015). The largest LD decay distance of 5 cM for the D genome, obtained in this study employing 14,160 SNP markers, was found in congruence with 90K SNP (Sukumaran et al., 2014) and 9K (Lopes et al., 2013) marker data. The results from the three clustering methods (Structure, PCA, and NJ tree analysis) showed the presence of three subpopulations in this study, consistent with the geographic origins and pedigrees of the selected panel. Thus the marker density, diversity and sample size of this study are sufficient to capture allelic variations for the selected traits. Ma et al. (2013) reported that various imputation methods could be used to impute the data from low density to high density, i.e., from 3K to 54K, and subsequently from 54K to 777K. Therefore the data generated with the 35K breeders array can be imputed to high density using 820K information.
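For readers less familiar with these diversity statistics, the Python sketch below shows the commonly used definitions: gene diversity as one minus the sum of squared allele frequencies, and PIC as gene diversity minus twice the sum over allele pairs of the product of their squared frequencies. It is an illustration under the assumption that these standard definitions apply here; the function names and example frequencies are hypothetical, and the software used in the study may compute the values differently.

```python
def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return gene_diversity(freqs) - cross


# Biallelic SNP examples (hypothetical allele frequencies).
balanced = (gene_diversity([0.5, 0.5]), pic([0.5, 0.5]))
skewed = (gene_diversity([0.3, 0.7]), pic([0.3, 0.7]))
```

For a biallelic marker both statistics peak at equal allele frequencies (GD = 0.5, PIC = 0.375), so panel-wide means of 0.363 and 0.29 sit fairly close to the upper bounds attainable with SNPs.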

In this study, a GWAS panel characterized for 36 agro-morphological traits identified 146 MTAs (−log<sub>10</sub> P ≥ 4) for 23 traits. For the majority of the heritable traits, a single locus was identified at a high significance level, indicating that they are controlled by a small number of loci, for instance PGH (1B), LS\_wax (6D, 3A), Ped\_Wax (3A), Sh\_Wid (2D), Ear\_Col (1B), Ear\_Sh (7B), Awn\_Col (1B), Awn\_Att (5A), Brush\_L (3A), Germ\_Wid (4A), and Grn\_Ph (2A). For coleoptile color, the genes that regulate the anthocyanin biosynthesis pathway have been cloned and mapped on homoeologous groups 3 and 6 (phenylalanine ammonia-lyase), homoeologous groups 1 and 2 (chalcone synthase), and homoeologous group 5 (chalcone-flavanone isomerase) (Li et al., 1999). The presence of the dominant allele at the Rc-1 homoeologous loci responsible for anthocyanin pigmentation in coleoptile was correlated with the F3H (flavanone 3 hydroxylase) gene on chromosome 2A (Khlestkina et al., 2008). However, in this study, besides chromosome 2A, we also detected loci for coleoptile color on chromosome 6B, 5B, 4B, and 6A. In agreement with this, Sutka (1977) identified the gene designated Rc4 for coleoptile color on chromosome 6B; however, it was not further confirmed in any study (Khlestkina et al., 2002; McIntosh et al., 2014), while suppressors playing a role in the intensity of the coleoptile coloration were identified on chromosomes 2A, 2B, 2D, 4B, and 6A of the "Mironovskaya 808" bread wheat variety. Hence the loci identified herein further confirmed the role of chromosome 6B for coleoptile color. For Awn\_Col and Ear\_Col, we identified loci on chromosome 1B but at different positions, i.e., 24.64 and 8.24 cM, respectively. Earlier, Zeven (1983) also reported a semi-dominant gene (Rg) on chromosome 1B responsible for the brown ear character of bread wheat. For Aur\_Col, in contrast to the regions reported on chromosome 4A and 5B (Zeven, 1985), we detected its locus on chromosomes 4B and 5A owing to the instability of its expression.

Epicuticular wax is associated with increased drought tolerance in wheat (Bennett and Schnurbusch, 2016), rice (Haque et al., 1992), maize (Meeks et al., 2012), barley (Febrero et al., 1998), and many other crops. Herein, we report an additional locus for LS\_Wax on chromosome 6D besides the previously identified genomic region on chromosome 3A for waxiness. Interestingly, the SNP AX-94527988 (chromosome 3A) was found associated with both LS\_wax and Ped\_wax, thereby indicating its pleiotropic behavior. In fact, this result indicated that some causal gene(s) for wax might exist in this genomic region, as the common MTA AX-94527988 was annotated as a cytochrome P450 protein, which leads to a double hydroxylation yielding the corresponding oxo-2-alkanol esters that have also been detected previously in both peduncle and flag leaf waxes (Racovita et al., 2016).

Glume pubescence appears to have a beneficial influence on drought/cold tolerance (Borner et al., 2005). In this study, we identified significant MTA (1.47 e-07) on chromosome 1A which is in agreement with Sears (1953) who mapped a gene (Hg) responsible for Glu\_Pub on chromosomes 1A. In addition to this, we detected genomic regions associated with Glu\_Pub on chromosomes 1B (24.64 cM) and 2B (76.24, 76.38, and 104.59 cM) which might be considered as novel region controlling the trait. MTA AX-95023665 linked to Glu\_Pub encoded fatty acid biosynthetic process. Glas et al. (2012) observed that methyl ketones which are produced during fatty acid biosynthesis were the major constituent of type VI trichomes of the wild tomato Solanum habrochaites f. glabratum and are very effective in protecting the plant from pests. Pubescent plants also produced a higher number of grains per spikelet compared to non-pubescent plants (Maes et al., 2001).

Several MTAs detected for awn presence were found distributed across the wheat chromosomes except for 2D, 3D, 4D, 5D, 6A, and 7D. However, we could not locate any awn development dominant inhibitor genes Hd, B1, B2 fine mapped on chromosome 4AS, 5A, and 6BL in hexaploid wheat (Sourdille et al., 2002; Yoshioka et al., 2017) which may be due to skewness for the awned genotypes in the diversity panel. For awn attitude, one QTL located on chromosome 5A (59.99–70.36 cM) was identified. To the best of our knowledge, none of the previous studies have reported a genomic region for Awn\_Att, suggesting this could be a responsible locus for the trait. The MTA AX-94613491 encoded hexose carrier protein HEX 6 which is responsible for controlling the flux of carbon and plays a role in the carbohydrate transport and distribution in plant cells<sup>13</sup> .

A gene for high PPO activity responsible for grain color was mapped on the long arm of chromosome 2A in a wheat mapping population (Simeone et al., 2002). Similarly, in the current study, 9 SNPs on chromosome 2A spanning a 0.71 cM region (124.18–124.89 cM) were found significantly associated with phenol color, indicating the importance of this region. These SNPs explained 15.1–18.6% phenotypic variation. Out of 9 SNPs associated with phenol color, 6 SNPs encoded 5 proteins (**Table 3**). The transcript profile of the nucleoredoxin 1-2 (NRX1) gene encoded by SNP AX-94738314 associated with phenol color showed highest expression (FPKM 272.53) in grain at Zadoks 72 stage (early milk). The protective effect of NRX1 boosted the H<sub>2</sub>O<sub>2</sub> detoxification capacity of catalase, thereby protecting the plant cell from oxidative stress (Kneeshaw et al., 2017). Significant SNPs associated with grain texture (phenotype scored as hard or soft) encoded four proteins (**Table 3**). Most of the proteins identified in this study for grain traits were similar to proteins reported by Arora et al. (2017) for grain related traits.

The genetic architecture of the quantitative traits is complex as they are controlled by many loci with small effect. Several significant markers mined for six complex traits in this study were co-localized with the previously reported QTL regions (**Table 2**). Flowering is controlled by a complex network of genes integrating vernalization response genes (Vrn) on chromosomes 5A (Vrn-A1 and Vrn-A2) and 7BS (Vrn-A3), photoperiod response gene on chromosome 2, and earliness genes on chromosomes 1A and 3A (Fowler and Laudencia-chingcuanco, 2016). The MTAs for DTH were detected on three chromosomes 4A, 5A, and 7D. MTA AX-95187165 (89.02 cM) identified for DTH on chromosome 5A was mapped at nearly the same position as the functional gene Vrn-A1 (90 cM), indicating its role in determining DTH (Sukumaran et al., 2014). Another putative candidate, AX-94542441 (chromosome 4A) associated with DTH, is an ortholog of shikimate kinase-like protein. Likewise, in rice (Oryza sativa) two shikimate kinase isoforms, OsSK1 and OsSK3, accumulate to high levels during the heading stage of panicle development and are involved in floral organ development (Kasai et al., 2005). The most significant association for DTH was found on chromosome 7D and was stable in four environments (E1–E4). On similar lines, Lopes et al. (2013) also reported a significant QTL for DTH on chromosome 7D, accounting for more than 30% of DTH variation but at a different position.

<sup>13</sup>https://patents.justia.com/patent/6624343

The SNPs significantly associated with DTM were identified on chromosomes 3B, 5A, 5B, and 6A, corresponding to the earlier reported genomic regions for DTM on chromosome 3B (Sukumaran et al., 2014, 2018; Okechukwu, 2017), 5A (Gahlaut et al., 2017), 5B (Gahlaut et al., 2017; Zou et al., 2017), and 6A (Sukumaran et al., 2014). The significant QTL for DTM on chromosome 3B, harboring three SNPs (61.38, 84.9, and 85.27 cM), was closely co-localized with the previously identified MTAs for DTM, indicating that these QTLs were stable and could be detected in different environments. Another noteworthy region associated with DTM was identified on chromosome 2A (179.61 cM), consistent in three environments and encoding a BTB/POZ domain and ankyrin repeat-containing protein, which plays a key role in plant growth and development stages (Sharma and Pandey, 2016). Moreover, chromosome 2A associated with DTM also encompasses a region (83.23 cM) governing multiple traits (PH and SL), thereby representing the correlation of these traits in the diverse panel, in agreement with the previous reports (Kamani et al., 2017; Zhai et al., 2016).

For PH, as many as six MTAs were identified on chromosome 1B, 2A, 2B, 5D (2), and 7A. The genomic region of MTA (AX-94941145) identified at 29.9 cM on chromosome 7A falls in the region of the reduced height gene Rht22 (Xgwm471- 29.5 cM, Xgwm350-20.1 cM) (Peng et al., 2011). Similarly, the MTA identified for PH on chromosome 2B (104.59 cM) and 5D (1.58 cM) found in proximity to the reduced height genes Rht4 (Xwmc317-106 cM) (Ellis et al., 2005) and Rht23 (Xgdm63-4.7 cM, Xbarc110-11.1 cM) (Chen et al., 2015). We also observed an MTA on chromosome 2A where Rht7 gene had already been reported but at different map position. For two correlated traits PH and DTM (Mohibullah et al., 2012), a significant MTA (AX-94656878) was detected on chromosome 2A annotated bZIP transcription factor. In plants, these factors regulate genes in response to abiotic stress, seed maturation, flower development and pathogen defense (Jakoby et al., 2002). Similarly, several studies reported a moderate, but significant correlation between heading time and PH (Sultana et al., 2002; Mohibullah et al., 2012). The MTA AX-94941145 identified on chromosome 7A annotated probable LRR receptor-like serine/threonine-protein kinase At3g47570. Further, we investigated the possibilities of semi dwarfing genes on chromosome 4B and 4D which is present in Indian cultivars (Sheoran et al., 2013) but could not detect any MTA linked to these genes (Rht-B1b and Rht-D1b) suggesting either these genes were eliminated during filtering or may not reach significant threshold level. Similarly, Ain et al. (2015) did not find any MTA for semi dwarfing genes on chromosome 4B and 4D employing iSelect 90K SNP chip. Several MTAs for PH have been reported previously on chromosomes 2A (Ain et al., 2015; Mengistu et al., 2016), 2B (Zanke et al., 2014; Ain et al., 2015; Gao et al., 2015) and 7A (Gao et al., 2015; Soriano et al., 2017).
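The proximity of the PH-associated MTAs to markers linked with known Rht genes can be summarized with a simple nearest-marker calculation. The Python sketch below uses only the positions quoted in the paragraph above; the names `MTAS`, `KNOWN` and `nearest_marker` are assumptions made for illustration. Because the MTA and reference marker positions come from different genetic maps, the resulting distances should be read as rough indications rather than precise intervals.

```python
# PH-associated MTA positions (cM) quoted in the text.
MTAS = {"7A": 29.9, "2B": 104.59, "5D": 1.58}

# Marker positions (cM) reported for known reduced height genes.
KNOWN = {
    "7A": {"Xgwm471": 29.5, "Xgwm350": 20.1},  # flanking Rht22
    "2B": {"Xwmc317": 106.0},                  # linked to Rht4
    "5D": {"Xgdm63": 4.7, "Xbarc110": 11.1},   # flanking Rht23
}


def nearest_marker(chrom, pos, known=KNOWN):
    """Return (marker_name, distance_cM) for the reference marker closest
    to `pos` on `chrom`, with the distance rounded to two decimals."""
    name, marker_pos = min(known[chrom].items(), key=lambda kv: abs(kv[1] - pos))
    return name, round(abs(marker_pos - pos), 2)


nearest = {chrom: nearest_marker(chrom, pos) for chrom, pos in MTAS.items()}
for chrom, (name, dist) in nearest.items():
    print(f"{chrom}: nearest marker {name}, {dist} cM away")
```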

Wheat domestication genes Q, compactum (C), and sphaerococcum (S1), related to spike morphology, have been detected on chromosomes 5A, 2D, and 3D, respectively (Johnson et al., 2008). In the present study, eight loci were detected on chromosomes 1B, 2A, 3A, 3B, 3D, 5A, 7A, and 7B for SL, which were partially consistent with those of Zhou et al. (2017), who reported QTLs for SL on chromosome 3A, 3B, and 5A, and with those of Ma et al. (2007), who reported genomic regions for SL on chromosome 1B and 7A. These results indicated that multiple loci having unequal effects can influence the variations in SL. It is interesting to note that PH shared common significant loci (83.23 cM) with SL, showing a high correlation between these traits in concurrence with the previous results (Sukumaran et al., 2014). Besides MTAs on chromosome 1B (discussed earlier), another SL associated SNP, AX-94722223 (chromosome 5A), harbored actin related protein (ARPC3), which is known to be a key regulator of cytoskeleton dynamics controlling multiple developmental processes in a variety of tissues and cell types (Qi et al., 2017). We expected genes contributing to variation in SL to be most strongly expressed within the different growth stages of spike. In fact, 5 putative candidate genes identified for SL showed expression FPKM > 5 in spike tissue, with the highest expression (FPKM 143.65) at growth stage 65 (Zadoks scale), supporting a possible causal effect.

Awns were reported superior to the flag leaves on a cellular and physiological level during the grain filling period contributing 40–80% of the photosynthetic assimilates accumulated in the wheat grain (Li et al., 2006). Most significant MTAs for Awn\_L were reported on several chromosomes viz., 1A, 1B, 2B, 2D, 3B, 4A, 5B, 6D, and 7B (Wu et al., 2016; Liu Y. et al., 2018). However, some of the chromosomal regions associated with Awn\_L for instance 1.72 cM (7B) and 85.27 cM (3B) were detected for both Awn\_L and SL. This study further corroborated the result of Echeverry-Solarte et al. (2015), who reported a novel QTL for Awn\_L on chromosome 7B harboring two consistent loci associated to supernumerary spikelet (SS) and putative QTLs for PH, DTH, DTM making it an important loci for future studies. In the present investigation, cell elongation protein DIMINUTO predicted for Awn\_L associated SNP locus (AX-95025537 and AX-95012310) on chromosome 7B (**Supplementary Table S8**) have been implicated in regulating cell elongation (Takahashi et al., 1995). The genomic regions that contributed to Leaf\_L were found associated with chromosomes 4A and 5A with phenotypic variations ranging from 15.5 to 15.7%. In earlier reports several chromosomal regions viz., 2B, 3A, 4A, 4B, and 5A were detected for flag Leaf\_L (Jia et al., 2013; Wu et al., 2016). Positive and significant correlation between flag Leaf\_L and SL revealed their role in increasing yield (Wu et al., 2016).

Mining of superior/favorable alleles is essential for improving the complicated earliness trait in wheat using marker assisted selection. In recent years, association mapping has been widely used in exploring the elite alleles of many agronomic traits such as yield related traits (Sun et al., 2017), heading days and PH (Ain et al., 2015; Ogbonnaya et al., 2017) and water soluble carbohydrates (Dong et al., 2016) in wheat. In the present study, the phenotypic effect value of the favorable alleles of DTH, PH, and SL was evaluated and inferred to have positive effect on the respective traits. The candidate genes and the SNPs linked with the economic important traits identified in this study could help in designing new strategies to hoard superior alleles for these key traits in future marker based breeding. Some novel regions identified in the present investigation could have been previously detected, but comparison of the positions of the SNPs linked to the respective traits was not possible due to the limitations of the various marker system used in different studies.

This study identified 146 MTAs for 23 agro-morphological traits (**Supplementary Table S9**), and putative candidate genes using the recently released genome sequence by IWGSC RefSeq v1.0 (Appels et al., 2018). MTAs specific to less explored traits such as awn length and glume pubescence were targeted for visualizing future needs of breeders in developing efficient and resilient wheat varieties. The chromosomal region controlling multiple traits were also identified which should pave the way for selection and may prove effective for pyramiding favorable alleles. Here we discovered novel candidate genomic regions together with previously reported genes which require further validation and testing in the wheat germplasm. Therefore, the significant MTAs identified having known candidate genes are being subjected to conversion as Kompetitive Allele Specific PCR (KASP) markers that can be efficiently used to transfer alleles into elite wheat genotypes (Rasheed et al., 2016). These useful genomic resources and PCR based markers (KASP markers) could be utilized for introgression of traits through marker assisted selection (MAS). These will strongly enhance systematic study of the genetics, comparative genomics and evolution of wheat, and will expedite isolation and characterization of genes controlling agronomically important traits, such as yield and abiotic stress.

### REFERENCES


### AUTHOR CONTRIBUTIONS

RT and DiK conceived the theme of the study. SS, DeK, NR, RuS, SusP, SJ, MI, AJ, NK, UA, and SurP did the computational analysis. SS, SJ, MI, RaS, PS, RT, and DiK drafted the manuscript. DeK, NR, RuS, CS, and AG did the phenotyping. NR and DeK contributed in wet lab work. SJ, MI, DiK, AR, GS, and RT edited the manuscript. All authors read and approved the final manuscript.

### FUNDING

This work was supported by Indian Council of Agricultural Research; Ministry of Agriculture and Farmer's welfare, Government of India by providing financial assistance in the form of CABin grant (F. No. Agril. Edn.4-1/2013-A&P) to ICAR-IASRI and ICAR-IIWBR.

### ACKNOWLEDGMENTS

The authors are thankful to Indian Council of Agricultural Research, Ministry of Agriculture and Farmer's welfare, Government of India for Advanced Supercomputing Hub for Omics Knowledge in Agriculture (ASHOKA) facility at ICAR-IASRI, New Delhi, India created under National Agriculture Innovation Project, funded by World Bank.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00527/full#supplementary-material



physiological traits in the Chinese wheat cross Zhou 8425B/Chinese Spring. Front. Plant Sci. 6:1099. doi: 10.3389/fpls.2015.01099

wheat (Triticum aestivum). Physiol. Plant. 127, 701–709. doi: 10.1111/j.1399-3054.2006.00679.x

QTL meta-analysis. PLoS One 12:e0178290. doi: 10.1371/journal.pone.0178290

collection of hexaploid landraces identifies a large molecular diversity compared to elite bread wheat. Plant Biotechnol. J. 16, 165–175. doi: 10.1111/pbi.12757


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Sheoran, Jaiswal, Kumar, Raghav, Sharma, Pawar, Paul, Iquebal, Jaiswar, Sharma, Singh, Singh, Gupta, Kumar, Angadi, Rai, Singh, Kumar and Tiwari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# High-Density Mapping of Triple Rust Resistance in Barley Using DArT-Seq Markers

Peter M. Dracatos<sup>1</sup>\*†, Rouja Haghdoust<sup>1</sup>†, Ravi P. Singh<sup>2,3</sup>, Julio Huerta Espino<sup>2,3</sup>, Charles W. Barnes<sup>4</sup>, Kerrie Forrest<sup>5</sup>, Matthew Hayden<sup>5</sup>, Rients E. Niks<sup>6</sup>, Robert F. Park<sup>1</sup> and Davinder Singh<sup>1</sup>

<sup>1</sup> Plant Breeding Institute Cobbitty, Sydney Institute of Agriculture, The University of Sydney, Sydney, NSW, Australia, <sup>2</sup> International Maize and Wheat Improvement Center, Texcoco, Mexico, <sup>3</sup> Campo Experimental Valle de México, INIFAP, Chapingo, Mexico, <sup>4</sup> Instituto Nacional de Investigaciones Agropecuarias (INIAP), Quito, Ecuador, <sup>5</sup> Agriculture Victoria, AgriBio, Centre for AgriBioscience, La Trobe University, Melbourne, VIC, Australia, <sup>6</sup> Plant Breeding, Wageningen University & Research, Wageningen, Netherlands

#### Edited by:

Dragan Perovic, Julius Kühn-Institut, Germany

#### Reviewed by:

Albrecht Serfling, Julius Kühn-Institut, Germany

Belayneh A. Yimer, United States Department of Agriculture, United States

#### \*Correspondence:

Peter M. Dracatos peter.dracatos@sydney.edu.au

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 30 January 2019 Accepted: 28 March 2019 Published: 26 April 2019

#### Citation:

Dracatos PM, Haghdoust R, Singh RP, Huerta Espino J, Barnes CW, Forrest K, Hayden M, Niks RE, Park RF and Singh D (2019) High-Density Mapping of Triple Rust Resistance in Barley Using DArT-Seq Markers. Front. Plant Sci. 10:467. doi: 10.3389/fpls.2019.00467

The recent availability of an assembled and annotated genome reference sequence for the diploid crop barley (Hordeum vulgare L.) provides new opportunities to study the genetic basis of agronomically important traits such as resistance to stripe [Puccinia striiformis f. sp. hordei (Psh)], leaf [P. hordei (Ph)], and stem [P. graminis f. sp. tritici (Pgt)] rust diseases. The European barley cultivar Pompadour is known to possess high levels of resistance to leaf rust, predominantly due to adult plant resistance (APR) gene Rph20. We developed a barley recombinant inbred line (RIL) population from a cross between Pompadour and the leaf rust and stripe rust susceptible selection Biosaline-19 (B-19), and genotyped this population using DArT-Seq genotyping by sequencing (GBS) markers. In the current study, we produced a high-density linkage map comprising 8,610 (SNP and in silico) markers spanning 5,957.6 cM, with the aim of mapping loci for resistance to leaf rust, stem rust, and stripe rust. The RIL population was phenotyped in the field with Psh (Mexico and Ecuador) and Ph (Australia) and in the greenhouse at the seedling stage with Australian Ph and Pgt races, and at Wageningen University with a European variant of Psh race 24 (PshWUR). For Psh, we identified a consistent field QTL on chromosome 2H across all South American field sites and years. Two complementary resistance genes were mapped to chromosomes 1H and 4H at the seedling stage in response to PshWUR, likely to be the loci rpsEm1 and rpsEm2 previously reported from the cultivar Emir from which Pompadour was bred. For leaf rust, we determined that Rph20 in addition to two minor-effect QTL on 1H and 3H were effective at the seedling stage, whilst seedling resistance to stem rust was due to QTL on chromosomes 3H and 7H conferred by Pompadour and B-19, respectively.

Keywords: high-density linkage map, DArT-Seq markers, rust resistance, QTL, barley

## INTRODUCTION

Plant pathogens of the genus Puccinia cause some of the most feared and damaging diseases of cereal crops (Dean et al., 2012). Since Biblical times, cereal rust pathogens have plagued farmers' fields, causing significant yield losses and, in severe cases, crop failure and famine (Kislev, 1982). Barley is the fourth most important cereal crop in the world and is mainly used for malt production, animal feed and, in some regions, human consumption. Three main rust diseases currently threaten barley production: wheat stem rust, barley leaf rust and barley stripe rust, caused by Puccinia graminis f. sp. tritici (Pgt), P. hordei (Ph), and P. striiformis f. sp. hordei (Psh), with the diseases abbreviated hereafter as WSR, BLR, and BYR, respectively. All three rust diseases affect malting quality through reductions in kernel plumpness, kernel weight, and germination, resulting in economic losses to producers as premiums are paid for malting grade barley (Roelfs, 1978; Dill-Macky et al., 1990; Roelfs et al., 1992; Steffenson, 1992).

BYR, caused by Psh, is usually a problem in the cooler, wetter climates that often prevail at higher altitudes, and it is therefore renowned as a cold-temperature rust disease. Despite widespread global crop losses, Psh has not colonized all barley-growing regions, predominantly due to geographic isolation (Dantuma, 1964; Macer and Van den Driessche, 1966; Nover and Scholz, 1969; Chen et al., 1995; Chen, 2007). While Psh has not been detected in Australia, offshore testing of Australian barley cultivars in greenhouse seedling tests or in field adult plant tests suggests that between 60 and 70% are susceptible to this pathogen (Dracatos P.M. et al., 2015; Wellings, 2007). This implies that a Psh incursion would pose a significant threat to the Australian barley industry. In contrast, leaf rust (BLR) occurs globally, is frequently detected on a seasonal basis, and can cause yield losses of up to 60% in susceptible varieties (Park et al., 2015). The Ph rust fungus has systematically evolved virulence for widely deployed leaf rust resistance genes, typically via stepwise mutation or, in regions where the alternate host (Ornithogalum umbellatum) is present, via sexual recombination (Wallwork et al., 1992). Despite their comparatively infrequent occurrence, epidemics of WSR usually have devastating effects on both wheat and barley production (Park, 2007). Because WSR of barley is caused by the same pathogen as in wheat (Pgt), early-sown wheat and triticale crops threaten late-sown barley. Furthermore, studies suggest that globally >95% of barley accessions are susceptible to the widely publicized virulent races of Pgt derived from Eastern Africa such as the Ug99 lineage (TTKSK) (Steffenson et al., 2017).

Genetic resistance is the most environmentally friendly and economically efficient control method to reduce yield losses due to rust diseases. In contrast to wheat, where more than 80 cataloged stripe rust resistance genes exist, fewer stripe rust resistance genes have been characterized for barley. Despite this, numerous genetic studies have been performed to characterize the inheritance of stripe rust resistance in barley, especially that present in the 12 international standard differential barley genotypes (Nover and Scholz, 1969; Chen and Line, 1992). Diverse inheritance patterns, mainly involving recessive resistance genes, were reported by Chen and Line (1999) among these differentials. They observed varying inheritance patterns including: two complementary recessive genes (Emir, Varunda, and Trumpf), two independent recessive genes (Trumpf), and single recessive (BBA 2890, Grannelose Zweizeilige, I5 and PI 548708) or both recessive and dominant genes (Abyssinian 14 and Stauffers Obersulzer). A similar situation also exists for stem rust, where there is a lack of available resistance that can be effectively deployed in barley breeding programs. The cloned rpg4/Rpg5 gene complex is the only known resistance effective against the Ug99 lineage, although virulent races exist within North America. More recently, other resistances have been identified and characterized (Steffenson et al., 2017). In a very recent study, WSR adult plant resistance (APR) genes Rpg2 and Rpg3 were mapped in barley on chromosomes 2H and 5H, respectively (Case et al., 2018). Recent WSR epidemics in Sicily (Patpour et al., 2015) and Germany (Olivera Firpo et al., 2017) have highlighted the importance of diversifying resistance in barley germplasm, and also emphasize that Ug99 is not the only threat to cereal production.

In contrast, BLR resistance in barley is widely available, including 26 designated resistance genes and numerous QTLs for partial resistance (Qi et al., 1998, 1999; Park et al., 2015; Kavanagh et al., 2017; Yu et al., 2018). As previously mentioned, there are numerous examples of resistance gene breakdown (mainly those with major phenotypic effect) due to dynamic and rapid evolution in prevailing pathogen populations. Only a few leaf rust resistance genes have remained durable across different regions/environments, one of which is the APR gene Rph20. Rph20 was first identified as a QTL (Rphq4) in the Dutch barley cultivar Vada (Qi et al., 1998) and was later characterized in Pompadour (Golegaonkar et al., 2010; Liu et al., 2011) and the Australian cultivar Flagship (Hickey et al., 2011) where it was mapped to chromosome 5HS (Hickey et al., 2011; Liu et al., 2011). Rph20 was found to be present at high frequency in European barley germplasm and expressed as early as the third leaf stage (Wang et al., 2010; Singh et al., 2013). However, in some accessions with very high levels of APR, the resistance was hypothesized to be due to Rph20 and the presence of a 2nd or 3rd genetic component (Golegaonkar et al., 2010; Wang et al., 2010; Singh et al., 2013; Rothwell et al., 2019).

Recent genomic advancements in barley have improved the ability to develop physical scaffolds and utilize sequence information for marker development. Nevertheless, whole genome sequencing of populations and/or multiple accessions for non-model crop species with large genomes such as barley is still relatively expensive. Genotyping-by-sequencing (GBS) has been used as an alternative to whole genome sequencing due to: (1) lower cost, (2) generation of high-quality genetic markers, and (3) suitability of the markers for genomic prediction/selection. DArT-Seq™ technology combines the DArT array hybridization complexity reduction method (Wenzl et al., 2004) with next generation sequencing and can be optimized for any species. DArT-Seq has been used across numerous crop species for genetic diversity analysis (Baloch et al., 2017), genome-wide association studies (GWAS) (Singh et al., 2018; Visioni et al., 2018) and QTL mapping (Haghdoust et al., 2018). We have previously reported the utilization of DArT-Seq to map leaf rust (Dracatos et al., 2014; Singh et al., 2017, 2018) and stripe rust (Dracatos P.M. et al., 2015; Haghdoust et al., 2018) resistance in barley using marker-trait association and QTL mapping. In these studies, the genetic and physical positions of each marker were determined based on the Bowman consensus map and Morex physical reference assembly, respectively. To overcome possible differences in gene/marker order between the parents of the mapping population and the reference, and to improve the accuracy of mapping genes of interest, the use of trait-specific genetic maps is the preferred approach. In this study, we constructed a high-density linkage map using DArT-Seq markers for the Pompadour x Biosaline-19 (P/B-19) RIL population and precisely mapped QTLs for resistance to stripe, leaf and stem rust across different developmental stages and environments.

## MATERIALS AND METHODS

### Plant Material

To study the inheritance of rust resistance at the seedling and adult plant stage, we used the F9-derived recombinant inbred line (RIL) Pompadour x Biosaline-19 (P/B-19) mapping population (98 RILs) developed at the Plant Breeding Institute, University of Sydney as described by Haghdoust et al. (2018). B-19 is widely susceptible to numerous P. striiformis formae speciales including Psh, as well as to Ph, at all developmental stages. In contrast, Pompadour is a French two-rowed feed spring barley that was previously determined to carry leaf rust resistance due mainly to the presence of Rph20 (Golegaonkar et al., 2010; Liu et al., 2011) but is also resistant to stem and stripe rust.

### Greenhouse Inoculation and Phenotypic Analysis for Stripe Rust Resistance

Seedlings were grown and maintained in plant boxes as described in Niks et al. (2015) with the following exceptions. Following inoculation, the plant boxes were transferred to a dark dew chamber overnight at 10 °C. The rust susceptible B-19 parent and Dutch cultivar Vada were included in every tray along with the resistant parent Pompadour. Inoculation took place in a settling tower as described by Eyal et al. (1968) using a 1:12 mixture of urediniospores:Lycopodium spores, with 3 mg of urediniospores being used to inoculate each tray. One isolate, Wageningen-derived Psh race 24 (PshWUR), was tested in two consecutive experiments. The responses of 10 barley genotypes at the seedling stage determined that PshWUR was virulent on Topper, Astrix, Atem, Berac and the susceptible control Vada, but avirulent on Agio, Bigo, Emir, Mazurka, Hiproly, Abed Binder, and Trumpf. Phenotypes were scored on a 0–4 scale, where RILs with infection types (ITs) equal to or greater than 3 were deemed susceptible and those with ITs 0–2 were deemed resistant.
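The scoring rule above (ITs of 3 or more are susceptible, ITs 0–2 are resistant) can be illustrated with a short sketch. This is not the authors' procedure: the function name `it_to_binary` and the rule of classifying a compound IT string such as ";12C" by its highest digit are assumptions made only for this example.

```python
def it_to_binary(infection_type: str, threshold: int = 3) -> int:
    """Return 1 (susceptible) or 0 (resistant) for a 0-4 scale infection type.

    Assumption for illustration: the class is decided by the highest digit
    found in the IT string. Symbols such as ';', 'C' and '+' carry no digit,
    so an IT like ';C' is treated as 0.
    """
    digits = [int(ch) for ch in infection_type if ch.isdigit()]
    highest = max(digits) if digits else 0
    return 1 if highest >= threshold else 0


if __name__ == "__main__":
    # Infection types quoted in the text, used here only as example inputs.
    for it in [";C", ";1+C", ";12C", "3+", "4"]:
        print(it, "->", "S" if it_to_binary(it) else "R")
```

A different convention (for example, classifying by the predominant rather than the highest pustule type) would change only the line that computes `highest`.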

### Phenotyping for BLR and WSR Resistance

Both seedling and field screening were performed as described by Singh et al. (2017). A single Ph pathotype, 5457 P+ (virulent on Rph1, Rph2, Rph3, Rph4, Rph6, Rph9, Rph10, Rph12, and Rph19), was used to inoculate the mapping population in the greenhouse at the seedling stage and in the field over two different seasons (2016 and 2 replicates in 2018) at the Plant Breeding Institute, Cobbitty, NSW, Australia. Seedlings of the P/B-19 mapping population were inoculated and ITs recorded 10 and 12 days post inoculation using the 0–4 scale (Park and Karakousis, 2002), while the P/B-19 RILs were assessed in the field when the susceptible spreader control "Gus" reached a rating of 100S, using a modified Cobb scale (Peterson et al., 1948) and the 1–9 McNeal scale. The inoculation and disease assessment for stem rust resistance in the greenhouse at the seedling stage were performed using Pgt race 98-1,2,3,5,6 as described by Dracatos P. et al. (2015).

### Assessments of BYR Resistance in Ecuador

The P/B-19 RIL population was assessed for response to BYR in 2017 in Ecuador at the Instituto Nacional de Investigaciones Agropecuarias – INIAP station near Quito. The RILs were sown in 1 m × 1 m blocked groups and each block contained six lines in 1-m-long rows, spaced equally within the 1 m block. Each block was spaced roughly 30 cm apart. Five blocks were sown between susceptible spreader rows containing equal proportions of Shayari 89, Shayari 2000, and other local susceptible barley lines (including B-19). Spreader rows were not artificially inoculated. Each row of five blocks was spaced 1 m from the next row of blocks. Two evaluators scored each RIL simultaneously. The RILs were evaluated 70, 81, and 98 days later for disease severity according to the modified Cobb scale (Peterson et al., 1948).

### Phenotyping for BYR at Toluca, Mexico

The phenotyping for resistance to BYR was conducted at CIMMYT's research station near Toluca (2640 masl, 18°N latitude), Mexico, during the summer season, which coincides with cooler conditions and high rainfall. The field plots of lines consisted of 1 m paired rows sown with about 60–80 seeds on top of 0.8 m wide raised beds. A susceptible spreader, variety Apizaco 36, was sown around the experimental field and as hills in the middle of the 0.5 m wide alleys on one side of each plot to allow uniform disease development. Fresh, greenhouse-increased urediniospores of the Mexican variant of Psh race 24 (PshMEX-1) were suspended in Soltrol 170 oil and sprayed onto spreaders that were about 1 month old. The differential response on 10 barley genotypes at the seedling stage determined that PshMEX-1 was virulent on Topper, Cambrinus, Mazurka, Varunda, Emir, Heils Franken, Abed Binder, and Trumpf, but avirulent on Bigo, I5 and the bread wheat cultivar Morocco (Sandoval-Islas et al., 1998).

BYR severity was recorded twice according to the modified Cobb scale (Peterson et al., 1948), when the severity on the susceptible control Kaputar reached approximately 60 and 100%, respectively. In addition, the host responses to infection were also determined according to Roelfs et al. (1992).

### DArT-Seq Marker Genotyping and Genetic Map Construction

Genotyping-by-sequencing (GBS) data were generated using the DArT-Seq platform (DArT PL, Canberra, NSW, Australia) as described on the company website<sup>1</sup>. Marker sequences were aligned against the Morex barley genome assembly (Mascher et al., 2017) using the sequence aligner Nuclear (Gydle Inc., Bioinformatics Service, Quebec City, QC, Canada) with three mismatches allowed.

<sup>1</sup>https://www.diversityarrays.com/

Both dominant and co-dominant markers with at least 70% call rate were considered for map construction. A preliminary map was constructed in R/ASMap (Taylor and Butler, 2017) using the Kosambi mapping function at LOD 6. R/ASMap was used to identify and remove redundant markers, rectify markers with switched alleles, and remove duplicate samples and lines with high (≥30%) missing data, a high number of crossovers, or high (≥10%) heterozygosity, the latter also being an indicator of a mixed sample. Markers with segregation distortion (p < 0.01, calculated from a χ<sup>2</sup> test) that had inconsistent ordering compared to the Morex genome assembly, and unlinked markers, were excluded. Distorted markers were not automatically discarded, as some linkage groups showed regions with a high density of low-level distortion, presumably with biological significance. Following marker and sample filtering, the map was reordered in R/ASMap at LOD 8.
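For readers unfamiliar with the Kosambi mapping function named above, the following sketch shows the standard conversion between a recombination fraction r and map distance in centimorgans, d = 25 ln[(1 + 2r)/(1 − 2r)]. It illustrates the formula only and is not the R/ASMap implementation; the function names `kosambi_cm` and `kosambi_r` are hypothetical.

```python
import math


def kosambi_cm(r: float) -> float:
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse transform: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)


if __name__ == "__main__":
    for r in (0.01, 0.10, 0.25):
        d = kosambi_cm(r)
        print(f"r = {r:.2f} -> {d:.2f} cM -> r = {kosambi_r(d):.2f}")
```

At small r the distance is close to 100r cM; the function departs from that line as r grows because it allows for crossover interference.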

### QTL Mapping in the P/B-19 RIL Population

We used the same genotypic dataset previously described in Haghdoust et al. (2018). A high-density linkage map was constructed comprising 8,610 markers (SNPs and in silico DArT-Seq markers). In contrast to Haghdoust et al. (2018), we selected markers every 10–15 cM based on genetic positions from the newly constructed linkage map spanning >5,000 cM. IT data from the greenhouse for leaf, stem, and stripe rust were converted to binary scores and mapped using composite interval mapping (CIM). Field-based data for leaf and stripe rust from each disease nursery were mapped using quantitative disease measurements (either Cobb 0–100 or McNeal 1–9). Additionally, phenotypic data for seedling resistance to the barley grass stripe rust pathogen (P. striiformis f. sp. pseudo-hordei) from Haghdoust et al. (2018) was also used for QTL analysis using the new genetic map with the aim of comparing the position of stripe rust resistance QTLs. QTL regions associated with resistance to different Psh isolates in the P/B-19 RIL population were considered to be the same if there was overlap between their LOD-1 support intervals.
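The LOD-1 overlap criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the interval is taken at marker resolution (the outermost scanned positions whose LOD stays within 1 unit of the peak), the function names are hypothetical, and the LOD profiles in the usage example are invented numbers.

```python
from typing import Sequence, Tuple


def lod_support_interval(positions: Sequence[float],
                         lods: Sequence[float],
                         drop: float = 1.0) -> Tuple[float, float]:
    """Return (left, right) positions bounding the LOD-drop support interval.

    Walks outward from the peak while the LOD stays >= peak - drop.
    Assumes positions are sorted along one chromosome.
    """
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two closed intervals on the same chromosome share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    cm = [0, 5, 10, 15, 20, 25]                 # hypothetical scan positions
    isolate_1 = [1.0, 2.5, 4.2, 3.6, 2.0, 0.5]  # invented LOD profile
    isolate_2 = [0.5, 1.0, 2.9, 3.8, 3.1, 1.0]  # invented LOD profile
    i1 = lod_support_interval(cm, isolate_1)
    i2 = lod_support_interval(cm, isolate_2)
    print(i1, i2, intervals_overlap(i1, i2))
```

A more conservative variant would extend each bound to the next scanned position outside the threshold; that choice widens intervals and makes overlap more likely.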

## RESULTS

### Greenhouse and Field Psh Inoculations and Inheritance of Resistance

In greenhouse phenotypic assessments at the seedling stage with the PshWUR isolate, Pompadour was highly resistant (IT = ";C") and B-19 was very susceptible (IT = "4") based on the "0–4" IT scale described in Haghdoust et al. (2018). To determine the genetic basis of resistance in the Pompadour parent, we screened the P/B-19 RIL population at the seedling stage in the greenhouse in two separate experimental replicates (**Figure 1A**). Across both experimental replicates, >68% of the RILs were highly susceptible (IT = "3+" to "4"). Three phenotypic responses were observed in the P/B-19 RIL population, viz. highly resistant similar to Pompadour, intermediate (restricted pustules) and fully compatible similar to the susceptible parent B-19 (**Figure 1A**). Chi-squared analysis suggested that the segregation ratio best fitted a 1R : 3S ratio (χ<sup>2</sup> = 0.154 at 1 d.f.; p = 0.695), supporting the presence of two complementary resistance genes conferring seedling resistance to PshWUR.
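The goodness-of-fit test behind these segregation ratios can be sketched in a few lines. In a RIL population, resistance that requires both of two complementary genes is expected in about one quarter of lines (1R : 3S), whereas a single gene gives 1R : 1S. The code below is an illustrative sketch, not the authors' script: the function names are hypothetical, and the RIL counts in the usage example are invented because the per-class counts are not given in the text. The p-value uses the identity, valid for 1 d.f. only, that the upper tail of a χ² variable equals erfc(√(χ²/2)).

```python
import math
from typing import Tuple


def chi2_p_1df(chi2: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def two_class_gof(n_resistant: int, n_susceptible: int,
                  ratio: Tuple[int, int] = (1, 3)) -> Tuple[float, float]:
    """Chi-squared goodness of fit of R:S counts to an expected ratio."""
    total = n_resistant + n_susceptible
    exp_r = total * ratio[0] / (ratio[0] + ratio[1])
    exp_s = total - exp_r
    chi2 = ((n_resistant - exp_r) ** 2 / exp_r
            + (n_susceptible - exp_s) ** 2 / exp_s)
    return chi2, chi2_p_1df(chi2)


if __name__ == "__main__":
    # p-values recomputed from chi-squared statistics quoted in the text.
    for stat in (0.154, 0.013, 0.974):
        print(f"chi2 = {stat:.3f} -> p = {chi2_p_1df(stat):.3f}")

    # Hypothetical counts (NOT from the study) tested against 1R : 3S.
    chi2, p = two_class_gof(21, 57, ratio=(1, 3))
    print(f"21R : 57S vs 1:3 -> chi2 = {chi2:.3f}, p = {p:.3f}")
```

A large p-value means the observed counts do not depart significantly from the tested ratio; it does not by itself exclude other genetic models.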

At the Ecuador field site, Pompadour and B-19 had a mean disease rating of 2.5 and 90% disease severity, respectively. Very similar results were observed in Mexico in the 2015 and 2016 seasons. The frequency distribution histogram derived from field assessment data from Ecuador in 2017 was clearly skewed toward susceptibility, whilst data from Toluca in the 2015 and 2016 seasons were also skewed toward susceptibility but were a closer fit to a normal distribution (**Figure 2A**).

### Inheritance of Resistance to Ph and Pgt

In response to Ph pathotype 5457 P+, Pompadour gave an intermediate response of ";1+C" at the seedling stage, and was highly resistant in the field (1R). In contrast, the B-19 parent was highly susceptible ("3+" and 90S, respectively) (**Figure 1B**). Barley leaf rust resistance in the RIL population segregated bimodally (1R : 1S) and was simply inherited at both the seedling (χ<sup>2</sup> = 0.013 at 1 d.f.; p = 0.909) and adult plant stages (2016 and 2018) (**Figures 2B,C**), with the segregation pattern best fitting monogenic inheritance, which was highly correlated with the presence of the Rph20 marker bPb-0837. In contrast to the phenotypic response to both leaf rust and stripe rust, both parental genotypes were resistant (Pompadour IT = "0;" and B-19 IT = ";12C") to Pgt race 98-1,2,3,5,6 at the seedling stage in the greenhouse. Pompadour was highly resistant and B-19 showed an intermediate response, as shown in **Figure 1C**. The segregation for resistance in the P/B-19 mapping population best fitted a two-gene model (χ<sup>2</sup> = 0.974 at 1 d.f.; p = 0.324), suggesting the involvement of resistance from both parents.

### Genetic Map Construction

A total of 18,062 DArT-Seq markers showing polymorphism in the P/B-19 RIL population were considered for genetic map construction. Following marker curation, the linkage map was constructed using R/ASMap at LOD 8. Markers that were redundant, had significant segregation distortion (p < 0.01) and/or low call rate (<70%), were removed as were duplicated samples, or those with missing data (>70%) (**Supplementary Table S1**). Following quality filtering, the final map consisted of 1,596 codominant SNP and 7,014 dominant DArT-Seq markers that spanned 5,957.6 cM across seven linkage groups corresponding to chromosomes 1H-7H (**Supplementary Table S1**). Approximately 50% of the markers could not be unambiguously assigned to a physical position in the 2017 Morex reference genome assembly; in 123 cases this was due to the presence of >2 map matches, presumably due to paralogous gene families. In contrast, 4,220 markers were mapped to unique physical positions in the Morex genome. However, on numerous occasions the position determined in the P/B-19 linkage map did not correspond to that of the Morex reference. Numerous linkage blocks were identified during map construction, evidenced by the abundance of redundant markers. Twenty-three such regions had >50 cosegregating markers, including a region on chromosome 3H that had 233 redundant markers (**Supplementary Table S1**). This could be indicative of regions of the genome with repressed recombination or possibly an introgression from a wild Hordeum spp.
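The two marker-level thresholds quoted above (call rate of at least 70% and segregation distortion at p < 0.01) can be illustrated with a simple per-marker check. This sketch is not the R/ASMap workflow: the function name `marker_qc`, the one-character-per-RIL encoding ('A', 'B', '-' for missing) and the 1 : 1 expectation for parental alleles in a RIL population are assumptions for the example, and heterozygous calls are ignored. The function flags markers rather than deleting them, since a flagged marker may still merit inspection.

```python
import math
from typing import Dict, Optional, Union


def marker_qc(calls: str, min_call_rate: float = 0.70,
              alpha: float = 0.01) -> Dict[str, Union[float, bool, None]]:
    """Flag one marker for low call rate and 1:1 segregation distortion.

    calls: one character per RIL, 'A' or 'B' for the parental allele and
    '-' for a missing call (hypothetical encoding for this sketch).
    """
    n_a = calls.count("A")
    n_b = calls.count("B")
    called = n_a + n_b
    call_rate = called / len(calls) if calls else 0.0

    p_value: Optional[float] = None
    if called > 0:
        expected = called / 2.0  # 1:1 expectation in a RIL population
        chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
        # Upper-tail chi-squared p-value; this identity holds for 1 d.f. only.
        p_value = math.erfc(math.sqrt(chi2 / 2.0))

    return {
        "call_rate": call_rate,
        "p_value": p_value,
        "low_call_rate": call_rate < min_call_rate,
        "distorted": p_value is not None and p_value < alpha,
    }


if __name__ == "__main__":
    examples = {
        "balanced": "A" * 10 + "B" * 10,
        "skewed": "A" * 17 + "B" * 3,
        "sparse": "AB" + "-" * 8,
    }
    for name, calls in examples.items():
        print(name, marker_qc(calls))
```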

### QTL Analysis for Rust Resistance

The high-density linkage map was used to precisely map QTL for rust resistance segregating in the P/B-19 mapping population. A total of 10 QTLs (exceeding the LOD threshold of 3) were mapped with distinct chromosomal locations for resistance to the four different rust diseases (leaf, stem, and stripe rust caused by both barley and barley grass-adapted formae speciales of P. striiformis) (**Table 1**). The LOD scores ranged from 3.11 to 17.45 for all QTLs and the percentage of variation explained by individual QTLs ranged from 11 to 36%, but was mostly lower than 20%, indicating that the mapped rust resistance at the seedling and adult plant stages was due to genes with both small and large effect (**Table 1** and **Figure 3**).

For BYR resistance using the PshWUR isolate, the IT data from both seedling greenhouse experiments were highly correlated (r = 0.8) and distinct phenotypic classes (resistant and susceptible) were easily distinguished; therefore, qualitative assessments were taken from all RILs for the purpose of QTL mapping. We identified two consistent QTLs for resistance at the seedling stage on chromosomes 1H and 4H, both contributed by Pompadour (**Table 1** and **Figure 3**). For both experiments, the mapping resulted in identical LOD profiles (**Supplementary Figure S1A**). The infection level of the P/B-19 RILs to BYR at the adult plant stage at field sites in both Mexico and Ecuador was assessed quantitatively and the segregation was continuous but skewed toward susceptibility. In 2016 at Toluca, Mexico, the same 1H QTL was identified as in the two greenhouse seedling experiments (using the PshMex-1 and PshWUR isolates, respectively), while in 2015 (Toluca, Mexico) only a sub-threshold signal (LOD 2.8) was observed at this position. This QTL was not detected in 2017 in the Ecuador-derived dataset (**Figure 3** and **Supplementary Figure S1A**). A consistent QTL on chromosome 2H was identified across all field environments, but was not identified in the greenhouse in response to the PshWUR isolate.

We also phenotyped the P/B-19 RIL population and mapped QTL for resistance to both leaf and stem rust. For leaf rust, we phenotyped the population in both the greenhouse under controlled conditions and in the field at Plant Breeding Institute Cobbitty over three successive seasons at the adult plant stage using the same Ph pathotype (5457 P+). We identified three QTLs for resistance on chromosomes 1H, 3H, and 5H at the seedling stage based on two highly correlated (r = 0.9) experimental replicates. From the field data, in all instances, we mapped QTL on the short arm of chromosome 5H corresponding to the position of Rph20, while an additional QTL was mapped on the short arm of chromosome 2H for the 2016 data only. The major-effect leaf rust QTLs identified on chromosome 5H are all in close proximity to, or co-locate with, the bPb-0837 marker that is associated with the Rph20 resistance. The 1H leaf rust QTL was only effective at the seedling stage and was located near the centromere (**Figure 3**, **Table 1**, and **Supplementary Figure S1B**). For stem rust, two minor-effect QTL (LOD score <5) for seedling resistance mapped on chromosomes 3H and 7H based on two experimental replicates (r = 0.7) (**Figure 3**, **Supplementary Table S1**, and **Supplementary Figure S1C**).

Co-location of QTLs against multiple rust pathogen species was identified on four different chromosomes, viz. 1H, 3H, 5H, and 7H. We remapped the phenotypic data reported in Haghdoust et al. (2018) for barley grass stripe rust resistance at the seedling stage using the genetic map created in the present study (**Figure 3**). The same QTL on chromosomes 1H, 3H, 5H, and 7H identified in Haghdoust et al. (2018) were also identified using the high-density genetic map. The chromosome 1H QTL co-located with the broadly effective QTL identified for BYR resistance, while the 3H and 7H QTLs co-located with both stem rust resistance loci mapped in this study (**Figure 3**).

## DISCUSSION

The accuracy and speed with which disease resistance genes in plants can be characterized have increased due to advancements in genomics, especially through cheaper sequencing technologies. GBS-derived complexity reduction methodologies such as DArT-Seq generate thousands of polymorphic markers that can be used to expand the size of linkage groups. We previously mapped both leaf rust (Dracatos et al., 2014; Kavanagh et al., 2017) and stripe rust (Dracatos P.M. et al., 2015) resistance in barley and developed a high-density linkage map in wheat to map both components of complementary stripe rust resistance (Dracatos et al., 2016) using the DArT-Seq marker platform. To enable precise mapping of QTLs for resistance to stripe rust, leaf rust and stem rust derived from the European malting barley cv. Pompadour, we constructed a linkage map for the P/B-19 RIL population reported in Haghdoust et al. (2018) using 8,610 polymorphic DArT-Seq and SNP markers spanning >5,000 cM.

TABLE 1 | Summary of QTLs for rust resistance identified in the Pompadour × Biosaline-19 RIL population.

The ability of cereal rust pathogen spores to migrate both within and between continents, and hence adapt to new environmental conditions and host genotypes, continues to provide challenges to cereal production (Brown and Hovmøller, 2002; Ali et al., 2014). In cases where phenotypic analysis has to be performed offshore at international field sites and greenhouses because a pathogen is not present, as is the case for Psh in Australia, using a genetic map with high marker density to resolve the location of the R gene/QTL in the first instance can enhance the efficiency of subsequent pre-emptive breeding efforts. Phenotyping and genetic analysis for BYR resistance in the greenhouse determined that Pompadour likely carries two complementary resistance genes effective at the seedling stage in response to inoculations with PshWUR. CIM analysis identified the presence of two QTLs for stripe rust resistance at the seedling stage on chromosomes 1H and 4H in both greenhouse experiments. We included the differentials Emir and Agio in our tests and determined that both shared a similar disease response to PshWUR as the Pompadour parent. Pedigree analysis of the Pompadour parent revealed that it was derived from Emir through Agio, suggesting that the observed seedling resistance is likely conferred by the rpsEm1 and rpsEm2 genes (**Figure 4**). Further intercrossing and subsequent rust testing are required to confirm the complementary gene hypothesis, following the procedures used in wheat by Dracatos et al. (2016) to map the complementary stripe rust resistance genes Yr73 and Yr74. We also assessed the P/B-19 mapping population at high altitude cooler regions in Ecuador and Mexico under field conditions over three successive seasons. The same 1H QTL was also mildly effective in Toluca, Mexico in 2016 (not significant in 2015, LOD = 2.78) at the adult plant stage in the field, suggesting it may interact with the 2H QTL. 
In contrast to the PshWUR isolate we used for mapping of seedling resistance, the isolate used in Toluca (Mex-1) was virulent with respect to the Emir (rpsEm1 and rpsEm2) resistance. Resistance to PshWUR was previously mapped by Niks et al. (2015) on chromosome 4H in the Vada x L94 mapping population and was contributed by the L94 parent and believed to be derived from the Ethiopian accession Grannelose Zweizeilige. We did not identify the 4H QTL from the data from Mexico or Ecuador, suggesting it is either ineffective to the Mex-1 isolate and Ecuadorian Psh population, or only effective at the seedling stage. A similar result was previously observed for the Lr12/Lr31 wheat leaf rust complementary gene system where only Lr12 conferred resistance in adult plants (Singh et al., 1999).

Stripe rust resistance genes have previously been reported in the telomeric region of the short arm of chromosome 1H in barley and were associated with the Mla locus, which is renowned for harboring highly divergent R genes effective against powdery mildew in barley and stem rust in wheat (Sr33 and Sr50) (Periyannan et al., 2013; Mago et al., 2015). Two recent studies have reported on a barley QTL in the same region conferring resistance to isolates of P. striiformis adapted to brome grass, barley grass, and wheat, suggesting the presence of a cluster of resistance genes consistent with the MLA hypothesis. However, the presence of a broadly effective resistance gene cannot be ruled out (Niks et al., 2015; Kamino et al., 2016; Haghdoust et al., 2018). We re-mapped the 1H QTL for seedling and adult plant resistance to the barley grass-adapted forma specialis of P. striiformis (which causes barley grass stripe rust – BGYR) reported by Haghdoust et al. (2018) in the P/B-19 population using our high-density linkage map and determined that it also co-locates with the BYR seedling 1H QTL mapped in the present study, suggesting they are likely the same gene or tightly linked genes.

In this study, we mapped a broadly effective field-based resistance QTL on chromosome 2H. Further testing is required to determine if this resistance is also expressed at the seedling stage using the same Mexican and Ecuadorian isolates that occurred in the field. Field resistance to BYR has been mapped previously on chromosome 2H in four separate studies using QTL mapping [Shayari/Gardner (Toojinda et al., 2000) and BCD47/Baronesse (Vales et al., 2005) DH populations] and GWAS (Gutierrez et al., 2015; Visioni et al., 2018) approaches, respectively. Further genetic studies are required to determine whether these loci are distinct or the same. Interestingly, the 2H QTL identified in Gutierrez et al. (2015) was also effective under Ecuadorian field conditions, as was the 2H QTL derived from Baronesse (Vales et al., 2005), and may have also been derived from the European material included in their study, likely suggesting the involvement of the same gene. The lack of common marker types between genetic maps prevents comparative mapping analysis. Most recently, Visioni et al. (2018) used the same marker platform (DArT-Seq) and performed a GWAS on a diverse collection of European, Australian and American (North and South) barley accessions to map stripe rust resistance in response to Indian Psh races. The Visioni et al. (2018) study also mapped the same QTL on chromosome 2H as reported here and in previous studies, suggesting that it might be a broadly effective and hence valuable source of stripe rust resistance providing broad-spectrum resistance under diverse environments.

Numerous barley accessions have been reported to carry the Rph20 resistance; however, most are highly susceptible as seedlings when oil-based inoculation methods are used (Golegaonkar et al., 2010). Two previous studies using different P. hordei isolates and inoculation methods determined that (Rphq4 – Rph20) was expressed from the third leaf stage to maturity (Wang et al., 2010; Singh et al., 2013). In contrast, Wang et al. (2010) determined that a second QTL for partial resistance, Rphq2 on chromosome 2HL, prolongs the latent period at the seedling stage but has almost no effect on disease resistance in adult plants. We identified three QTLs for leaf rust resistance at the seedling stage, on chromosomes 1H, 3H, and 5H. The presence of the bPb-0837 marker allele was strongly correlated with resistance in the P/B-19 RIL population, suggesting that the 5HS QTL detected is likely Rph20; however, the 1H QTL appears to be an uncharacterized QTL expressed only at the seedling stage. Our data suggest that in the Pompadour background, Rph20 is expressed earlier than previously determined. Relative to the Australian cultivar Flagship, numerous European varieties including Pompadour are more resistant (near immune) to leaf rust in the field under high inoculum pressure (Golegaonkar et al., 2010). Whether this is due to additional QTL for partial resistance (as observed in Vada), allelic variation at the Rph20 locus, or variation in the pathogen population is yet to be determined. In our field data, Rph20 was consistently detected across all three replicates, alongside two experiment-specific minor-effect QTLs. Whether this indicates that the Pompadour Rph20 allele confers stronger resistance, or that other genetic components could not be effectively phenotyped due to environmental variation or inoculum pressure, is unknown. Once Rph20 is cloned, further studies of allelic variation and its phenotypic effects will be possible, and will be relevant to breeding programs and evolutionary studies.

Two cloned resistance genes (Rpg1 and rpg4/Rpg5) have proved the most important to date for durable protection against stem rust in barley (Steffenson et al., 2017). Although rpg4/Rpg5 is still effective against most global Pgt isolates, virulence for Rpg1 is now common, and in some countries, such as Australia, virulence is thought to be fixed. In this study, we successfully mapped two QTLs for resistance to stem rust at the seedling stage, on chromosomes 3H and 7H, in greenhouse experiments; these were derived from each of the parents of the P/B-19 RIL population. A very recent study mapping QTL for resistance to stem rust in barley mapped both Rpg2 and Rpg3, on chromosomes 2H and 5H, respectively (Case et al., 2018). The same study identified multiple co-locating QTL on chromosome 3H and a single QTL on chromosome 7H in response to North American isolates of Pgt at the seedling stage (Case et al., 2018). Owing to the lack of common marker types, the relationship between these QTL and those mapped here is unknown. However, the 7H QTL identified in the present study appears to provide a high level of protection at the seedling stage and is potentially a valuable source of resistance for varietal improvement.

DArT-Seq is a high-throughput, complexity reduction-based GBS technology that supports high-density linkage map construction, GWAS, QTL mapping, and genetic diversity analysis. In summary, we used DArT-Seq genotypic data to generate a high-density linkage map in the barley RIL population P/B-19 and to precisely map resistance QTL to the stripe, leaf, and stem rust pathogens. We determined that Pompadour is a rich source of resistance to all three rust species. DArT-Seq marker sequence information is now being used to develop markers for the selection of triple rust resistance in barley. In the era of rapid gene cloning methodologies, future experiments to isolate the chromosome 2H and 4H QTLs (likely in a region of low recombination) should adopt a mutant sequencing approach, as recently described for the barley leaf rust resistance gene Rph1 (Dracatos et al., 2019), to search for candidate gene(s) within the defined intervals confirmed in this study.

### AUTHOR CONTRIBUTIONS

PD, RP, and DS conceptualized the work and designed the project. PD wrote the manuscript with contributions from RH, which was read, edited and approved by all co-authors. PD and RH performed the genetic and QTL analysis. PD, RH, DS, JHE, RN, RS, and CB performed the phenotypic analysis. MH and KF performed the genetic map construction.

### FUNDING

The Grains Research and Development Corporation funded the research and many of the researchers on the manuscript. RP holds the Judith and David Coffey Chair of Sustainable Agriculture.

### ACKNOWLEDGMENTS

The authors thank Matthew Williams, Bethany Clark, Anton Vels, and Javier Noroña for valuable technical assistance and the Grains Research and Development Corporation for funding this work.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00467/full#supplementary-material

FIGURE S1 | LOD trace files for (A) stripe [green and black = WUR greenhouse rep 1 and 2, dark blue, red, and black dash = Mexico and Ecuador field sites and light blue = P. striiformis f. sp. pseudo-hordei], (B) leaf [pink = greenhouse PBI Cobbitty, dark blue dash = Field\_PBICobbitty\_2018rep1, light blue = Field\_PBICobbitty\_2018rep2, black dashed = FieldPBICobbitty\_2016] and (C) stem rust [red dashed = greenhouse PBI Cobbitty] resistance QTLs identified in the Pompadour × Biosaline-19 RIL population.

TABLE S1 | Summarised marker information for the high-density linkage map constructed for the Pompadour/Biosaline-19 RIL population.

### REFERENCES




resistance to barley stripe rust. Theor. Appl. Genet. 111, 1260–1270. doi: 10.1007/s00122-005-0043-y


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Dracatos, Haghdoust, Singh, Huerta Espino, Barnes, Forrest, Hayden, Niks, Park and Singh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Development of Genome-Wide SNP Markers for Barley via Reference-Based RNA-Seq Analysis

Tsuyoshi Tanaka<sup>1,2,3</sup>, Goro Ishikawa<sup>4</sup>, Eri Ogiso-Tanaka<sup>5</sup>, Takashi Yanagisawa<sup>6</sup> and Kazuhiro Sato<sup>7</sup>\*

<sup>1</sup> Breeding Informatics Research Unit, Division of Basic Research, Institute of Crop Science, National Agriculture and Food Research Organization (NARO), Tsukuba, Japan, <sup>2</sup> Bioinformatics Team, Advanced Analysis Center, National Agriculture and Food Research Organization (NARO), Tsukuba, Japan, <sup>3</sup> Advanced Agricultural Technology and Sciences, Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan, <sup>4</sup> Breeding Strategies Research Unit, Division of Basic Research, Institute of Crop Science, National Agriculture and Food Research Organization (NARO), Tsukuba, Japan, <sup>5</sup> Soybean and Field Crop Applied Genomics Research Unit, Division of Field Crop Research, Institute of Crop Science, National Agriculture and Food Research Organization (NARO), Tsukuba, Japan, <sup>6</sup> Wheat and Barley Breeding Unit, Division of Wheat and Barley Research, Institute of Crop Science, National Agriculture and Food Research Organization (NARO), Tsukuba, Japan, <sup>7</sup> Group of Genome Diversity, Institute of Plant Science and Resources, Okayama University, Okayama, Japan

#### Edited by:

Laurent Gentzbittel, National Polytechnic Institute of Toulouse, France

#### Reviewed by:

Martin Mascher, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany

Kentaro K. Shimizu, University of Zurich, Switzerland

\*Correspondence: Kazuhiro Sato, kazsato@okayama-u.ac.jp

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 30 January 2019 Accepted: 17 April 2019 Published: 10 May 2019

#### Citation:

Tanaka T, Ishikawa G, Ogiso-Tanaka E, Yanagisawa T and Sato K (2019) Development of Genome-Wide SNP Markers for Barley via Reference-Based RNA-Seq Analysis. Front. Plant Sci. 10:577. doi: 10.3389/fpls.2019.00577

Marker-assisted selection of crop plants requires DNA markers that can distinguish between the closely related strains often used in breeding. The availability of a reference genome sequence facilitates the generation of markers by elucidating the genomic positions of new markers as well as their neighboring sequences. In 2017, a high-quality genome sequence was released for the six-row barley (Hordeum vulgare) cultivar Morex. Here, we developed a de novo RNA-Seq-based genotyping procedure for barley strains used in Japanese breeding programs. Using RNA samples from the seedling shoot, seedling root, and immature flower spike, we mapped next-generation sequencing reads onto the transcribed regions, which correspond to ∼590 Mb of the whole ∼4.8-Gbp reference genome sequence. Using 150 samples from 108 strains, we detected 181,567 SNPs and 45,135 indels located in the 28,939 transcribed regions distributed throughout the Morex genome. We evaluated the quality of this polymorphism detection approach by analyzing 387 RNA-Seq-derived polymorphisms using amplicon sequencing. More than 85% of the RNA-Seq SNPs were validated using the highly redundant reads from the amplicon sequencing, although half of the indels and multiple-allele loci showed different polymorphisms between the platforms. These results demonstrate that our RNA-Seq-based de novo polymorphism detection system generates genome-wide markers, even in the closely related barley genotypes used in breeding programs.

Keywords: barley, genotyping, RNA-Seq, Japanese barley breeding, amplicon sequencing

### INTRODUCTION

The release of the draft barley (Hordeum vulgare) genome (International Barley Genome Sequencing Consortium [IBSC], 2012) revealed the existence of a large number of sequence polymorphisms (∼15 million) between several major haplotypes of this crop, even within exonic regions (∼350,000). The identification of these candidate marker polymorphisms encouraged us to generate a whole-genome genotyping system for barley. The barley research community has developed a number of genome marker-based systems, initially using sequences from expressed sequence tags (ESTs) generated by the international consortium using regional donor cultivars of barley. The first comprehensive polymorphism detection system was Affymetrix GeneChip Barley1 (Close et al., 2004; Luo et al., 2007; Chen et al., 2010; Moscou et al., 2011), which uses hybridization probe sequences chosen to avoid the polymorphic regions of transcripts, and enables the simultaneous detection of gene expression and polymorphisms. Transcript sequence polymorphisms were used to develop the Illumina GoldenGate Assay SNP detection system, which included 2,943 mapped SNPs (Close et al., 2009), and other high-density marker systems were also created for the Illumina iSelect platform (Bayer et al., 2017). These prefixed marker systems contributed to the identification of genome-wide consensus marker polymorphisms by the barley research community, and also promoted the sequencing of the barley genome by facilitating the genetic mapping of BAC clones onto the genome (International Barley Genome Sequencing Consortium [IBSC], 2012). However, using sequence polymorphisms derived from EST donors limited the application of these marker systems, particularly in terms of marker detection using alien sources of materials.

Marker-assisted selection has become an important technique in crop breeding. Marker systems have been successfully applied to the selection of traits in a population generated from crosses between distantly related parents; however, relatively few markers are available for distinguishing between closely related strains, especially between the highly advanced parents used in breeding (Sato et al., 2011). The poor detection of markers in these populations is mainly due to the ascertainment bias in the source of the polymorphisms (Moragues et al., 2010).

The least biased method for detecting polymorphisms is to sequence haplotypes and compare their sequences. Next-generation sequencing (NGS) platforms have been used to resequence the haplotypes of many families (Wang et al., 2018; Zhao et al., 2018); however, whole-genome sequences of barley are difficult to assemble and analyze because the genome is large and repetitive. Even without reference genome sequences, NGS can provide sequence-based genome-wide genotyping data sets. For the partial sequencing of a genome, restriction site-associated DNA sequencing (RAD-Seq) and genotyping-by-sequencing (GBS) technologies utilize restriction enzymes to identify high-density polymorphisms in the sequences around the digested regions (Poland et al., 2012; Kobayashi et al., 2016).

RNA-Seq was initially developed to analyze the expression levels of genes, but is also used for the detection of SNPs in the transcribed regions of the barley genome (Haseneyer et al., 2011; Takahagi et al., 2016). RNA-Seq is a potential strategy for genotyping species with a large genome, where direct resequencing is too expensive. Previous work in wild wheat (Nishijima et al., 2016) and human (Piskol et al., 2013) demonstrated the utility of RNA-Seq as a robust method for identifying polymorphisms in samples with a large genome. RNA sequences are derived only from exon sequences; therefore, they can be used to generate markers specific to genic regions, which are more likely to cause a phenotypic change that can be exploited or avoided in crop breeding. Based on the full-length cDNA sequencing projects in barley, the total number of expressed genes is estimated to be ∼30,000, with an average transcript size of ∼1.5 Kb, giving a rough estimate of approximately 45 Mb for a single coverage (Sato et al., 2009; Matsumoto et al., 2011). The cost and time involved in RNA-Seq are much less than those required for whole-genome sequencing, particularly when genotyping multiple haplotypes for the detection of polymorphisms. The number of reads generated using RNA-Seq depends on the expression of each gene in the sequenced organs or in response to the particular growth conditions; therefore, the quality of markers must be confirmed, particularly for genes with lower expression and therefore lower sequence redundancy.
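The 45 Mb figure quoted above follows from simple arithmetic on the two estimates. A minimal sketch in Python, in which the variable names are chosen here for illustration only:

```python
# Rough size of the expressed gene space, using the estimates quoted above.
expressed_genes = 30_000        # estimated number of expressed genes
mean_transcript_bp = 1_500      # average transcript size (~1.5 Kb)

# One pass over every transcript ("single coverage").
single_coverage_bp = expressed_genes * mean_transcript_bp
single_coverage_mb = single_coverage_bp / 1_000_000

print(f"Single coverage of the transcriptome: ~{single_coverage_mb:.0f} Mb")
```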

In this report, we developed an RNA-Seq-based genotyping pipeline focusing on the genic sequences of the reference genome. Using this method, we evaluated whether we could reduce the calculation time required for genotyping without reducing the quality and accuracy of the results. We also compared the results of our RNA-Seq-based genotyping with those generated using an alternative platform, amplicon sequencing (AmpliSeq), and assessed their agreement.

### MATERIALS AND METHODS

### Samples for RNA-Seq

Public Japanese barley (H. vulgare) breeding programs provided the major strains used in their programs for genotyping. These breeding programs focused on six-row hulled food barley, hull-less food barley, two-row non-malting barley, and two-row malting barley strains. We constructed one RNA-Seq library for 68 accessions, two libraries for 38 accessions, and three libraries for two accessions. A total of 150 RNA-Seq libraries were used in this study (**Supplementary Table 1**).

### RNA Extraction, Library Preparation, and Sequencing

The methods for growing the plants, RNA isolation, library preparation, and RNA sequencing were described by Sato et al. (2016). In brief, the seedling shoot and root tissues were sampled from plants with 5-cm shoots. RNA was also isolated from the immature spike within the leaf sheath of 39 strains, 5 days before heading in plants grown in the glasshouse. The RNA-Seq library was sequenced using the MiSeq Reagent Kit V3 (2 × 300 bp cycles) on a MiSeq NGS system according to the MiSeq System User Guide (Illumina, CA, United States), and fastq files with a read length of 300 bp were obtained from both ends of the fragments. The data were registered in the DDBJ BioProject (Accession: PRJDB6775).

### Genotyping Using RNA-Seq Data

The pipeline for genotyping using RNA-Seq is shown in **Figure 1**. The reference genome sequence of barley cultivar Morex and the annotation data were obtained from the Plant Genomics and Phenomics Research Data Repository<sup>1</sup> (Mascher et al., 2017).

<sup>1</sup>https://doi.org/10.5447/IPK/2016/34

The raw RNA-Seq data were processed to remove adapter sequences and low-quality bases using trimmomatic-0.30, with the option "ILLUMINACLIP:adapter.fa:2:30:10 LEADING:15 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:32" (Bolger et al., 2014). Owing to the difficulty of indexing the large chromosome sequences, every chromosome was split into two sections. The trimmed paired reads were then mapped onto the genome using hisat2-2.0.5 with the option "--min-intronlen 20 --max-intronlen 10000 --downstream-transcriptome-assembly --rna-strandness RF" (Kim et al., 2015). The resulting mapping of each library was processed using samtools-1.4 (Li, 2011), sambamba (Tarasov et al., 2015) and picard<sup>2</sup>. Gene models based on known high-confidence (HC) genes (Mascher et al., 2017) were determined from all RNA-Seq samples using stringtie-1.3.3 (Pertea et al., 2015), after which the transcribed regions, including the exons, introns, and 3-Kbp upstream/downstream regions, were extracted from the reference genome sequence (Mascher et al., 2017) and referred to as "transcribed regions." In addition to mapping the sequence data to the reference genome, the data were also mapped to the transcribed regions using hisat2-2.0.5. Each sample was genotyped using GenomeAnalysisTK-3.2-2 with the option "-T HaplotypeCaller --emitRefConfidence GVCF --variant\_index\_type LINEAR --variant\_index\_parameter 128000 --filter\_reads\_with\_N\_cigar," and a gvcf file was constructed. Finally, all genotyping results were merged into a single file using GenomeAnalysisTK-3.2-2 (McKenna et al., 2010). The results were filtered using Perl scripts to retain polymorphisms with a read depth > 1 and no neighboring polymorphisms within 60 bp (**Supplementary Table 2**). When there were two or more RNA-Seq libraries for an accession, we used the seedling shoot and root library, which was common to all accessions.
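The final filtering step described above (read depth > 1 and no neighboring polymorphisms within 60 bp) can be sketched as follows. This is an illustrative Python sketch, not the authors' Perl scripts: the `(chrom, pos, depth)` record layout, the function name, the chromosome labels, and the choice to count every candidate site as a potential neighbor (including low-depth ones) are all assumptions made here.

```python
from collections import defaultdict

def filter_polymorphisms(records, min_depth=2, flank=60):
    """Keep sites with depth >= min_depth and no other site within `flank` bp.

    `records` is an iterable of (chrom, pos, depth) tuples; this layout is
    hypothetical and chosen only for illustration.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, depth in records:
        by_chrom[chrom].append((pos, depth))

    kept = []
    for chrom, sites in by_chrom.items():
        sites.sort()
        for i, (pos, depth) in enumerate(sites):
            if depth < min_depth:          # "> 1 read depth" means at least 2
                continue
            # Assumption: any candidate site counts as a neighbor,
            # regardless of its own depth.
            left_ok = i == 0 or pos - sites[i - 1][0] > flank
            right_ok = i == len(sites) - 1 or sites[i + 1][0] - pos > flank
            if left_ok and right_ok:
                kept.append((chrom, pos, depth))
    return kept

# Made-up example records (not data from the study).
example = [
    ("chr1H", 100, 5),   # dropped: neighbor 30 bp away
    ("chr1H", 130, 8),   # dropped: neighbor 30 bp away
    ("chr1H", 500, 1),   # dropped: depth of 1
    ("chr1H", 900, 4),   # kept
    ("chr2H", 50, 3),    # kept
]
print(filter_polymorphisms(example))
```

Sorting the sites per chromosome means each site only needs to be compared with its two immediate neighbors, so the filter runs in a single pass after the sort.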

### Library Preparation and Sequencing for AmpliSeq

Of the 108 accessions, 38 strains were randomly selected for AmpliSeq. The genomic DNA of each accession was extracted from ∼100 mg of young leaf tissue using a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany), and was quantified using a Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific, Waltham, MA, United States). The library was constructed using the Ion AmpliSeq Library Kit 2.0 (Thermo Fisher Scientific), according to the manufacturer's protocol. Using a multiplex PCR, 10 ng of each genomic DNA sample was amplified with a custom amplicon panel. Each sample was amplified in a 10-µL solution containing 2 µL 5× Ion AmpliSeq HiFi Master Mix, 5 µL 2× AmpliSeq Custom Primer Pool, 10 ng DNA, and nuclease-free water. The reaction mix was heated for 2 min at 99°C to activate the enzyme, followed by 18 two-step cycles at 99°C for 15 s and at 60°C for 4 min, and ending with a holding period at 10°C. The amplified samples were digested with 1 µL FuPa enzyme at 55°C for 10 min, after which the enzyme was inactivated with a treatment at 60°C for 20 min. To enable multiple libraries to be loaded on a single chip, 1 µL of a unique diluted mix, including IonCode Barcode and Ion P1 Adapters, was ligated to the end of the digested amplicons using 1 µL DNA ligase at 22°C for 30 min, after which the ligase was inactivated by a 10-min treatment at 72°C. The resulting unamplified adapter-ligated libraries were purified using 22.5 µL of Agencourt AMPure XP Reagent (Beckman Coulter, Brea, CA, United States), after which 75 µL freshly prepared 70% ethanol was added to each library. After purification, the libraries were further amplified to enrich the material for accurate quantification using 25 µL Platinum PCR SuperMix High Fidelity and 1 µL Library Amplification Equalizer Primer Mix (Ion AmpliSeq Library Kit 2.0; Thermo Fisher Scientific) at 98°C for 2 min, followed by five two-step cycles at 98°C for 15 s and 60°C for 1 min. The amplified libraries were then equalized to 100 pM using an Ion Library Equalizer Kit (Thermo Fisher Scientific), and subsequently sequenced on an Ion S5 system using an Ion 540 Chip (Thermo Fisher Scientific), following the manufacturer's instructions.

<sup>2</sup>https://broadinstitute.github.io/picard/

### Data Analysis Using Ion Torrent Suite Software

The Ion S5 sequence data were mapped to the transcribed regions using the Ion Torrent Suite version 5.8.0 software, which is optimized for the analysis of Ion Torrent raw data: alignment [Torrent Mapping Alignment Program (TMAP) version 5.8.17], coverage analysis version 5.8.0.8, and variant calling using the Torrent Variant Caller (TVC) plug-in version 5.8.0.8. The variant calling was performed using the default germline parameters.

### RESULTS

### Transcribed Region Sequences Showed Good Performance for RNA-Seq Mapping

We obtained 2.7–10.5 million paired reads of RNA-Seq data from each sample (**Supplementary Table 1**). On average, more than 5 million paired-end reads were used for genotyping, after being trimmed to remove low-quality and adapter sequences. The maximum trimming rate was 5.9% among the samples. The RNA-Seq reads were mapped onto the reference genome sequence of the barley cultivar Morex (Mascher et al., 2017), with a mapping ratio of 79.4 to 93.4% (**Supplementary Table 1**). After combining the mapping results with known annotated genes (Mascher et al., 2017), the numbers of predicted transcripts in each sample ranged from 41,028 to 78,200. The Spearman's rank correlation coefficient between the read numbers and the numbers of predicted transcripts among samples was 0.597. The plot of the read numbers against the numbers of predicted transcripts indicated that the numbers of predicted transcripts were saturated in samples with higher read numbers (**Supplementary Figure 1**). Among these transcripts, 106,912 loci were identified, including 39,270 known HC loci and 67,642 tentative loci determined using RNA-Seq data from this study. This result suggested that the transcribed regions of Morex were not fully covered by the sequences of the reported HC loci. We also found that no RNA-Seq data from this study mapped to 9,034 HC loci. We tried to define the transcribed regions of our RNA-Seq data on the reference genome sequence; however, sequences obtained using RNA-Seq often lack the start/end positions of the transcripts. We therefore used a set of alternative target sequences named transcribed regions, which included the transcripts, introns, and 3-Kbp upstream/downstream sequences. A number of loci were then concatenated, and a total of 590,551,456 bp of transcribed regions in 45,978 genomic loci were ultimately extracted. These sequences covered ∼12% of the Morex reference genome, 2.64 times more than those of the HC loci (223,654,512 bp) (Mascher et al., 2017).
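The proportions quoted in this paragraph can be reproduced from the reported base-pair counts. A short Python check, in which the genome size is taken as the approximate ∼4.8 Gbp stated in the abstract (an approximation, not an exact assembly length) and the variable names are illustrative:

```python
transcribed_bp = 590_551_456     # transcribed regions extracted in this study
hc_loci_bp = 223_654_512         # HC loci (Mascher et al., 2017)
genome_bp = 4_800_000_000        # ~4.8 Gbp; approximate value, assumed here

genome_fraction = transcribed_bp / genome_bp   # share of the reference genome
fold_over_hc = transcribed_bp / hc_loci_bp     # size relative to the HC loci

known_hc_loci = 39_270
tentative_loci = 67_642
total_loci = known_hc_loci + tentative_loci    # loci identified among transcripts

print(f"{genome_fraction:.1%} of the genome, {fold_over_hc:.2f}x the HC loci, "
      f"{total_loci:,} loci")
```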

We mapped the RNA-Seq data onto the transcribed regions, and the resulting mapping ratios differed from those on the reference genome by −22.3 to 1.46% (**Supplementary Table 1**). Six samples showed a reduction of more than 5% in the ratio of "properly mapped reads (without multiple hits)" (as reported in the hisat2 statistics) on the transcribed regions relative to the reference genome (**Supplementary Table 1**). In contrast, 120 of the 150 samples had a better mapping ratio for the transcribed regions than for the reference genome (0.07 to 3.40%). These differences were mainly caused by reads with multiple mapping positions.

We compared the calculation times required for mapping and genotyping using the procedures for both the entire reference genome and the transcribed regions. Of the 108 samples, we randomly selected ten and calculated the average times required for the hisat2 and gatk analyses. Compared with the reference genome, using the transcribed regions reduced the average time by almost half for hisat2 and by two thirds for gatk (**Figure 2**). In conclusion, using only the transcribed regions for RNA-Seq-based genotyping was effective in barley, reducing cost and time compared with using the reference genome sequence.

### Genome-Wide Polymorphism Detection Among 108 Japanese Barley Strains From RNA-Seq

FIGURE 2 | Average calculation times for ten randomly selected samples using (A) hisat2 and (B) gatk.

TABLE 1 | Polymorphisms detected between RNA-Seq data from 108 Japanese barley strains and the reference genome sequence of cv. Morex.

Using the RNA-Seq mapping results, we detected 2,214,448 polymorphisms on 42,616 of the 45,978 transcribed regions (92.7%) in the reference genome (**Table 1**). These polymorphisms were categorized into 1,802,336 SNPs, 354,903 indels, and 57,209 loci with multiple alleles. Of the detected polymorphisms, 493,657 sites were homozygous between the Japanese barley strains and Morex; however, 56.2% of the homozygous polymorphisms were only obtained in a single strain. Of these, 57,209 were loci with two or more alleles. We considered that polymorphisms with only one heterozygous strain might not be suitable for genotyping, and therefore discarded the polymorphisms seen in only a single strain, heterozygous calls with a single read, and different calls from multiple strains, leaving a total of 1,102,109 SNPs and 200,945 indels. Finally, we extracted 181,567 SNPs and 45,135 indels that had no other sequence polymorphisms within the neighboring 60 bp on either side. These polymorphisms were located on 28,939 transcribed regions and distributed across the entire reference genome of Morex (**Figure 3**). The polymorphisms exhibiting differences between the Japanese breeding strains in the pairwise comparison included 44 to 24,026 SNPs and 92 to 2,679 indels (**Supplementary Table 3**).
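The three categories above (SNPs, indels, and loci with multiple alleles) can be illustrated with a small classifier over reference and alternate alleles. The source does not state the exact rule used, so the rule below is an assumption based on common VCF conventions; the function name and example alleles are hypothetical. The last lines simply confirm that the three reported counts sum to the reported total.

```python
from collections import Counter

def classify_site(ref, alts):
    """Classify a site from its reference allele and its alternate alleles.

    Assumed rule (not taken from the paper): more than one alternate allele
    is multi-allelic; alleles of different length form an indel; a single-base
    substitution is a SNP; anything else is labeled "other".
    """
    if len(alts) > 1:
        return "multi-allelic"
    alt = alts[0]
    if len(ref) != len(alt):
        return "indel"
    return "SNP" if len(ref) == 1 else "other"

# Made-up example sites: (reference allele, alternate alleles).
sites = [("A", ["G"]), ("AT", ["A"]), ("C", ["CTT"]), ("G", ["A", "T"])]
counts = Counter(classify_site(ref, alts) for ref, alts in sites)
print(counts)

# Counts reported in the text; they should add up to the reported total.
reported = {"SNP": 1_802_336, "indel": 354_903, "multi-allelic": 57_209}
reported_total = sum(reported.values())
print(reported_total)
```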

We compared the polymorphisms among strains of two-row or six-row barley. In each row type, we filtered the polymorphisms using the thresholds of <0.1 minor allele frequency and <0.5 missing genotypes, leaving a total of 8,475 SNPs and 597 indels. When these polymorphisms were arranged on the chromosomes of Morex (**Supplementary Table 4**), it was revealed that large regions did not contain any polymorphisms. For example, 981 regions of more than 1 Mbp contained no polymorphisms, the largest spanning 97,779,576 bp to 400,251,595 bp (302,472,019 bp in length) on chromosome 4H. These regions might be derived from either the conserved regions within Japanese two-row and six-row barley, or non-transcribed regions.
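Regions without polymorphisms, such as the 4H interval above, can be located by scanning the sorted marker positions on a chromosome for large gaps. The following Python sketch is illustrative only: the function name is hypothetical, gaps are measured between consecutive polymorphisms (chromosome ends are ignored), and of the four example positions only the two 4H coordinates come from the text; the other two are made up.

```python
def find_gaps(positions, min_gap=1_000_000):
    """Return (start, end, length) for polymorphism-free stretches > min_gap bp.

    `positions` are polymorphism coordinates on a single chromosome. The length
    is the difference between the two flanking positions, which matches how the
    4H interval in the text is reported.
    """
    ordered = sorted(positions)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        length = right - left
        if length > min_gap:
            gaps.append((left, right, length))
    return gaps

# The first and last positions are invented flanking markers for illustration.
chr4h_positions = [97_700_000, 97_779_576, 400_251_595, 400_300_000]
print(find_gaps(chr4h_positions))
```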

### Evaluation of the Polymorphisms Detected Using the Two Methods

As described above, the quality of our mapping and genotyping procedures was initially estimated based on the read depth. We further evaluated the quality of the detected polymorphisms using two additional methods. First, we evaluated the sequence polymorphisms derived from multiple RNA samples of a single strain. Of the 108 strains, 38 had two RNA-Seq libraries. If the polymorphisms differed between the libraries, we considered them to be unreliable. We counted the number of agreed (identical) and disagreed (different) polymorphisms between the multiple libraries and calculated their agreement rate (agreed/total polymorphisms), comparing a total of 722,380 to 955,498 polymorphisms for each of the 38 strains. Of these, we omitted around 60% of the polymorphisms because they were detected in only one library. The agreement rates were 84.9 to 95.1% (average 91.1%).
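The agreement rate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each library is represented as a dictionary mapping a site key to a genotype call (a hypothetical layout), sites called in only one library are omitted as in the text, and the rate is computed over the remaining shared sites.

```python
def agreement_rate(calls_a, calls_b):
    """Agreement between two libraries of one strain.

    Each argument maps a site key, e.g. (chrom, pos), to a genotype call.
    Sites present in only one library are left out of the comparison.
    Returns (agreed, compared, rate); rate is None when no site is shared.
    """
    shared = calls_a.keys() & calls_b.keys()
    agreed = sum(1 for site in shared if calls_a[site] == calls_b[site])
    compared = len(shared)
    rate = agreed / compared if compared else None
    return agreed, compared, rate

# Made-up calls for two libraries of the same strain.
library_1 = {("chr1H", 100): "A/A", ("chr1H", 250): "G/G", ("chr2H", 40): "T/T"}
library_2 = {("chr1H", 100): "A/A", ("chr1H", 250): "G/A", ("chr3H", 7): "C/C"}

print(agreement_rate(library_1, library_2))
```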

Second, we compared the genotyping data generated using RNA-Seq and AmpliSeq. Although the agreement rate between multiple libraries from a single strain was more than 90% when comparing the RNA-Seq data, systematic genotyping errors could be present in the RNA-Seq polymorphism detection pipeline. To estimate the accurate nucleotide sequence at each polymorphic position, we used AmpliSeq to conduct a highly redundant targeted resequencing of a limited number of polymorphisms derived from the RNA-Seq analysis. Based on 274 randomly selected SNPs and 113 randomly selected indels from the RNA-Seq analysis, we designed 384 primer sets for AmpliSeq. Of these, three primer sets contained multiple (two) polymorphisms. Among the 108 strains, 38 were randomly selected for resequencing. Using two runs of sequencing, a total of 58,693,508 reads were generated and assigned to their respective strains using barcodes. The read number for each strain ranged from 42,702 to 3,942,476 (**Figure 4A**). The average read depths at a target position were 35 to 366,750, and 373 positions showed an average read depth of more than 100× (**Figure 4B**); the 11 positions with an average read depth of less than 100× were omitted from the subsequent analysis. The calls at the target positions were compared between the results of the AmpliSeq and RNA-Seq (**Table 2**). The agreement rates among the strains ranged from 58.2 to 94.6%, and 34 strains showed more than 90% agreement. The SNPs (93.1%) showed higher agreement rates than the indels (65.1%). Among the above-mentioned 34 strains showing a high level of agreement, the SNP-specific agreement rate was more than 95%. We identified different indel polymorphisms in the RNA-Seq analysis, suggesting the presence of multiple allelic polymorphisms. Several of the SNPs detected using RNA-Seq also contained indels. These results show that AmpliSeq is suitable for the detection of a wider variety of polymorphisms than RNA-Seq, and the number of reads used for AmpliSeq does not affect the overall accuracy of genotyping.
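The two steps of this comparison, dropping target positions with a low average read depth and then computing agreement separately for SNPs and indels, can be sketched as below. All names, the dictionary layouts, and the example values are hypothetical; positions with an average depth of exactly 100 are kept here, which is an assumption because the text only distinguishes "more than" and "less than" 100×.

```python
from collections import Counter
from statistics import mean

def well_covered(depths_by_position, min_mean_depth=100):
    """Positions whose mean read depth across strains reaches the threshold."""
    return {pos for pos, depths in depths_by_position.items()
            if mean(depths) >= min_mean_depth}

def agreement_by_type(rnaseq, ampliseq, kinds, keep):
    """Per-type agreement rate for kept positions called on both platforms."""
    agreed, total = Counter(), Counter()
    for pos in keep:
        if pos in rnaseq and pos in ampliseq:
            total[kinds[pos]] += 1
            if rnaseq[pos] == ampliseq[pos]:
                agreed[kinds[pos]] += 1
    return {kind: agreed[kind] / total[kind] for kind in total}

# Made-up data for three target positions in one strain.
depths = {"t1": [150, 300], "t2": [20, 50], "t3": [1000, 400]}
kinds = {"t1": "SNP", "t2": "SNP", "t3": "indel"}
rnaseq_calls = {"t1": "A/A", "t2": "C/C", "t3": "AT/AT"}
ampliseq_calls = {"t1": "A/A", "t2": "C/T", "t3": "A/A"}

kept = well_covered(depths)
print(agreement_by_type(rnaseq_calls, ampliseq_calls, kinds, kept))
```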

### DISCUSSION

### Availability of DNA Markers in Biparental Populations

In breeding programs, DNA markers are used to select polymorphisms associated with target traits, including a particular mutation of the gene or a genotype from a particular individual. A genome-wide distribution of markers and the marker density around a target gene are very important for these purposes. The aim of the present study was to estimate whether it is possible to achieve these marker conditions even among related strains, such as those used in Japanese barley breeding programs.

As Sato et al. (2011) reported, the availability of polymorphisms between closely related strains was limited in the prefixed SNP analysis of the GoldenGate assay, with just 386 of the 1,488 SNPs showing a polymorphism between the cultivars Russia 6 and Mikamo Golden. In the RNA-Seq analysis performed here, we identified 5,102 polymorphisms between these strains (**Supplementary Table 3**), which were distributed throughout the genome (**Supplementary Table 5**). The level of polymorphism between Russia 6 and Mikamo Golden was lower than the average polymorphism between the strains investigated in the current study (range 156–26,075, average 11,140) (**Supplementary Table 3**); however, the availability of DNA markers was still sufficient for the selection of traits in breeding programs.



The range of pairwise polymorphisms identified using the RNA-Seq analysis indicates the efficiency of DNA marker generation, even between the related strains used in breeding programs; however, the relative number of polymorphisms was indeed lower between strains of the same row type than between strains of different row types. Polymorphisms are not likely to be abundant in regions where related strains share identical haplotypes. Although the positions of the transcribed regions were well distributed across the Morex genome sequence, we also identified gene-poor regions on the genome (e.g., on chromosome 4H). The low gene density around the centromeres meant that we could not assign transcripts to these regions; therefore, it is likely that our procedure for detecting polymorphisms using RNA-Seq could not generate markers for these gene-poor regions.

### Efficiency of the RNA-Seq Pipeline for the Generation of DNA Markers

Due to the time required for calculations and the cost of sequencing multiple samples, we restricted the source of sequences to the transcripts generated in the RNA-Seq analysis. To improve the genotyping process, we indexed the reference genome using sambamba (Tarasov et al., 2015), which can index bam files in less time than samtools (Li, 2011). To save time when analyzing multiple samples, we used the transcribed regions from the reference genome sequence (Mascher et al., 2017). Our results showed that using only known HC loci did not fully cover the transcribed regions in our RNA-Seq sequences. The use of the transcripts and their ∼3-Kbp flanking regions reduced the size of the target sequences from 4.80 Gbp (whole genome) to just 0.59 Gbp, which had a major impact on the calculation time required for the mapping and genotyping processes. Our procedure using transcribed regions rather than the entire genome sequence halved the calculation time required for the mapping process and reduced the time required for genotyping by two thirds. In most samples, the mapping ratio changed little between mapping the RNA-Seq data onto the reference genome sequence and mapping it onto the transcribed regions.
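The reduction from 4.80 Gbp to 0.59 Gbp corresponds to roughly 12% of the genome. How transcript intervals and their flanking regions might be combined into a reduced target set can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function name, the coordinates and the chromosome length are hypothetical, and overlapping padded intervals are assumed to be merged.

```python
def pad_and_merge(intervals, flank, chrom_length):
    """Extend each (start, end) interval by `flank` bases on both sides,
    clip to the chromosome, and merge intervals that overlap or touch."""
    padded = sorted((max(0, start - flank), min(chrom_length, end + flank))
                    for start, end in intervals)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical transcript coordinates on one chromosome.
transcripts = [(5_000, 7_000), (9_000, 12_000), (50_000, 51_000)]
targets = pad_and_merge(transcripts, flank=3_000, chrom_length=60_000)
target_size = sum(end - start for start, end in targets)
print(targets, target_size)
```

Merging matters because neighboring transcripts with 3-Kbp flanks frequently overlap; counting them separately would overstate the size of the target sequence.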

### Quality and Application of RNA-Seq-Based DNA Markers in Breeding

We initially detected more than two million polymorphisms between Morex and the Japanese barley strains, which were distributed across the reference genome sequence. After selecting polymorphisms supported by two or more reads and with no neighboring polymorphisms within about 60 bp, 226,702 polymorphisms were identified among the 108 barley accessions. When we compared the genotypes of Morex and the Japanese barley strains, the number of polymorphisms in each pair was found to be relatively stable (ranging from 11,914 to 33,457); however, the RNA-Seq genotyping data did not include a large number of known polymorphisms. This was mainly due to the relatively low coverage of sequence reads (2.6–10.5 million read pairs) in this study, which was inevitable for an RNA-Seq analysis because of the low availability of reads from less highly expressed genes. As shown in **Supplementary Figure 2**, the plot of read numbers against the numbers of SNPs among samples indicated that samples with more reads yielded more SNPs. A moderate number (ca. 100 markers per chromosome) of well-distributed markers is ideal for trait mapping such as QTL (quantitative trait loci) analysis. More markers are needed for fine-mapping to candidate gene resolution, and thus increasing the read depth would be advisable for genome-wide genotyping. On the other hand, for the detection of core polymorphisms in a set of germplasms, such as strains used in breeding programs, it may be useful to focus more on the number of strains used than on the sequencing redundancy of a single strain, since common polymorphisms are likely present among the strains.
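The two selection thresholds mentioned above (read support and spacing between neighboring polymorphisms) can be illustrated with a small sketch. It is only one possible reading of the criteria: the function name and the example sites are hypothetical, and the sketch assumes that a site is discarded when any other candidate site on the same chromosome lies within 60 bp, regardless of that neighbor's read support.

```python
def filter_sites(sites, min_reads=2, min_spacing=60):
    """Keep (chrom, pos, reads) sites with enough supporting reads and
    no other candidate site within `min_spacing` bp on the same chromosome."""
    by_chrom = {}
    for chrom, pos, reads in sites:
        by_chrom.setdefault(chrom, []).append((pos, reads))

    kept = []
    for chrom, entries in by_chrom.items():
        entries.sort()
        for i, (pos, reads) in enumerate(entries):
            # Distance checks against the nearest neighbor on each side.
            left_ok = i == 0 or pos - entries[i - 1][0] > min_spacing
            right_ok = (i == len(entries) - 1
                        or entries[i + 1][0] - pos > min_spacing)
            if reads >= min_reads and left_ok and right_ok:
                kept.append((chrom, pos, reads))
    return kept


# Hypothetical candidate polymorphisms: (chromosome, position, read count).
candidates = [("1H", 100, 5), ("1H", 130, 4), ("1H", 400, 1),
              ("1H", 900, 3), ("2H", 50, 2)]
selected = filter_sites(candidates)
print(selected)
```

Here the sites at positions 100 and 130 are both removed because they lie within 60 bp of each other, and the site at position 400 is removed because it is supported by a single read.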

Several marker systems are currently available in barley. The Illumina iSelect 50K array (Bayer et al., 2017) has SNPs with reference genome positions, but the polymorphisms are limited to the genotypes used to design the platform. GBS provides de novo detection of polymorphisms in genic and non-genic regions with reference genome positions (Poland et al., 2012). Exome capture detects the genomic sequence of genic regions (Mascher et al., 2013). Skim sequencing-based genotyping involves resequencing of multiple individuals followed by alignment of the reads to the reference sequence to genotype SNPs (Golicz et al., 2015). AmpliSeq is one of the best methods for the detection of targeted polymorphisms to date (Ogiso-Tanaka et al., 2019), and we estimated that there was an agreement rate of more than 90% in the core polymorphisms detected using RNA-Seq and AmpliSeq. As shown in **Table 2**, some of the accessions did not show strong agreement in genotyping results. We suspect that the RNA-Seq and AmpliSeq samples came from different seed sources and that their genotypes could therefore differ. We also aggregated the accuracy for each of the 387 AmpliSeq target markers and found that 41 markers had missing data. These markers could be removed from the application. Of the 346 markers without missing data, 284 (82.1%) matched completely and another 34 showed fewer than five mismatches between RNA-Seq and AmpliSeq. Unlike a SNP array, AmpliSeq can detect not only a target SNP but also other flanking SNPs and indels within a target region. While the loci with multiple alleles represented 3,055 (1.3%) of the 226,702 total polymorphisms in RNA-Seq, relatively more of these sites were detected using AmpliSeq (30 out of 812; 3.7%). This difference might be caused by our avoidance of multiple-allelic sites in RNA-Seq in an attempt to retain reliable polymorphisms. AmpliSeq therefore identified more indel polymorphisms, which are generally less useful as genomic markers than SNPs, which can be assayed using systems such as KASP, TaqMan and Fluidigm (Thomson et al., 2017). AmpliSeq requires information about the target polymorphisms before the analysis, and we therefore conclude that a possible DNA marker strategy for use in breeding programs would be to combine the detection of polymorphisms using RNA-Seq analysis with subsequent marker detection using AmpliSeq.

### AUTHOR CONTRIBUTIONS

KS and TT designed the experiments and wrote the manuscript. TY prepared the seed samples. KS performed the RNA-Seq. GI and EO-T performed the AmpliSeq.

### FUNDING

This work was partly supported by the scientific technique research promotion program for agriculture, forestry, fisheries, and the food industry (25013A to KS) and JSPS KAKENHI Grant No. 19H00943 to KS.

### ACKNOWLEDGMENTS

We would like to thank the barley breeding stations (at NARO, prefectural and brewery programs in Japan) for providing the breeding materials. The barley seeds were provided through the National Bioresource Project of Barley, MEXT, Japan.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00577/full#supplementary-material

### REFERENCES



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Tanaka, Ishikawa, Ogiso-Tanaka, Yanagisawa and Sato. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# High Resolution Mapping of RphMBR1012 Conferring Resistance to Puccinia hordei in Barley (Hordeum vulgare L.)

Leila Fazlikhani<sup>1,2</sup>, Jens Keilwagen<sup>3</sup>, Doris Kopahnke<sup>1</sup>, Holger Deising<sup>2</sup>, Frank Ordon<sup>1</sup> and Dragan Perovic<sup>1</sup>\*

<sup>1</sup> Institute for Resistance Research and Stress Tolerance, Federal Research Centre for Cultivated Plants, Julius Kühn-Institute (JKI), Quedlinburg, Germany, <sup>2</sup> Department of Phytopathology and Plant Protection, Institute of Agricultural and Nutrition Sciences, Faculty of Natural Sciences III, Martin Luther University of Halle-Wittenberg, Halle, Germany, <sup>3</sup> Institute for Biosafety in Plant Biotechnology, Federal Research Centre for Cultivated Plants, Julius Kühn-Institute (JKI), Quedlinburg, Germany

#### Edited by:

Takaki Yamauchi, PRESTO, Japan Science and Technology Agency, Japan

#### Reviewed by:

Kenji Yano, RIKEN Center for Advanced Intelligence Project (AIP), Japan Urmil Bansal, University of Sydney, Australia

#### \*Correspondence:

Dragan Perovic, dragan.perovic@julius-kuehn.de

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 30 January 2019 Accepted: 29 April 2019 Published: 22 May 2019

#### Citation:

Fazlikhani L, Keilwagen J, Kopahnke D, Deising H, Ordon F and Perovic D (2019) High Resolution Mapping of RphMBR1012 Conferring Resistance to Puccinia hordei in Barley (Hordeum vulgare L.). Front. Plant Sci. 10:640. doi: 10.3389/fpls.2019.00640

Isolation of disease resistance genes in barley was hampered by the large genome size, but has become easier due to the availability of the reference genome sequence. In recent years, many genomic resources, e.g., the Illumina 9K iSelect, the 50K Infinium arrays, the Barley Genome Zipper, POPSEQ, and genotyping by sequencing (GBS), were developed that enable enhanced gene isolation in combination with the barley genome sequence. In the present study, we developed a fine map of the barley leaf rust resistance gene RphMBR1012. A total of 537 segmental homozygous recombinant inbred lines (RILs) derived from 4,775 F2 plants were used to construct a high-resolution mapping population (HRMP). The Barley Genome Zipper, the 9K iSelect chip, the 50K Infinium chip and GBS were used to develop 56 molecular markers located in the target interval of 8 cM. This interval was narrowed down to about 0.07 cM, corresponding to 0.44 Mb of the barley reference genome. Eleven low-confidence and 18 high-confidence genes were identified in this interval. Five of these are putative disease resistance genes and were subjected to allele-specific sequencing. In addition, comparison of the genetic map and the reference genome revealed an inversion of 1.34 Mb located distally to the resistance locus. In conclusion, the barley reference sequence and the respective gene annotation delivered detailed information about the physical size of the target interval and the genes located in it, and facilitated the efficient development of molecular markers for marker-assisted selection for RphMBR1012.

Keywords: barley, leaf rust resistance gene RphMBR1012, positional isolation, GBS, Infinium 50K

## INTRODUCTION

Leaf rust of barley is a serious disease caused by the biotrophic fungus Puccinia hordei Otth., which, under favorable conditions, may cause yield losses of up to 62% (Park et al., 2015), while in general losses are about 15–25% (Whelan et al., 1997). Symptoms of leaf rust vary from small chlorotic flecks to large orange-brown pustules of up to 0.5 mm in size, often surrounded by green islands (Clifford, 1985). Although several resistance genes in barley have been identified, the major challenge in control of barley leaf rust is the breakdown of resistance caused by mutations in effector (avirulence) genes of the pathogen, leading to the occurrence of new virulent races on previously resistant cultivars within a short period of time (Park, 2003). Therefore, to combat leaf rust epidemics caused by newly emerging virulent races and to achieve sustainable disease control, it is of prime importance both to deploy new resistance genes in breeding schemes using functional molecular markers and to isolate known ones in order to obtain detailed information on their structure and function. Furthermore, isolation of known resistance genes is a prerequisite for efficient allele mining of genetic resources (Li et al., 2016) as well as allele editing, e.g., by CRISPR/Cas9 (Wang et al., 2014).

Since the first genetic study on leaf rust resistance (Waterhouse, 1927), 25 Rph (Resistance to P. hordei) genes have been mapped in barley (Kavanagh et al., 2017). Among them, two genes, namely Rph20 and Rph23, mediate an adult plant resistance (APR) (Hickey et al., 2011; Singh et al., 2015), while the remaining 23 (Rph1 to Rph19, Rph21, Rph22, Rph24, and Rph25) establish seedling resistance (Kavanagh et al., 2017). Rph5 and Rph6 on chromosome 3H (Zhong et al., 2003), Rph9 and Rph12 on chromosome 5H (Borovkova et al., 1998) and Rph15 and Rph16 on chromosome 2H have been described as alleles of the same gene (Weerasena et al., 2004). Only Rph7, Rph15, and Rph16 are still effective in Europe (Niks et al., 2000; Perovic et al., 2004) and the number of effective Rph genes available to breeders is decreasing rapidly (Kavanagh et al., 2017). Among all known Rph genes, only Rph1 has been isolated recently, using the newly developed cloning approach called Mutant Chromosome Sequencing (MutChromSeq) (Steuernagel et al., 2016) in combination with genetic mapping (Dracatos et al., 2018).

Molecular markers have been widely used in barley breeding for mapping of genes, marker-assisted selection, as well as in positional isolation of genes (Stein and Graner, 2005; Perovic et al., 2018). The most abundant molecular markers are single nucleotide polymorphisms (SNPs). Employing next generation sequencing (Ganal et al., 2018), SNPs are easily detectable in a high-throughput manner and are therefore currently the markers of choice. The number of available SNP markers rapidly increased from about 180 EST markers to about 6,800 SNPs on the 9K Illumina iSelect chip and up to 44,040 SNPs on the 50K Illumina Infinium array (Kota et al., 2003; Rostoks et al., 2005; Stein et al., 2007; Close et al., 2009; Muñoz-Amatriaín et al., 2011; Comadran et al., 2012; Bayer et al., 2017). The barley Genome Zipper (GZ) assembled 86% of the barley genes in a putative linear order (Mayer et al., 2011). Population sequencing methodology (POPSEQ) was developed as an integrated method to create a linear order of contigs using whole-genome-shotgun sequencing (WGS) data, which resulted in the first ultra-high density map of the barley genome (Mayer et al., 2011; Mascher et al., 2013a). Assessment of the GZ and POPSEQ by Silvar et al. (2015) at seven loci mapped with higher genetic resolution revealed an accuracy of 97.8% with respect to the GZ and 99.3% to POPSEQ in comparison to consensus genetic maps. In addition to the above-mentioned resources, advances in target capture/enrichment and next-generation sequencing, like GBS (Poland et al., 2012) and exome capture (Mascher et al., 2013b), and the barley reference genome sequence (Mascher et al., 2017) are available for marker development.

Although high resolution mapping allows precise zooming into targeted loci, the uneven distribution of crossovers along chromosomes (International Barley Genome Sequencing Consortium [IBSC], 2012) and the large variation in the genetic/physical ratio across the genome (Künzel et al., 2000) often hamper high-resolution genetic dissection. In barley, pericentromeric regions (pCENR) comprise at least 48% of the physical genome but harbor only 14–22% of the total barley gene content (Mascher et al., 2017). At the other extreme are hotspots of high recombination rates in telomeric regions (Bhakta et al., 2015). In the case of the Ryd3 locus, which is located in a centromeric region, the physical/genetic ratio has been estimated at 14–60 Mb/cM, while the genome-wide average is 4.4 Mb/cM (Lüpken et al., 2014). At the rym4/rym5 locus, the ratio of physical to genetic distance ranged between 0.8 and 2.3 Mb/cM and increased to over 30 Mb/cM, although the gene has been mapped to the telomeric region of chromosome 3H (Stein and Graner, 2005). This indicates that a large number of meiotic events is essential to obtain sufficient genetic resolution to detect recombination events in close vicinity to the targeted genes, and highlights the need for the development of HRMPs.

Diverse collections of barley germplasm were evaluated for detecting new sources of leaf rust resistance (Perovic et al., 2003). In this respect, the RphMBR1012 gene was mapped on the short arm of chromosome 1H (König et al., 2012), where only Rph4 has previously been localized (McDaniel and Hathcock, 1969; König et al., 2012). Prior to the recent cloning of Rph1 by Dracatos et al. (2018), all efforts to isolate leaf rust resistance genes in barley were unsuccessful. An example of unsuccessful isolation is the case of Rph7 (Brunner et al., 2000; Scherrer et al., 2005). Hence, positional cloning is still one of the most efficient and reliable approaches to isolate a resistance gene in crop species with large genomes, such as wheat and barley (Krattinger et al., 2009). In barley, up to now five genes conferring resistance to fungal and viral pathogens have been isolated through map-based cloning, comprising mlo (Büschges et al., 1997; Simons et al., 1997), Mla6 (Halterman et al., 2001), Rpg1 (Brueggeman et al., 2002), rym4/rym5 (Pellio et al., 2005) and rym11 (Yang et al., 2014).

The aims of this study were to: (i) develop an HRMP for the RphMBR1012 resistance gene, (ii) saturate the locus using all available state-of-the-art genomic resources, i.e., GBS, 50K Infinium and the barley reference genome, (iii) anchor the genetic map to the barley reference sequence, (iv) characterize the putative candidate rust resistance genes by allele specific re-sequencing and (v) test the developed markers for their diagnostic value.

### MATERIALS AND METHODS

### Plant Material and Construction of a High-Resolution Mapping Population

For high resolution mapping of RphMBR1012, a segregating population comprising 4,775 F<sub>2</sub> plants was constructed based on crosses between five DH-lines, namely the resistant (R) DH3/6 and DH3/127 and the susceptible (S) DH3/9, DH3/62 and DH3/74, which were derived from the original cross between the parental line MBR1012 (resistant) and Scarlett (susceptible). Based on these five DH-lines, four crosses were conducted, namely DH3/74 (S) × DH3/6 (R), DH3/74 (S) × DH3/127 (R), DH3/6 (R) × DH3/9 (S) and DH3/62 (S) × DH3/127 (R) (**Table 1**). In order to identify recombinants, F<sub>2</sub> plants were analyzed using two flanking co-dominant SSRs, i.e., QBS94 (distal) and QBS113 (proximal) (Perovic et al., 2013). The respective markers were analyzed by capillary electrophoresis on the genetic analyzer ABI PRISM® 3100 (Applied Biosystems, Darmstadt, Germany). From the heterozygous recombinant F<sub>2</sub> plants identified in the target interval, 12 progeny plants, representing F<sub>3</sub> families, were sown in 96 Quick pot plates. Genomic DNA of 10-day-old plantlets was extracted in F<sub>2</sub> and F<sub>3</sub> according to Dorokhov and Klocke (1997). The quality of the extracted genomic DNA was checked by electrophoresis on a 1% agarose gel and later quantified using the NanoDrop ND-100 spectrophotometer (PeQLab, Erlangen, Germany). By this approach, an HRMP of 537 recombinant inbred lines (RILs) was developed and subsequently used for marker saturation and resistance testing. Genomic DNA of the selected segmental homozygous RILs was extracted using the Miniprep method according to Stein et al. (2001). DNA of all samples was adjusted to a final concentration of 20 ng/µl. Furthermore, F<sub>3</sub> recombinant plants were self-fertilized and used as F<sub>4</sub> segmental RILs for phenotyping and genotyping with newly developed PCR-based markers.

### Resistance Test

### Inoculum Preparation

Fresh urediniospores of leaf rust isolate I-80 were prepared by artificial inoculation at the two-leaf stage of Hordeum vulgare cultivar Grossklappige, which is highly susceptible to the majority of P. hordei isolates. Inoculated plants were covered with plastic for 24 h at 18°C to ensure a moist environment. After 15 days, rust urediniospores were harvested and used for inoculation of RIL seedlings.

### Resistance Tests

Resistance tests were carried out in the greenhouse by inoculation of RILs along with the two H. vulgare parental lines, i.e., MBR1012 (resistant) and Scarlett (susceptible), the susceptible (DH3/62) and resistant (DH3/127) DH-lines, and the cv. Grossklappige as a control. Three plants per segmental RIL were sown in 96 Quick pot trays and 10-day-old plantlets were inoculated with fresh I-80 urediniospores according to Ivandic et al. (1998). Briefly, 10 mg of fresh spores were used per 100 plants and mixed with white clay (Laborchemie Apolda, Germany) (1:3). The inoculated plants were kept at 18°C and covered with plastic for 24 h, providing a moist environment for successful infection. All plants were scored at two time points, i.e., 10 and 13 days post-inoculation (dpi) according to Levine and Cherewick (1952). Segregation of resistant and susceptible plants was analyzed using the chi-square (χ²) test for goodness-of-fit to the expected Mendelian segregation ratios.

### Marker Development

For marker saturation, initially 6 Simple Sequence Repeats (SSRs), 7 size polymorphism markers and 24 SNP markers derived from the barley GZ and the 9K iSelect high-density custom genotyping bead chip were used for random saturation of the large interval of about 8 cM (Perovic et al., personal communication). The Illumina 50K Infinium array and Genotyping By Sequencing (GBS) were then used in combination with the barley reference sequence (Mascher et al., 2017) for very precise marker saturation within an interval of 0.1 cM around the locus in this study (**Supplementary Table S1**).

### 50K iSelect Illumina SNP Array

The genomic DNA of the parental lines, two DH-lines and two RILs from the HRMP (carrying critical recombinations within the resistance locus region) was used for the identification of polymorphic SNPs derived from the 50K Infinium array (TraitGenetics, Gatersleben, Germany). The polymorphic SNPs located in the target interval were converted into Kompetitive Allele Specific PCR (KASP) assays by designing two allele-specific forward primers and one common reverse primer spanning the sequence of interest carrying the SNP position using Primer3 v. 0.4.0<sup>1</sup> (Koressaar and Remm, 2007; Untergasser et al., 2012). KASP markers were then used for genotyping of the HRMP.

### Genotyping-by-Sequencing (GBS)

The same lines as for the 50K array were used for GBS screening. Genomic DNA (20 ng/µl) of each line was used for GBS according to Wendler et al. (2014). Sequencing of selected lines was done on an Illumina® MiSeq™ (Illumina, San Diego, United States). Sequencing data were analyzed using the Galaxy platform (Blankenberg et al., 2001; Giardine et al., 2005; Goecks et al., 2010) implemented at the JKI. After adapter and quality trimming (trim galore version 0.2.8.1; quality < 30, read length > 50), read mapping of the GBS data was executed using BWA version 0.7.15-r1140 (Li and Durbin, 2009) with standard settings to map the reads to the pseudomolecules of barley (Mascher et al., 2017). SNP calling was performed using mpileup version 1.2 (Li and Durbin, 2009), with genotype likelihood computation. Missing data were imputed with Beagle v4.1 (Browning and Browning, 2016). Biallelic SNPs were detected and subsequently filtered for differences between the resistant and susceptible parental lines and a minimum coverage of five reads per SNP using SnpSift version 4.2 (Cingolani et al., 2012). KASP markers were designed for polymorphic SNPs positioned in the target region<sup>2</sup>.

<sup>1</sup>http://bioinfo.ut.ee/primer3-0.4.0/
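The SNP filtering criteria described above (biallelic sites, a difference between the resistant and susceptible parents, and a minimum coverage of five reads) can be expressed as a simple predicate. The sketch below is not the SnpSift command used in the study; the record layout, field names and example values are hypothetical and serve only to make the three criteria explicit.

```python
def select_parental_snps(records, min_depth=5):
    """Return (chrom, pos) for records that are biallelic, differ between
    the two parents, and reach the minimum read depth."""
    selected = []
    for rec in records:
        is_biallelic = len(rec["alt"]) == 1
        gt_r, gt_s = rec["gt_resistant"], rec["gt_susceptible"]
        # A missing genotype (None) in either parent is not informative.
        parents_differ = (gt_r is not None and gt_s is not None
                          and gt_r != gt_s)
        if is_biallelic and parents_differ and rec["depth"] >= min_depth:
            selected.append((rec["chrom"], rec["pos"]))
    return selected


# Hypothetical SNP records (not data from the study).
snp_records = [
    {"chrom": "chr1H", "pos": 1200, "alt": ["T"],
     "gt_resistant": "T/T", "gt_susceptible": "C/C", "depth": 12},
    {"chrom": "chr1H", "pos": 3400, "alt": ["A", "G"],
     "gt_resistant": "A/A", "gt_susceptible": "G/G", "depth": 20},
    {"chrom": "chr1H", "pos": 5600, "alt": ["G"],
     "gt_resistant": "G/G", "gt_susceptible": "G/G", "depth": 9},
    {"chrom": "chr1H", "pos": 7800, "alt": ["C"],
     "gt_resistant": "C/C", "gt_susceptible": "A/A", "depth": 3},
]
informative = select_parental_snps(snp_records)
print(informative)
```

Only the first record passes: the second is multiallelic, the third is monomorphic between the parents, and the fourth has fewer than five reads.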

### Marker Saturation


The HRMP was genotyped using a total of 56 molecular markers derived from the procedures described above. The molecular markers used can be divided into five types as follows: six SSR-based markers from the pyrosequencing assay (Silvar et al., 2011), three dominant present/absent markers, four size polymorphism markers [insertion/deletion polymorphisms (InDels)], 19 KASP markers and 24 Cleaved Amplified Polymorphic Sequences (CAPS) markers. Size polymorphism markers and SSRs were amplified in a total volume of 10 µl according to Perovic et al. (2013) and either detected using fluorescently labeled primers (M13) by capillary electrophoresis on the ABI Genetic Analyzer (ABI sequencer, ABI Perkin Elmer, Weiterstadt, Germany) or directly separated on a 1.5% agarose gel. For ABI analysis, 0.1 µl of M13 primer (10.0 pmol/µl) (5′-CACGACGTTGTAAAACGAC-3′) labeled with fluorescent dye was added to the reaction mix. One microliter of diluted PCR product was added to 14 µl of HiDi-Rox mastermix (1.4 ml HiDi and 6 µl Rox) in a total volume of 15 µl. Results were analyzed using the software package GeneMapper v4.0 (Applied Biosystems, Darmstadt, Germany). For 43 sequences, detected SNPs were converted either to KASP markers (see footnote 2) or to CAPS markers using NEBcutter v.2.0<sup>3</sup>. The KASP reaction was performed in a total volume of 5 µl containing 2.5 µl KASP mix (LGC Genomics GmbH, Germany), 0.08 µl forward primer allele 1 (10.0 pmol/µl, labeled with FAM M13 tail), 0.08 µl forward primer allele 2 (10.0 pmol/µl, labeled with HEX M13 tail), 0.2 µl reverse common primer (10.0 pmol/µl), and 2.2 µl template DNA (20 ng/µl). For CAPS analysis, DNA amplicons were cleaved with the respective restriction endonuclease (**Table 2**) in a volume of 20 µl containing 2 µl of the corresponding 10× buffer, 0.1 µl of the appropriate enzyme, 7.9 µl HPLC gradient grade water (Carl Roth, Karlsruhe, Germany) and 8/10 µl of the PCR product. The incubation temperature was set according to the manufacturer's instructions for each restriction endonuclease, and digestion was carried out for 3 h.
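The principle behind converting a SNP into a CAPS marker is that one allele carries a restriction site that the other lacks, so the digested amplicons differ in fragment pattern. The sketch below illustrates this check; it is not NEBcutter and does not reproduce its output. The function names and the two short allele sequences are hypothetical, and the sketch assumes an unambiguous, palindromic recognition site (here EcoRI, GAATTC), so that scanning one strand is sufficient.

```python
def count_sites(sequence: str, recognition_site: str) -> int:
    """Count occurrences (including overlapping ones) of a recognition site."""
    seq = sequence.upper()
    site = recognition_site.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1)
               if seq[i:i + len(site)] == site)


def is_caps_candidate(allele_a: str, allele_b: str,
                      recognition_site: str) -> bool:
    """True when the two alleles differ in the number of restriction sites."""
    return (count_sites(allele_a, recognition_site)
            != count_sites(allele_b, recognition_site))


# Hypothetical amplicon fragments differing by one SNP (A/G).
resistant_allele = "ACGTGAATTCAGGT"
susceptible_allele = "ACGTGAGTTCAGGT"

ecori = is_caps_candidate(resistant_allele, susceptible_allele, "GAATTC")
hindiii = is_caps_candidate(resistant_allele, susceptible_allele, "AAGCTT")
print(ecori, hindiii)
```

For non-palindromic or degenerate recognition sequences, the reverse complement and ambiguity codes would also need to be considered, which this sketch deliberately omits.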

The following PCR conditions were used for all SSR, size polymorphism and CAPS markers: initial denaturation at 94°C for 5 min; 12 touchdown cycles of 94°C for 30 s, annealing at 62°C to 56°C (–0.5°C/cycle) for 30 s and extension at 72°C for 30 s; 35 cycles of 94°C for 30 s, 56°C for 30 s and 72°C for 30 s; and a final extension at 72°C for 10 min.
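The touchdown phase of this program can be unpacked into a per-cycle annealing temperature. The following is a minimal sketch, assuming that the first touchdown cycle anneals at the start temperature and that each subsequent cycle is lowered by the stated decrement; the function name is illustrative.

```python
def touchdown_temperatures(start, step, cycles):
    """Annealing temperature (°C) for each touchdown cycle, assuming the
    first cycle runs at `start` and every later cycle drops by `step`."""
    return [round(start - step * i, 2) for i in range(cycles)]


# SSR, size polymorphism and CAPS program: 62°C, -0.5°C/cycle, 12 cycles.
annealing = touchdown_temperatures(start=62.0, step=0.5, cycles=12)
print(annealing[0], annealing[-1])
```

Under this assumption the twelfth cycle anneals at 56.5°C, so the next decrement arrives at the 56°C used in the subsequent 35 cycles.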

The PCR amplification conditions for KASP markers were: 10 min at 94°C, followed by 10 cycles: 94°C for 20 s, annealing at 61°C to 55°C (–0.6°C/cycle) for 60 s, followed by 26 cycles: 94°C for 20 s, 55°C for 60 s, 30°C for 60 s. The real-time PCR machine was used to detect the fluorescence from HEX and FAM on plate reads. After thermal cycling was completed, the fluorescent signal was detected by reading the plate in the qPCR machine at 37°C. At the end of the run the results were shown in the data analysis software under "Allelic Discrimination." The software automatically showed the clusters for the alleles for samples based on their position in the allelic discrimination plot (LGC, "Guide to running KASP genotyping on the BIO-RAD CFX-series instruments").

<sup>2</sup>http://www.lgcgroup.com/

<sup>3</sup>http://tools.neb.com/NEBcutter2

TABLE 2 | Molecular markers used for the construction of the high resolution map.

<sup>1</sup> Initial flanking marker

### Linkage Analysis


Linkage analysis was performed by dividing the number of recombination events by the number of analyzed gametes and multiplying by 100. The recombination frequency was used for the genetic linkage map construction and visualized using the MapChart (Voorrips, 2002) software package.
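This calculation can be written out explicitly. The sketch below assumes that each F2 plant represents two analyzed gametes, so that the 4,775 F2 plants of the mapping population correspond to 9,550 gametes; the function name is illustrative.

```python
def recombination_frequency(recombination_events: int, gametes: int) -> float:
    """Recombination frequency in percent: events / gametes * 100."""
    if gametes <= 0:
        raise ValueError("number of gametes must be positive")
    return 100.0 * recombination_events / gametes


f2_plants = 4775
gametes = 2 * f2_plants  # assumption: two gametes per F2 plant

# Smallest detectable distance: a single recombination event.
resolution = recombination_frequency(1, gametes)
print(f"{resolution:.4f}% recombination")
```

Under this assumption a single recombination event corresponds to about 0.0105% recombination, which is in line with the genetic resolution of 0.010% reported in the Results.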

### Testing the Diagnostic Value of Co-segregating and Closely Linked Markers

Co-segregating markers at the RphMBR1012 locus were tested for their diagnostic value on a set of 63 genotypes comprising 25 selected barley genotypes/lines carrying Rph1 to Rph25, 23 parental lines and 15 Bowman introgression lines carrying Rph1 to Rph15 (**Table 3**). The diagnostic value of tested co-segregating markers (%) was calculated using the following equation:

$$\text{Diagnostic value (\%)} = \frac{\text{Number of lines showing a different allele from MBR1012}}{\text{Total number of analyzed lines}} \times 100$$
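The same calculation can be sketched in code. The function name and the allele calls below are hypothetical and are not taken from **Table 3**; the sketch simply counts the lines whose marker allele differs from that of MBR1012.

```python
def diagnostic_value(line_alleles, mbr1012_allele):
    """Percentage of analyzed lines whose allele differs from MBR1012."""
    if not line_alleles:
        raise ValueError("no lines analyzed")
    differing = sum(allele != mbr1012_allele for allele in line_alleles)
    return 100.0 * differing / len(line_alleles)


# Hypothetical marker calls for eight lines; MBR1012 carries allele "C".
calls = ["T", "T", "C", "T", "T", "T", "C", "T"]
value = diagnostic_value(calls, mbr1012_allele="C")
print(f"diagnostic value: {value:.1f}%")
```

A value of 100% would mean that no other analyzed line shares the MBR1012 allele, i.e., the marker distinguishes MBR1012 from every line in the panel.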

### Anchoring the Genetic Map to the Barley Reference Sequence

All 56 markers used for construction of the HRMP were anchored to the barley reference genome sequence (Mascher et al., 2017). All sequences, including forward and reverse primers, were blasted against the barley reference genome sequence<sup>4</sup> using the BLASTN algorithm with default parameters. The physical positions obtained for the mapped markers were visualized using the software MapChart (Voorrips, 2002).
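When BLASTN results are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), anchoring amounts to keeping the best hit per marker sequence. The sketch below assumes such tabular output and selects the hit with the highest bit score; the study used a web BLAST server, so this is an illustration only, and the marker names and coordinates are hypothetical.

```python
def best_hits(tabular_text: str) -> dict:
    """Return the highest-scoring hit per query from 12-column BLAST
    tabular output as {query: {chrom, start, end, bitscore}}."""
    best = {}
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {
                "chrom": subject,
                # Minus-strand hits report sstart > send; normalize order.
                "start": min(sstart, send),
                "end": max(sstart, send),
                "bitscore": bitscore,
            }
    return best


# Hypothetical hits for two primer sequences (not data from the study).
rows = [
    ("marker_A_F", "chr1H", "100.00", "20", "0", "0",
     "1", "20", "1500", "1519", "1e-05", "40.1"),
    ("marker_A_F", "chr3H", "95.00", "20", "1", "0",
     "1", "20", "880", "899", "0.01", "32.2"),
    ("marker_B_R", "chr1H", "100.00", "21", "0", "0",
     "1", "21", "9020", "9000", "2e-06", "42.1"),
]
anchored = best_hits("\n".join("\t".join(row) for row in rows))
print(anchored)
```

Keeping only the top-scoring hit is a simplification; for short primer sequences, near-identical secondary hits would normally be inspected as well.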

### Use of the Barley Reference Sequence for the Identification of Candidate Genes

Marker positions in the barley reference sequence were used to determine the target interval of the resistance gene locus and to extract putative candidate genes<sup>5</sup>. After defining the genomic region of the resistance locus in the barley reference sequence, High-Confidence (HC) and Low-Confidence (LC) genes including exon-intron boundaries were extracted from the available annotation (Mascher et al., 2017). The reconstruction of the gene intron-exon structure was performed using the internet platform "Splign"<sup>6</sup> from NCBI, which allows alignment of mRNA to genomic sequence (Kapustin et al., 2008).

### Allele Specific Re-sequencing of Candidate Genes

The allele specific re-sequencing of candidate genes was conducted for 18 high and 11 low confidence genes positioned in the candidate interval. Primers were designed using the online software Primer3 v. 0.4.0 (see footnote 1) (Koressaar and Remm, 2007; Untergasser et al., 2012), with the parameters set to a primer length of 20–22 bp, a temperature of 58–62°C and a product size of 350 bp, and were subsequently tested for their specificity for chromosome 1H using the barley BLAST server (see footnote 4) against the barley pseudomolecules according to Mascher et al. (2017). In the first round of low pass resequencing, a set of 36 primer pairs was designed covering all 29 high and low confidence genes. In the second round of the experiment, 25 primer pairs were designed in order to sequence the full length of five disease resistance genes. To sequence the entire gene, Morex contigs including the gene sequence of each disease resistance gene were identified using the barley BLAST server (see footnote 4), allowing primers to be designed at least 20 bases upstream of the start codon and 20 bases downstream of the stop codon. Moreover, the primer pairs were designed so that adjacent fragments overlap, ensuring that there are no gaps between the fragments after sequence analysis. A fragment size of 400 to 1,200 bp was chosen because of the maximum sequencing length. Amplification was done on the parental genotypes MBR1012 and Scarlett, as well as on two DH-lines [DH3/62 (S), DH3/127 (R)]. The amplification reaction was prepared in a total volume of 20 µl containing 2 µl of 10× PCR buffer (Qiagen, Hilden, Germany), 2 µl of 25 mM MgCl2, 0.4 µl of 10 mM dNTPs (Fermentas, Schwerte, Germany), 0.5 µl of each forward and reverse primer (10.0 pmol/µl), 0.16 µl of fire DNA polymerase (5 U/µl) (Qiagen, Hilden, Germany), 12.44 µl HPLC gradient grade water (Carl Roth, Karlsruhe, Germany) and 2 µl of template DNA (20 ng/µl). Next, PCR products of the same size were submitted for sequencing. PCR fragments were separated by agarose gel electrophoresis and analyzed using the imaging system Gel Doc™ XR and the Quantity One® 1-D analysis software (4.6.2) (Bio-Rad, Hercules, United States) and subsequently sequenced by the company Microsynth AG (Balgach, Switzerland) using the Sanger sequencing method (Sanger et al., 1977). The sequences obtained were edited and analyzed using Sequencher 5.1 software (Gene Codes, Ann Arbor, MI, United States) with default parameters.
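The design constraints for full-length sequencing (at least 20 bases of flanking sequence on each side, overlapping fragments, and fragment sizes of 400 to 1,200 bp) can be illustrated by tiling a gene region with overlapping windows. This is only a sketch of the layout logic, not the primer design performed with Primer3: the function name, the 100-bp overlap and the example coordinates are hypothetical, and coordinates are treated as 0-based, half-open intervals.

```python
def tile_amplicons(gene_start, gene_end, flank=20,
                   min_len=400, max_len=1200, overlap=100):
    """Cover [gene_start - flank, gene_end + flank) with overlapping
    windows of at most `max_len` bp; a short final window is extended
    backwards to `min_len` bp where the region allows."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    region_start = gene_start - flank
    region_end = gene_end + flank
    amplicons = []
    start = region_start
    while True:
        end = min(start + max_len, region_end)
        if end - start < min_len:
            # Final window too short: shift its start upstream.
            start = max(region_start, end - min_len)
        amplicons.append((start, end))
        if end == region_end:
            break
        start = end - overlap
    return amplicons


# Hypothetical gene spanning positions 1,000 to 3,500 on a contig.
windows = tile_amplicons(gene_start=1000, gene_end=3500)
print(windows)
```

Each window starts before the previous one ends, so the assembled reads leave no gap, and the first and last windows extend 20 bases beyond the start and stop codons.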

Functional analysis of the polymorphisms identified between the parental lines (MBR1012 and Scarlett) was done using the multiple sequence alignment program MAFFT with default parameters (Katoh and Standley, 2013).

### RESULTS

### Construction of the High-Resolution Mapping Population

Four crosses, i.e., DH3/74 (S) × DH3/6 (R), DH3/74 (S) × DH3/127 (R), DH3/6 (R) × DH3/9 (S) and DH3/62 (S) × DH3/127 (R), were used for the construction of the HRMP (**Table 1**). Of a total of 5,237 F<sub>2</sub> plants, 4,775 survived, and from the corresponding F<sub>3</sub> families 537 recombinant F<sub>4</sub> RILs were developed, resulting in an interval of 0.07% recombination harboring the resistance locus. Finally, a genetic resolution of 0.010% recombination was achieved.
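
The reported genetic resolution can be reproduced with a short calculation. The sketch below assumes that resolution is defined as one detectable crossover among the 2N gametes sampled by N F<sub>2</sub> plants; this is a common convention, but the text does not state the definition, and the function name is illustrative.

```python
def genetic_resolution_percent(n_f2_plants: int) -> float:
    """Smallest detectable recombination frequency (in %) for N F2 plants,
    assuming one crossover among the 2N gametes they represent."""
    gametes = 2 * n_f2_plants
    return 100.0 / gametes


# 4,775 surviving F2 plants, as reported in the text
print(round(genetic_resolution_percent(4775), 4))  # 0.0105
```

Under this assumption the 4,775 surviving plants give about 0.0105% recombination, in line with the 0.010% quoted above.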

<sup>4</sup>http://webblast.ipk-gatersleben.de/barley\_ibsc/

<sup>5</sup>https://plants.ensembl.org/Hordeum\_vulgare/Info/Index

<sup>6</sup>https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi?textpage=online&level=form

Phenotypic analysis of the resistance conferred by RphMBR<sup>1012</sup> showed a segregation of 261 resistant and 276 susceptible RILs and revealed the expected 1r:1s segregation ratio among these RILs. The chi-square test for goodness of fit (χ<sup>2</sup><sub>1:1</sub> = 0.4189, df = 1, p > 0.05) indicated that the resistance in MBR1012 is monogenically controlled (**Figure 1** and **Table 1**).
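
The goodness-of-fit statistic can be recomputed from the 261:276 counts. The following is a minimal sketch using only the Python standard library; the function name is illustrative, and the p-value relies on the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(√(χ²/2)).

```python
import math


def chi_square_1to1(resistant: int, susceptible: int):
    """Chi-square goodness-of-fit test against a 1:1 ratio (df = 1)."""
    expected = (resistant + susceptible) / 2
    chi2 = ((resistant - expected) ** 2 + (susceptible - expected) ** 2) / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi_square_1to1(261, 276)
print(f"chi2 = {chi2:.4f}, p = {p:.2f}")  # chi2 = 0.4190, p = 0.52
```

A p-value of about 0.52 means that the observed counts do not deviate significantly from 1:1, which is the basis for concluding monogenic inheritance.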

### Marker Saturation of the RphMBR<sup>1012</sup> Locus and Anchoring to the Barley Reference Sequence

A fine map of the RphMBR<sup>1012</sup> was constructed using the set of 537 segmental homozygous RILs (**Figure 2**). Marker saturation of the HRMP resulted in reducing the target interval to 0.1 cM. After screening parental lines using the 50K chip and GBS, 19 new polymorphisms were identified in the target region of 0.1 cM.

The 50K screen revealed a total of 40,777 scorable SNPs across the barley genome (**Figure 3**). Out of these, 14,616 SNPs showed homozygous polymorphisms between resistant and susceptible genotypes. Thirty-nine SNPs were located in the large interval of 8.0 cM on chromosome 1HS, and four SNPs were located in the narrowed target interval of 0.1 cM. These SNPs were converted into KASP markers and mapped on the whole HRMP (**Supplementary Table S1**).

Genotyping by sequencing analysis yielded 48,226 SNPs distributed over all seven barley chromosomes, of which 37,287 showed homozygous polymorphisms between resistant and susceptible lines (**Figure 3**). Out of these, 80 polymorphic markers were located in the larger interval flanked by QBS94 and QBS113 (8.0 cM), and 15 SNPs were identified in the shortened interval of 0.1 cM. KASP markers were designed for all 15 SNPs and used for genotyping of the 537 RILs (**Supplementary Table S1**).
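
The selection of SNPs that are homozygous and polymorphic between the parents, as used for both the 50K and the GBS data, can be sketched as below. The genotype encoding (two-letter calls, `NN` for missing data), the marker names and the function names are assumptions made for this illustration and do not reflect the authors' actual data format.

```python
VALID_BASES = set("ACGT")


def is_homozygous(call: str) -> bool:
    """True for calls such as 'AA'; False for 'AG' or missing data ('NN')."""
    return len(call) == 2 and call[0] == call[1] and call[0] in VALID_BASES


def homozygous_polymorphic(calls_resistant: dict, calls_susceptible: dict) -> list:
    """Markers that are homozygous in both parents but carry different alleles."""
    selected = []
    for marker, r_call in calls_resistant.items():
        s_call = calls_susceptible.get(marker)
        if s_call is None:
            continue  # marker not scored in the susceptible parent
        if is_homozygous(r_call) and is_homozygous(s_call) and r_call != s_call:
            selected.append(marker)
    return selected


# Toy genotype calls (hypothetical marker names)
resistant = {"snp1": "AA", "snp2": "AG", "snp3": "CC", "snp4": "NN", "snp5": "TT"}
susceptible = {"snp1": "GG", "snp2": "AA", "snp3": "CC", "snp4": "TT", "snp5": "CC"}
print(homozygous_polymorphic(resistant, susceptible))  # ['snp1', 'snp5']
```

Only markers passing this filter and lying in the target interval would then be candidates for conversion into KASP assays.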

Mapping of all these markers showed that the RphMBR<sup>1012</sup> locus is located in a region of 0.07 cM between the tightly linked markers QBS127 (SNP) and QBS98 (size polymorphism), at 0.020% (distal) and 0.050% (proximal) recombination from the RphMBR<sup>1012</sup> locus. Thus, the target interval was shortened from 0.1% recombination to 0.07% recombination (**Figure 4**). The high-density genetic map revealed ten markers (QBS128, QBS129, QBS130, QBS131, QBS132, GBS626, GBR534, GBS546, QBS116 and QBS117) co-segregating with the RphMBR<sup>1012</sup> locus (**Figure 4**). Moreover, the distribution of recombination in the target interval was uneven, varying from 0.58 and 0.60 Mb/cM proximal and distal to the resistance locus, respectively, to 7.26 Mb/cM at the RphMBR<sup>1012</sup> locus (**Figure 2**). Marker saturation also revealed a high number of recombination events in the flanking regions, i.e., 177 recombination events between markers QBS96 and QBS71 in the distal region of the interval and 112 recombination events between markers QBS112 and QBS113 located proximally (**Figure 5**). Overall, the analysis allowed narrowing the RphMBR<sup>1012</sup> locus to a region comprising a limited number of candidate genes.

BLAST searches against the barley reference sequence revealed that the mapped markers were in a nearly perfect co-linear order. However, 15 markers within 1.34 Mb in the distal part of chromosome 1HS showed an inverted order (**Figure 4**). The BLAST searches also indicated only one hit on chromosome 1H for 20 markers (12 SNPs, 4 SSRs and 4 size polymorphisms) and two or more hits for 17 markers (12 SNPs, 2 SSRs and 3 size polymorphisms). The large target interval of 8.0 cM between the flanking markers QBS94 and QBS113 encompassed a physical size of 6.24 Mb. This region harbors 299 genes, of which 183 are high confidence (HC) genes and 116 are low confidence (LC) genes. Based on the sequence annotation of HC and LC genes, 23 genes were annotated as disease resistance proteins and three as powdery mildew resistance proteins (**Supplementary Table S2**). Likewise, the physical size of the shortened interval carrying RphMBR<sup>1012</sup>, flanked by QBS127 and QBS98, was estimated at 0.44 Mb (**Figure 4**). In this interval, 11 LC and 18 HC genes were detected (**Supplementary Table S2**). Fifteen of these genes are functionally annotated and five of them are related to pathogen resistance, i.e., HORVU1Hr1G000830 (disease resistance protein), HORVU1Hr1G000840 (powdery mildew resistance protein PM3 variant), HORVU1Hr1G000860 (disease resistance protein), HORVU1Hr1G000900 (disease resistance protein) and HORVU1Hr1G000910 (disease resistance protein) (**Supplementary Table S2**). The markers QBS128 and QBS130 are located directly in two disease resistance genes, namely HORVU1Hr1G000830 and HORVU1Hr1G000910.
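
The relation between physical and genetic distance for the two intervals can be made explicit with a short calculation. This sketch only computes interval-wide averages from the sizes quoted in the text, treating 1% recombination as 1 cM as the text does at this scale. The averages need not match the local estimates derived from individual marker pairs, and the function name is illustrative.

```python
def mb_per_cm(physical_mb: float, genetic_cm: float) -> float:
    """Average physical-to-genetic ratio of an interval in Mb/cM."""
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_mb / genetic_cm


# Interval sizes as quoted in the text
large_interval = mb_per_cm(6.24, 8.0)     # QBS94 to QBS113
narrow_interval = mb_per_cm(0.44, 0.07)   # QBS127 to QBS98
print(round(large_interval, 2), round(narrow_interval, 2))  # 0.78 6.29
```

The roughly eightfold higher ratio in the narrowed interval illustrates why additional recombinants become increasingly hard to obtain close to the locus.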

Furthermore, the available barley annotation (Mascher et al., 2017) revealed a mosaic structure of exon and intron fragments only for two disease resistance genes, namely HORVU1Hr1G000830 and HORVU1Hr1G000860, while the three other disease resistance genes (HORVU1Hr1G000840, HORVU1Hr1G000900 and HORVU1Hr1G000910) only have one coding exon (**Figure 6**).

### Testing Diagnostic Value of Developed Markers

The diagnostic value of the markers co-segregating with RphMBR<sup>1012</sup> was assessed. However, out of ten tested markers only six showed a clear allele differentiation, whereas four markers, i.e., QBS129, QBS130, QBS131 and QBS132, had to be excluded. The number of alleles detected varied from two alleles for markers QBS116, QBS117, QBS128, QBS130, GBS546 and GBS626 to seven alleles for GBR534. For the two markers GBS546 and GBS626, most of the cultivars/lines showed the same allele as the susceptible parental line Scarlett, with 80.32 and 83.60% accuracy, respectively. Marker QBS117, with 9.8% accuracy for RphMBR<sup>1012</sup>, has no diagnostic value for detecting this gene. The other tested markers were also of limited value for marker-assisted selection (**Table 3**).

### Allele Specific Re-sequencing of Candidate Genes

Allele-specific re-sequencing of all 29 putative genes located on the pseudomolecule of chromosome 1H from 2,206,515 to 2,763,382 bp, in the narrowed interval comprising 0.44 Mb, was conducted twice. In the first round of low-pass re-sequencing, a set of 36 primer pairs was designed: 33 primer pairs amplified products in both parental lines, one was dominant, amplifying a product only in Scarlett, and two were dominant for MBR1012 and did not produce any fragment in Scarlett. For two genes, no primer specific for chromosome 1HS could be designed due to the high similarity of the sequences of these genes (e.g., gene HORVU1Hr1G000820.1: on chromosome 4H, 1,863 bp of 1,866 bp are identical to chromosome 1H). Out of the 36 primer pairs, 24 were functional, while 12 were not, since their PCR products gave multiple bands, smears or present/absent patterns. Finally, the 24 PCR amplicons of the functional primer pairs were sequenced. Moreover, markers based on size polymorphisms of polymerase chain reaction (PCR) fragments between the parental lines (HORVU1Hr1G000910.9\_s3958\_as4143 and HORVU1Hr1G001060.1\_s173\_as480) were directly mapped in the HRMP. After editing the sequence data, the sequences of 18 amplicons could be aligned between both parental lines, while for six fragments no alignment was achieved due to the low quality of the sequence data or heterozygous signals (**Supplementary Table S3**).

FIGURE 4 | High-density genetic and physical map of the RphMBR<sup>1012</sup> region on barley chromosome 1HS based on 56 molecular markers and 537 recombinant inbred lines derived from the cross MBR1012 × Scarlett.

Next, for full-length amplification and re-sequencing of the five disease resistance genes in the target interval, 25 new primer pairs were designed (**Supplementary Table S4**). Out of the 25 designed primer pairs, 23 amplified products in both parental lines. From this experiment, 12 PCR products were sequenced (**Supplementary Table S3**). In total, 61 primer pairs were designed for 31,204 bp of all 29 candidate genes, yielding DNA sequence information for 17,107 bp in MBR1012 and 16,963 bp in Scarlett. Using these sequence data, 259 SNPs were identified in disease resistance genes from the target interval. Moreover, in gene HORVU1Hr1G000900.5 (disease resistance protein) a large deletion (InDel) was identified in Scarlett ranging from 26 to 222 bp. Seven SNPs were identified for HORVU1Hr1G000830.3, nine for HORVU1Hr1G000860.7 and 243 for HORVU1Hr1G000900.5 (**Supplementary Table S3**). For two resistance genes, i.e., HORVU1Hr1G000840.1 and HORVU1Hr1G000910.9, no SNP/InDel was identified. Functional annotation of the SNPs defined between the parental lines MBR1012 and Scarlett revealed synonymous mutations for 11 SNPs, whereas for 17 SNPs amino acid substitutions were detected. For two SNPs, an arginine codon changed to a stop codon (TGA) (**Table 4**). Multiple alignment also revealed polymorphisms between the parents and the barley reference sequence (**Supplementary Table S5**).
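
The distinction between synonymous SNPs, amino acid substitutions and premature stop codons can be illustrated with the standard genetic code. The sketch below is a generic example and not the annotation pipeline used in the study; the function name and the example codons are chosen for illustration, with CGA to TGA standing for an arginine-to-stop change of the kind reported above.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Classify a coding SNP by its effect on the encoded amino acid.

    pos_in_codon is 0-based (0, 1 or 2).
    """
    alt_codon = ref_codon[:pos_in_codon] + alt_base + ref_codon[pos_in_codon + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


print(classify_snp("CGA", 0, "T"))  # stop_gained  (Arg -> stop, TGA)
print(classify_snp("CTA", 0, "T"))  # synonymous   (Leu -> Leu)
print(classify_snp("GAT", 2, "A"))  # missense     (Asp -> Glu)
```

Intronic, upstream and downstream SNPs, which Table 4 marks separately, fall outside this codon-based classification.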

## DISCUSSION

Leaf rust is an important fungal disease affecting barley production (Park, 2003). Fungicide application is an option to reduce yield losses but is not always efficient and cannot be considered a sustainable disease management strategy (Park et al., 2015). Thus, growing resistant cultivars is the most economical and environmentally friendly way to reduce yield losses caused by leaf rust (Kolmer, 1996). However, disease resistance provided by major Rph genes is often overcome due to the emergence of new P. hordei pathotypes (Niks, 1982; Steffenson et al., 1993; Park, 2003). This indicates the need to introduce new sources of resistance into barley breeding as well as to isolate known ones in order to decipher their structure and function, which offers the possibility of developing functional markers for breeding and of creating new alleles, e.g., by CRISPR/Cas9 (Kumar et al., 2018).

In this study, we have shown the efficient use of the barley reference sequence in physical mapping and especially in marker saturation. Previously, Perovic et al. (2003) demonstrated that the barley landrace MBR1012 is resistant to the barley leaf rust isolate I-80; this resistance was later mapped on barley chromosome 1HS using 14 SSR and three SNP markers (König et al., 2012). A null allele of the SSR marker GBMS187 was identified as the closest linked marker, 0.8 cM proximal to the resistance gene. The allelic status of RphMBR<sup>1012</sup> and Rph4 (McDaniel and Hathcock, 1969), two genes mapped on the short arm of barley chromosome 1H, is the subject of an ongoing experiment (Perovic et al., in preparation). The phenotypic evaluation conducted here revealed a hypersensitive reaction mediated by the RphMBR<sup>1012</sup> resistance gene (**Figure 1**), while the genetic analysis demonstrates that by using genetically mapped markers in combination with the genome sequence information (Mascher et al., 2017) the physical position of this locus can be determined easily. An initial locus size of 6.24 Mb, estimated based on the published map, was further reduced by the use of new marker resources and by increasing the genetic resolution.

For many years, mapping of resistance genes relied on the use of various molecular markers, i.e., restriction fragment length polymorphisms (RFLPs) (Graner et al., 1991; Kleinhofs et al., 1993), random amplified polymorphic DNAs (RAPDs) (Williams et al., 1990; Chalmers et al., 1993), amplified fragment length polymorphisms (AFLPs) (Vos et al., 1995; Qi et al., 1998) and SSRs (Ramsay et al., 2000; Varshney et al., 2007). For instance, the powdery mildew resistance gene mlo, the first gene isolated by map-based cloning in barley, was identified by a combined use of RFLP and AFLP markers (Büschges et al., 1997). AFLP, RAPD and RFLP-derived markers were also used to saturate the Mla region (Wei et al., 1999). However, using these marker systems, gene isolation was a laborious and time-consuming effort. Advances in molecular marker technologies as well as the previous version of the barley genome sequence already facilitated an accelerated fine mapping of disease resistance genes (Lüpken et al., 2013, 2014; Yang et al., 2014). New Illumina SNP genotyping assays, namely 9K and 50K (Comadran et al., 2012; Bayer et al., 2017), together with GBS (Poland et al., 2012) opened a new way for a more efficient and faster marker saturation of target loci in barley. In our study, the above-mentioned marker resources were used for a first marker saturation of RphMBR<sup>1012</sup>. During the simultaneous construction of a fine map and an initial marker saturation, a set of 37 GZ and 9K iSelect SNP markers was randomly selected and mapped to our target interval of 8.0 cM, reducing the target interval to 0.1 cM flanked by QBS97 and QBS98. Subsequently, the newly developed high-density barley 50K Infinium SNP markers (Bayer et al., 2017) and GBS markers, which were selected using the reference sequence in the shortened candidate interval (0.1 cM), resulted in the identification of 19 additional polymorphic SNPs. These markers were converted into KASP markers and the RphMBR<sup>1012</sup> locus was genetically further narrowed to an interval of 0.07 cM between the markers QBS127 and QBS98. In the target interval, ten markers, i.e., QBS128, QBS129, QBS130, GBS626, GBR534, GBS546, QBS116, QBS117, QBS131 and QBS132, spanning a genetic distance of 0.07 cM between QBS127 (at 0.02 cM) and QBS98 (at 0.05 cM), were co-segregating. Seven out of the ten co-segregating markers, namely QBS116 (50K), QBS117 (50K), QBS128 (GBS), QBS130 (GBS), QBS131 (GBS), GBS546 and GBR534, were located in five genes in the target interval.

TABLE 4 | Functional annotation of SNPs between parental lines (MBR1012 and Scarlett) originating from candidate genes located within the 0.44 Mb target interval.

∗: Stop codon; E: Exonic; I: Intronic; U: Upstream; D: Downstream.

Fine mapping of resistance genes is a bottleneck in gene isolation due to the presence of many genes within target intervals, an uneven recombination frequency and a lack of molecular markers (Stein and Graner, 2005). The fine map of the RphMBR<sup>1012</sup> region constructed in this study was based on a set of 56 molecular markers including four InDel, three present/absent, six SSR and 43 SNP markers. Even though RphMBR<sup>1012</sup> is located in the telomeric region, recombination events turned out not to be evenly distributed along this region. Although RphMBR<sup>1012</sup> is flanked by two highly recombining regions at the telomere of chromosome 1HS (0.58 and 0.6 Mb/cM), the locus itself lies in a very unfavorable region of 7.28 Mb/cM with a high number of co-segregating markers, again showing that the potential of map-based cloning still depends on the genomic context around the gene of interest. The uneven distribution of recombination frequencies along the genome (Künzel et al., 2000; Akhunov et al., 2003) and differences in local recombination rates may result in regions without any recombination over large physical distances, which are not suited to map-based cloning (Qi and Gill, 2001; Neu et al., 2002). Consequently, the efficiency of increasing the population size always has to be considered.

Genome-wide studies and multiple gene surveys recorded variation of the SNP frequencies in barley from one SNP per 240 bp, per 200 bp, and per 189 bp (Shavrukov, 2016). In contrast, one SNP per 7 bp in the leaf rust resistance Rph7 gene region evidently showed the usefulness of high-density SNP markers for the purpose of gene isolation in barley (Scherrer et al., 2005).

In addition to the resources used in our study the following genomic resources for marker saturation nowadays may be used: exome sequencing (Mascher et al., 2013b), RNA sequencing (RNAseq) (Wang et al., 2009) which is based on transcriptome profiling, resistance gene enrichment sequencing (RenSeq) (Andolfo et al., 2014) and WGS. These methods may serve to enhance the detection of polymorphism in the genome and to develop markers toward gene isolation in a short period of time. More recently, MutRenSeq that combines the complexity reduction of R gene targeted enrichment sequencing and computational analysis based on comparative genomics provides a tool for the rapid cloning of disease resistance (R) genes in plants (Steuernagel et al., 2016; Dracatos et al., 2018).

Anchoring the markers to the barley reference sequence revealed a physical size of 0.44 Mb for the interval harboring RphMBR<sup>1012</sup>. The order of the mapped markers was largely consistent with the order in the barley physical map (Stein et al., 2007). However, a large rearrangement of 15 markers within 1.34 Mb in the distal part of chromosome 1H was observed. This inversion is due to the non-fixed orientation of the BAC-based sequence contig within a small scaffold having only one anchor point (Martin Mascher, personal communication).

Twenty-nine annotated genes were identified within the narrowed-down interval between markers QBS127 and QBS98, including five disease resistance genes (R genes), which supports the prior observation that many barley resistance genes are located distally in regions with high recombination frequency (International Barley Genome Sequencing Consortium [IBSC], 2012). It has been reported that more than 80% of all known R genes are of the NBS-LRR (nucleotide-binding site leucine-rich repeat) type (Shao et al., 2016). LRR domains have a particular function in plant-pathogen recognition (Hong and Zhang, 2016). The annotation using BLASTX against the non-redundant protein database of NCBI also indicates the presence of the NBS-LRR domain in all five disease resistance genes in the target interval. Disease resistance genes located in the target interval tend to cluster, which is typical for NBS-LRR based resistance gene analogs (DeYoung and Innes, 2006). Since P. hordei is a biotrophic fungus and NBS-LRR resistance genes are only effective in conferring resistance to biotrophic or hemibiotrophic pathogens, but not against necrotrophic pathogens (Belkhadir et al., 2004), this suggests that the resistance is due to a gene carrying the NBS-LRR motif. Hence, full-length re-sequencing of the five disease resistance genes in the parental lines was conducted. However, more than 80% sequence similarity among the R genes considerably hampered sequencing; therefore, new primers will be designed in order to obtain the complete sequences of the disease resistance genes.

Marker validation of seven co-segregating markers in 51 previously tested barley lines (König et al., 2012), as well as in 12 other barley cultivars/lines, indicated that the new markers identified in this study are not all diagnostic for RphMBR<sup>1012</sup>. Based on our study, the markers GBS546 and GBS626, with accuracies of 80.32 and 83.60% in predicting RphMBR<sup>1012</sup>, are the best diagnostic markers and facilitate faster and easier detection of RphMBR<sup>1012</sup> (and putative alleles) in barley breeding lines. The selected markers QBS128 (HORVU1Hr1G000830/Disease resistance protein), QBS130 (HORVU1Hr1G000910/Disease resistance protein), QBS116, QBS117 and GBR534 (HORVU1Hr1G000940/copper ion binding), and GBS546 (HORVU1Hr1G000930/Low molecular weight glutenin subunit) were directly derived from putative candidate genes in the target interval but were less diagnostic. Nevertheless, the diagnostic RphMBR<sup>1012</sup> markers identified in this study could be very useful not only for discriminating between resistant and susceptible cultivars but also for pyramiding RphMBR<sup>1012</sup> with other resistance genes to achieve durable resistance in barley cultivars (Sharma Poudel et al., 2018).

### CONCLUSION

In summary, by using high-throughput genotyping and sequencing techniques together with the barley reference sequence, we succeeded in reducing the RphMBR<sup>1012</sup> target interval to 0.44 Mb between markers QBS127 and QBS98, compared with 6.24 Mb in a previous study. This is an indispensable step toward isolation of this gene. Four strategies might be considered in the next step in order to identify the gene underlying the resistance locus RphMBR<sup>1012</sup>: (i) enhancing the map resolution by screening a new set of F<sub>2</sub> plants and using the new SNPs and InDels defined from candidate genes in the target interval to develop new markers for further marker saturation, (ii) screening a non-gridded BAC library from the donor line MBR1012, (iii) overexpressing the five detected disease resistance genes of the target interval in a susceptible barley cultivar, e.g., Scarlett, and (iv) knocking out the genes in resistant lines using CRISPR/Cas9. The co-segregating and closely linked markers detected in this study may be useful as probes for BAC library screening and construction of the physical map in MBR1012.

### AUTHOR CONTRIBUTIONS

DP, DK, and FO conceived and designed the experiments, provided the experimental material and contributed to study design, subject recruitment and sample preparation. LF, DK, and DP performed the experiments. LF, JK, and DP analyzed the data. LF, JK, HD, FO, and DP interpreted the data. All authors wrote the manuscript, and read and approved the final manuscript.

### REFERENCES


### FUNDING

This research was financially supported by grants from the Federal Ministry of Education and Research, Bundesministerium für Bildung und Forschung (BMBF) and the Deutscher Akademischer Austauschdienst e. V. (DAAD) to LF.

### ACKNOWLEDGMENTS

The authors acknowledge the excellent technical assistance of Katy Niedung and Marlis Weilepp.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00640/full#supplementary-material


Feuerstein, U., Brown, A. H. D., and Burdon, J. J. (1990). Linkage of rust resistance genes from wild barley (Hordeum spontaneum) with isozyme markers. Plant Breed. 104, 318–324.

Franckowiak, J., Jin, Y., and Steffenson, B. (1997). Recommended allele symbols for leaf rust resistance genes in barley. Barley Genet. Newsl. 27, 36–44.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Fazlikhani, Keilwagen, Kopahnke, Deising, Ordon and Perovic. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Molecular Characterization of 87 Functional Genes in Wheat Diversity Panel and Their Association With Phenotypes Under Well-Watered and Water-Limited Conditions

Maria Khalid<sup>1,2</sup>, Fakiha Afzal<sup>1</sup>, Alvina Gul<sup>1</sup>, Rabia Amir<sup>1</sup>, Abid Subhani<sup>3</sup>, Zubair Ahmed<sup>4</sup>, Zahid Mahmood<sup>4</sup>, Xianchun Xia<sup>2</sup>, Awais Rasheed<sup>2,5,6</sup>\* and Zhonghu He<sup>2,5</sup>\*

<sup>1</sup> Atta-ur-Rehman School of Applied Biosciences (ASAB), National University of Science and Technology (NUST), Islamabad, Pakistan, <sup>2</sup> Institute of Crop Sciences, National Wheat Improvement Centre, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China, <sup>3</sup> Barani Agriculture Research Institute (BARI), Chakwal, Pakistan, <sup>4</sup> Crop Science Institute, National Agricultural Research Centre, Islamabad, Pakistan, <sup>5</sup> International Maize and Wheat Improvement Centre (CIMMYT), CAAS, Beijing, China, <sup>6</sup> Department of Plant Sciences, Quaid-i-Azam University, Islamabad, Pakistan

### Edited by:

Dragan Perovic, Julius Kühn-Institut, Germany

#### Reviewed by:

Vesna Kandic, Maize Research Institute Zemun Polje, Serbia Sivakumar Sukumaran, International Maize and Wheat Improvement Center, Mexico Gwendolin Wehner, Julius Kühn-Institut, Germany

#### \*Correspondence:

Awais Rasheed arasheed@qau.edu.pk; a.rasheed@cgiar.org Zhonghu He zhhecaas@163.com

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 30 January 2019 Accepted: 15 May 2019 Published: 04 June 2019

#### Citation:

Khalid M, Afzal F, Gul A, Amir R, Subhani A, Ahmed Z, Mahmood Z, Xia X, Rasheed A and He Z (2019) Molecular Characterization of 87 Functional Genes in Wheat Diversity Panel and Their Association With Phenotypes Under Well-Watered and Water-Limited Conditions. Front. Plant Sci. 10:717. doi: 10.3389/fpls.2019.00717

Modern breeding imposed selection for improved productivity that largely influenced the frequency of superior alleles underpinning traits of breeding interest. Therefore, molecular diagnosis of the allelic variation of such genes is important to manipulate beneficial alleles in wheat molecular breeding. We analyzed a diversity panel largely consisting of advanced lines derived from synthetic hexaploid wheats for allelic variation at 87 functional genes or loci of breeding importance using 124 high-throughput KASP markers. We also developed two KASP markers for water-soluble carbohydrate genes (TaSST-D1 and TaSST-A1) associated with plant height and thousand grain weight (TGW) in the diversity panel. KASP genotyping results indicated that beneficial alleles for genes underpinning flowering time (Ppd-D1 and Vrn-D3), thousand grain weight (TaCKX-D1, TaTGW6-A1, TaSus1-7B, and TaCwi-D1), water-soluble carbohydrates (TaSST-A1), yellow-pigment content (Psy-B1 and Zds-D1), and root lesion nematodes (Rlnn1) were fixed in the diversity panel, with frequencies ranging from 96.4 to 100%. The association analysis of functional genes with agronomic and biochemical traits under well-watered (WW) and water-limited (WL) conditions revealed that 21 marker-trait associations (MTAs) were consistently detected in both moisture conditions. The major developmental genes such as Vrn-A1, Rht-D1, and Ppd-B1 had confounding effects on several agronomic traits including plant height, grain size and weight, and grain yield in both WW and WL conditions. The accumulation of favorable alleles for grain size and weight genes additively enhanced grain weight in the diversity panel. A graphical genotyping approach was used to identify accessions with the maximum number of favorable alleles, which are thus likely to have high breeding value. These results improved our knowledge on the selection of favorable and unfavorable alleles through unconscious selection in breeding and identified opportunities to deploy alleles with effects in wheat breeding.

Keywords: drought tolerance, functional markers, kompetitive allele-specific PCR markers, synthetic-derivatives, marker-trait association

## INTRODUCTION


Insight into the genetic loci that have been selected during modern wheat breeding is of significant importance to understand the phenotypic variation in modern wheat cultivars. This will enable wheat breeding to become a knowledge-based activity and will ultimately improve the rate of genetic progress in wheat (Li H. et al., 2018). The major limitations, however, are the unavailability of (i) the full spectrum of functional genes involved in wheat adaptability, (ii) sequencing resources in large diversity panels of wheat, and (iii) high-throughput genotyping platforms for gene diagnostics. Gene discovery and map-based gene cloning in wheat have lagged behind other crops, such as rice and maize, because the large and complex wheat genome slowed the development of sequence resources. Consequently, gene cloning in the Triticeae (wheat and its related species) relied to some extent on comparative genomics approaches with other grass genomes, owing to the high collinearity and conserved genetic organization among grass genomes (Li H. et al., 2018). To date, more than 150 functional markers have become available for important genes in wheat (Liu et al., 2012), and these were subsequently converted into high-throughput KASP assays (Rasheed et al., 2016a). They include markers for genes related to grain size and weight (Nadolska-Orczyk et al., 2017), developmental traits such as photoperiod response and vernalization (Kamran et al., 2014), and end-use quality. More specifically, genes for grain size and weight including TaTGW6 (Hanif et al., 2015; Hu et al., 2016), TaCwi-A1 (Ma et al., 2012), TaSus2-2B (Jiang et al., 2011), TaSus2-2A, TaSus1-7A (Hou et al., 2014), TaGW2-6A, 6B (Su et al., 2011; Yang et al., 2012; Qin et al., 2014), TaCKX6-D1 (Zhang et al., 2012), TaSAP1-A1 (Chang et al., 2013), TaGS-D1 (Zhang et al., 2014; Ma et al., 2016), and TaGASR-A1 (Dong et al., 2014) have been cloned in wheat using comparative genomics approaches.

The favorable alleles of some genes have not been well surveyed in wheat germplasm from different regions and genetic backgrounds. Such surveys hold promise for finding opportunities to deploy specific favorable alleles using molecular markers. Similarly, genes favoring the adaptability of wheat under drought stress are extremely important for breeding purposes (Zhang et al., 2008). Drought is currently a leading threat to global food security and is becoming more severe as climate change leads to drier environments (Acuna-Galindo et al., 2014). Drought affects wheat at all developmental stages, including the reproductive and grain filling stages, which are the most sensitive phases of crop development, resulting in significant yield loss (Li C. et al., 2018). Therefore, deployment of genes favoring drought adaptability in selection breeding is a key strategy for sustainable wheat production (Thapa et al., 2018).

Synthetic hexaploid wheat (SHW) is an excellent source of genetic variation for cultivar improvement. It combines genes from the wild ancestor Aegilops tauschii (Coss.) and tetraploid wheat (Triticum turgidum L.), and bread wheat can be improved through the identification and introgression of useful genes via synthetic derivatives (SYN-DER) or advanced synthetic backcross-derived lines (SBLs). It has been concluded during the past few years that a 30% yield advantage under drought stress could be attributed to the use of SHW (Tang et al., 2015). The favorable alleles associated with yield-related traits retained in SBLs are the reason for the yield benefits of SHW (McIntyre et al., 2014). However, the proportion of favorable alleles of major genes retained in SYN-DER is largely unknown, and analysis of the allelic effects of these genes could be helpful in the deployment of certain genes related to developmental traits and drought adaptability (Afzal et al., 2017).

The objectives of the current study were to (i) analyze the allelic variation of functional genes underpinning developmental traits, end-use quality, disease resistance, drought tolerance and agronomic traits, (ii) assess the allelic effects of these genes in the SYN-DER diversity panel under well-watered (WW) and water-limited (WL) conditions, and (iii) identify the genotypes with the maximum number of favorable alleles or haplotypes under drought stress conditions.

### MATERIALS AND METHODS

### Germplasm and Phenotypic Evaluation

A diversity panel of 213 advanced lines derived from synthetic hexaploid wheats (SHWs) and elite bread wheat cultivars was used in the present study (**Supplementary Table S1**), comprising 42 bread wheat genotypes and 171 SYN-DERs. SYN-DER lines were developed by crossing synthetic lines with advanced lines and improved cultivars from CIMMYT and Pakistan.

The diversity panel was planted in the field at the Barani Agriculture Research Institute, Chakwal (72.8° E, 32.8° N; 575 m a.m.s.l.) and the National Agriculture Research Centre, Islamabad, Pakistan (73.04° E, 33.68° N; 620 m a.m.s.l.) following an alpha lattice design. Each genotype was sown in two 2-m long rows with two replications. Thirty viable seeds were sown in each row using a small-plot grain drill. Physiological, agronomic and biochemical traits were measured as described in our previous publication (Afzal et al., 2017).

An inter-row spacing of 30 cm was maintained for both the well-watered and water-limited treatments. The panel was evaluated for 2 years under two environmental conditions: a well-watered condition (WW) and a water-limited condition (WL), the latter in a semiarid, rainfed area lying at the start of the Potohwar plateau. For the well-watered condition, three irrigations were given and soil moisture was maintained until harvest. For the water-limited condition, a polyethylene tunnel was used to provide shelter from precipitation. Water seepage was prevented by a 1 m deep ditch surrounding the tunnel, and irrigation was stopped at the end of the tillering stage.

Phenotypic traits recorded at each location included chlorophyll content index (CCI), total chlorophyll (Chl), canopy temperature (CT), number of grains per spike (GS), grain yield (GY), plant height (PH), proline, relative water contents (RWC), shoot dry weight (SDW), shoot fresh weight (SFW), spike length (SL), superoxide dismutase (SOD), sugar contents (Sugar), thousand grain weight (TGW), and tillers per plant (TP). Specific suffixes were assigned to each trait according to the water treatment, e.g., GYWW (GY in well-watered treatment) and GYWL (GY in water-limited treatment). The detailed phenotyping methodology for biochemical traits has been described earlier (Afzal et al., 2019).

### Genotyping

DNA was extracted from all genotypes using a modified CTAB method (Dreisigacker et al., 2013). Allele-specific KASP markers for 87 different loci were used; the primer sequences and amplification conditions for each gene are described in **Supplementary Table S2**. KASP assays targeting SNPs in SYN-DER were developed following standard KASP guidelines. Briefly, primers were designed carrying the standard FAM tail (5′-GAAGGTGACCAAGTTCATGCT-3′) and HEX tail (5′-GAAGGTCGGAGTCAACGGATT-3′), with the target SNP at the 3′ end. The primer mixture included 46 µl ddH2O, 30 µl common primer (100 µM) and 12 µl of each tailed primer (100 µM). Assays were tested in 384-well format and set up as 5 µl reactions [2.2 µl DNA (10–20 ng/µl), 2.5 µl of 2× KASP master mixture and 0.056 µl primer mixture]. PCR cycling was performed using the following protocol: hot start at 95°C for 15 min, followed by ten touchdown cycles (95°C for 20 s; touchdown starting at 65°C and decreasing by 1°C per cycle, 25 s), followed by 30 cycles of amplification (95°C for 10 s; 57°C for 60 s). An extension step is unnecessary because the amplicon is shorter than 120 bp. Plates were read in a BioTek H1 system and data analysis was performed manually using Klustercaller software (version 2.22.0.5; LGC Hoddesdon, United Kingdom).
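The primer-mix arithmetic and the touchdown schedule described above can be checked with a short sketch. The function names are illustrative and not part of any KASP software, and the sketch assumes that the first touchdown cycle anneals at 65°C and that each subsequent cycle is exactly 1°C lower.

```python
# Sketch of the primer mix and touchdown schedule stated in the text.
# Function names are hypothetical; volumes and temperatures come from the protocol above.

def primer_mix_concentrations(water_ul=46.0, common_ul=30.0, tailed_ul=12.0, stock_um=100.0):
    """Return the total mix volume and final primer concentrations (µM) in the mix."""
    total_ul = water_ul + common_ul + 2 * tailed_ul  # two tailed primers (FAM and HEX)
    return {
        "total_ul": total_ul,
        "common_um": common_ul * stock_um / total_ul,
        "each_tailed_um": tailed_ul * stock_um / total_ul,
    }


def touchdown_annealing_temps(start_c=65.0, step_c=1.0, cycles=10):
    """Annealing temperature per touchdown cycle (assumes cycle 1 runs at start_c)."""
    return [start_c - step_c * i for i in range(cycles)]


mix = primer_mix_concentrations()
temps = touchdown_annealing_temps()
print(mix)
print(temps)
```

Under these assumptions the mix totals 100 µl, with the common primer at 30 µM and each tailed primer at 12 µM, and the ten touchdown cycles run from 65°C down to 56°C before the 57°C amplification cycles.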

### Development of KASP Assays for TaSST Genes

Two KASP assays were developed for the TaSST-D1 and TaSST-A1 genes underpinning water-soluble carbohydrates (Dong et al., 2016). One KASP marker was developed for the A/G SNP at position 1093 bp, corresponding to the CAPS marker WSC7D of TaSST-D1. The second KASP marker was developed for two neighboring SNPs (GTT/ATA) at positions 438 and 440 bp in the TaSST-A2 gene. The results were compared with those for wheat cultivars in our previous publication (Dong et al., 2016).

### Statistical Analysis

Association analysis was performed in TASSEL version 5.0 using a mixed linear model (MLM) based on the K-PC model (Bradbury et al., 2007). The diversity panel has been genotyped with the 90K SNP array (Afzal et al., 2019), and 500 unlinked SNP markers representing all 21 wheat chromosomes, with minor allele frequency (MAF) ranging from 0.3 to 0.5, were used for principal component analysis (PCA) and to derive kinship information in TASSEL version 5.0. Both the kinship matrix and the first five principal components from PCA were used as covariates to improve statistical power. For the markers significantly associated with phenotypes in TASSEL, Student's t-test and the Kruskal–Wallis test were further used to compare allelic effect means with

## RESULTS

In total, 124 KASP markers were used to identify alleles at 87 loci in the SYN-DER diversity panel. Because several KASP assays were in some cases used to identify multiple alleles at a single locus, the results are presented for the alleles or haplotypes at each locus, as shown in **Table 1**, rather than for individual markers. The alleles at 30 loci were fixed, with frequencies ranging from 95 to 100% in the diversity panel (**Table 2**). Such markers were not used for marker-trait association (MTA) analysis. The markers used for MTAs are shown in **Table 3**. In total, 58 loci showed allelic variation with minor allele frequency >5%. These loci underpin plant stature (n = 2), flowering time (n = 9), grain size and weight (n = 11), drought adaptability (n = 5), end-use quality (n = 18), grain color and dormancy (n = 7), and disease resistance (n = 6) (**Table 1**).

### Allele Fixation at Loci of Breeding Interest in SYN-DER

Alleles present in more than 95% of the accessions are referred to as fixed or highly selected alleles and are listed in **Table 2**. Alleles are referred to as "favorable alleles" if they are positively associated with the improved phenotype in the literature and as "unfavorable alleles" if the allelic effect is not favorable for the improved phenotype. At the Ppd-D1 locus, the photo-period insensitive allele, Ppd-D1a, was fixed with a frequency of 99.6%. Similarly, the Jagger-type allele at the Vrn-D3 gene, which is associated with a short vernalization requirement, was present in 99.04% of the accessions. The Cadenza-type and wild-type alleles associated with delayed flowering were identified at TaElf3-B1 and TaElf3-D1, respectively. The unfavorable alleles associated with low TGW were detected at TaSus2-2B, TaGW2-6A, and TaTGW6-4A, while Hap-L at TaMoc1-A1, associated with low grain number per spike, was fixed in the diversity panel. The favorable alleles associated with low yellow-pigment content (YPC) were fixed at Psy-B1 and Zds-D1. The susceptible allele at the Fusarium head blight gene (Fhb1) was fixed in the diversity panel. Similarly, none of the accessions carried resistance alleles at the stem rust resistance gene (Sr36/Pm6), the leaf rust resistance genes (Lr37, Lr9, Lr67, and Lr47) or the eye-spot resistance gene (CU8). However, the resistance allele at the root lesion nematode resistance gene (Rlnn1) was present in 97.1% of the accessions (**Table 2**).
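The fixation criterion used here (major allele present in more than 95% of accessions) can be illustrated with a minimal sketch. The helper names and the toy genotype calls are hypothetical and do not reproduce the study's data; missing calls are assumed to be coded as `None`.

```python
from collections import Counter


def allele_frequencies(calls):
    """Allele frequencies at one locus, ignoring missing calls (None)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        raise ValueError("no genotype calls available for this locus")
    counts = Counter(observed)
    return {allele: count / len(observed) for allele, count in counts.items()}


def classify_locus(calls, fixed_threshold=0.95):
    """Label a locus 'fixed' when its most frequent allele exceeds fixed_threshold."""
    freqs = allele_frequencies(calls)
    major_allele, major_freq = max(freqs.items(), key=lambda kv: kv[1])
    status = "fixed" if major_freq > fixed_threshold else "variable"
    return major_allele, major_freq, status


# Toy calls for illustration only (not the genotypes reported in Table 2)
toy_fixed = ["Ppd-D1a"] * 99 + ["Ppd-D1b"] * 1
toy_variable = ["Rht-B1a"] * 51 + ["Rht-B1b"] * 49
print(classify_locus(toy_fixed))
print(classify_locus(toy_variable))
```

Loci labeled "variable" by such a rule correspond to those carried forward to marker-trait association analysis, whereas "fixed" loci carry too little variation to test.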

### Allelic Variation at Loci of Breeding Interest in SYN-DER

Several genes had more than two alleles; only genes with at least two alleles above 5% frequency are described here. The minor allele frequency at 58 loci ranged from 5% (several alleles) to 47.6% (1fehw3). Allelic variation at these loci is described below according to their phenotypic associations (**Table 1**).

#### TABLE 1 | Allele frequency of functional genes in the diversity panel derived from synthetic hexaploid wheat.



<sup>a</sup>Mode, Major allele/allele with the highest frequency.

### Wheat Adaptability and Developmental Related Genes

In total, 11 loci were categorized in this group (**Table 1**). At both Rht genes, the wild-type alleles, Rht-B1a and Rht-D1a, were present in 50.9 and 77.4% of the accessions, respectively. The 1BL.1RS translocation was observed in 55 (26.4%) accessions. At the photo-period response genes, the photo-period insensitive alleles, Ppd-A1a and Ppd-B1a, were identified in 61.1 and 75.5% of the accessions, respectively. Sixteen accessions (7.7%) had the GS-105 type Ppd-A1a allele with a 1,117 bp deletion in Ppd-A1, which was likely transferred from the durum parents of synthetic hexaploid wheat (**Table 1**). Across the three vernalization genes, the spring-type alleles had frequencies of 45.2% at Vrn-A1, 86.1% at Vrn-B1, and 88.9% at Vrn-D1. The KASP assay TaBradi2g14790 was used to identify the deletion of the Elf3-D1 gene associated with early flowering, and 46.1% of the accessions carried the gene deletion. Similarly, two paralogs of Ppd1, the PRR73-A1 and PRR73-B1 genes, were genotyped, and Hap-I was the more frequent haplotype at both loci (57.7 and 94.3%, respectively) (**Table 1**). Hap-I at PRR73-A1 is associated with early flowering, while Hap-I at PRR73-B1 is associated with delayed flowering (Zhang et al., 2016).

### Grain Size and Weight Related Genes

In this category, 10 functional genes showed higher allelic variation, of which two genes (TaSus1-7A and TEF-7A) had more than two alleles. At TaGS-D1, the favorable allele, TaGS-D1a, associated with high TGW was most frequent (82.2%). Similarly, a higher frequency of favorable alleles was observed at TaCwi-A1 (84.1%), TaSus2-2A (55.3%), TaSus1-7A (86.5%), TaGS5-A1 (79.8%), TaGW2-6B (61.5%), TaGS2-B1 (63.9%), and TaGS1a (58.1%), whereas unfavorable alleles associated with lower TGW were more frequent at TaGASR (88.9%) and TEF-7A (65.8%) (**Table 1**).

### Drought Adaptability Related Genes

Five genes related to drought adaptability showed higher allelic variation in SYN-DER. The favorable alleles, Hap-4A-C and TaDreb-B1b, showed higher frequency at the TaCwi-4A and TaDreb1 loci, respectively. Similarly, the favorable allele COMT-3Ba, associated with high lignin content under water-limited conditions, was identified in 55.2% of the accessions. Almost equal allele frequencies were observed at the 1fehw3 locus, related to water-soluble carbohydrate (WSC) content, in the SYN-DER diversity panel (**Table 1**).

### Pre-harvest Sprouting and Grain Color Related Genes

Seven genes related to grain color and dormancy showed higher allelic variation, including the three homeologous TaMyb10 genes on the A, B, and D genomes. The alleles encoding white grain color were predominant at the TaMyb10-A1 and TaMyb10-D1 loci, while the allele encoding red grain color was predominant at TaMyb10-B1. The Vp1-B1c allele at Vp1-B1 (69.2%), the Rio Blanco-type allele at Phs1 (75.9%), and the Zen-type allele at TaMFT-A1 (85%), all associated with pre-harvest sprouting tolerance, were present at higher frequency (**Table 1**). At TaSdr-B1, the pre-harvest sprouting susceptibility allele, TaSdr-B1b, had the higher frequency.

#### TABLE 2 | Alleles of functional genes completely or pre-dominantly fixed in the diversity panel derived from synthetic hexaploid wheat.

### End-Use Quality Related Genes

Two loci encoding high-molecular-weight glutenin subunits had relatively high frequencies of Ax1 (27.4%) and Ax2\* (40.3%) at Glu-A1 and Dx5+Dy10 (50.4%) at Glu-D1, which are associated with strong gluten and superior bread-making quality. WBM is another newly identified bread-making quality gene; however, only 34 (16.3%) of the accessions had the favorable allele. Three major loci underpinning grain texture had high frequencies of alleles associated with hard grain texture at Pina-D1 (84.6%) and Pinb-D1 (63.4%), whereas the allele for soft grain texture was more frequent at Pinb-B2 (57.2%). Low YPC is a desirable trait for Chinese noodles and steamed bread, giving a brighter white color, whereas high YPC is favored for bread, yellow alkaline noodles, pasta, and some other products because of the higher carotenoid content. The alleles associated with YPC, including Psy-A1b, Psy-D1a, and Zds-A1a, were identified in 37.5, 94.7, and 53.8% of the accessions, respectively (**Table 1**).

### Biotic Stress Resistance Genes

The frequencies of the adult-plant resistance genes Lr34/Yr18 and Sr2 were 54.3 and 3.3%, respectively. Similarly, two other leaf rust resistance genes, Lr14a and Lr21, were observed in 34.1 and 4.8% of the accessions, respectively. The alleles associated with virus resistance, SbmP, and soil-borne disease resistance, Cre8, were detected in 9.6 and 22.6% of the accessions, respectively (**Table 1**).

### Genetic Diversity in the Diversity Panel Based on Functional Genes

Genetic diversity in the diversity panel was estimated based on the functional markers (**Figure 1A**). The accessions were categorized into synthetic derivatives (those having Ae. squarrosa in their pedigree) and bread wheat advanced lines. The first two principal components explained 8.8 and 6.3% of the total variation, respectively. Most of the bread wheat accessions were separated on PC2, and some were admixed within the SYN-DER clusters. The phylogenetic tree corroborated the PCA (**Figure 1B**): some bread wheat accessions formed a distinct cluster, and the remaining bread wheats clustered together with SYN-DER.

MAF, Minor allele frequency; P, P-values; R<sup>2</sup>, Phenotypic variation explained by marker; Estimate, Allelic effect of minor allele.

### Allelic Effects of Functional Genes in SYN-DER

All KASP assays showing minor allele frequency >5% were used for marker-trait associations (MTAs) in the diversity panel using agronomic and biochemical traits under WW and WL conditions (**Table 1**). To avoid false associations, a population structure matrix based on 500 unlinked SNP markers was used as a covariate. However, a relaxed criterion of P < 0.05 was used to declare MTAs (**Supplementary Table S3**). Based on this criterion, 128 MTAs were observed, out of which 55 were associated with traits under WL conditions, 94 were associated with traits under WW conditions and 21 were associated across WW and WL conditions (**Table 3**). At the stricter criterion of P < 0.01, the number of MTAs was reduced to 24 under WL, 39 under WW and 3 across both water conditions. These include Rht-D1, TaGS1a, Ppd-B1, and Vrn-A1 associated with TGW (**Supplementary Table S3**).
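The effect of the two significance criteria on the MTA tally can be sketched as follows. The records are invented for illustration and are not taken from Supplementary Table S3, and counting a locus-trait pair as "both" when it passes the threshold in each water regime is an assumption about how shared MTAs might be tallied, not the authors' stated procedure.

```python
from collections import Counter

# Hypothetical association records: (locus, trait, condition, p_value).
# Values are illustrative only.
records = [
    ("Rht-D1", "TGW", "WW", 0.004),
    ("Rht-D1", "TGW", "WL", 0.008),
    ("Ppd-B1", "TGW", "WL", 0.030),
    ("Vrn-A1", "GY", "WL", 0.200),
    ("TaCwi-4A", "DH", "WW", 0.045),
]


def count_mtas(records, alpha):
    """Count associations with p < alpha per condition, plus pairs significant in both."""
    significant = [r for r in records if r[3] < alpha]
    per_condition = Counter(cond for _, _, cond, _ in significant)
    conditions_by_pair = {}
    for locus, trait, cond, _ in significant:
        conditions_by_pair.setdefault((locus, trait), set()).add(cond)
    shared = sum(1 for conds in conditions_by_pair.values() if {"WW", "WL"} <= conds)
    return {"WW": per_condition["WW"], "WL": per_condition["WL"], "both": shared}


relaxed = count_mtas(records, 0.05)
strict = count_mtas(records, 0.01)
print(relaxed)
print(strict)
```

With these toy records, tightening the threshold from 0.05 to 0.01 halves the per-condition counts while the one pair significant in both regimes is retained, mirroring the kind of reduction reported above.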

Five KASP assays for the Ppd1 homeologous genes were significantly associated with days to heading (DH), grain yield (GY), relative water contents (RWC), spike length (SL), and TGW. For the Ppd-A1 gene, GS105-type Ppd-A1a was associated with SL in the WL condition and with GY and spikelets per spike (SpPS) in the WW condition. The paralog of the Ppd1 gene, PRR73-A1, was associated with DH and days to maturity (DM) in WL conditions, GpS in the WW condition, and tiller number (TN) in both conditions (**Supplementary Table S3** and **Table 3**). The newly identified Elf3-D1 and TaMOT-D1 genes were associated with DEM, DH, DM, and HI in WL condition, while TaMOT-D1 was also associated with DH and DM in WL condition.

The KASP assays for TaSus1-7A were associated with TGW and GY in the WW condition, and with SL in both conditions (**Table 3** and **Figures 2**, **3**). The wheat cell wall invertase gene, TaCwi-4A, was associated with DH and TGW in both water conditions, and with GpS, PH, and proline in WW conditions (**Table 3** and **Figures 2**, **3**). The drought tolerance-related gene 1fehw3 was associated with GY, PH, and SL in WW conditions. Dreb1 was associated with canopy temperature (CT) and GpS in the WW condition and with HI in the WL condition (**Table 3**). Surprisingly, the 1BL.1RS wheat-rye translocation was associated with GpS, GY, SL, SOD, SpPS, and TGW in the WW condition; however, no MTA was identified under WL conditions (**Supplementary Table S3**). Rht-B1 was associated with GY, PH, and SL in the WW condition (**Supplementary Table S3**) and with GpS in both water conditions (**Table 3**). Rht-D1 was associated with CT, GpS, RWC, and SOD in WW conditions (**Supplementary Table S3**), with DEM and EL in WL conditions (**Supplementary Table S3**), and with GY, PH, and TGW in both water conditions (**Table 3** and **Figure 3**). The KASP assay for TaSST-4D developed in this study was associated with biomass (BM) and TGW in WL conditions.

Several MTAs were identified for biochemical traits, which provided novel insight into the confounding effect of functional genes on several biochemical traits and enzymatic activities induced by drought stress. Most importantly, TaSus1-7B was associated with superoxide dismutase (SOD) and TaCwi-A1 was associated with total soluble sugar contents (SS) under WL conditions. A graphical genotyping approach was used to visualize the number of favorable alleles in the 25 accessions of the SYN-DER panel having the highest grain yield under WW (**Figure 4A**) and WL (**Figure 4B**) conditions.
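The idea behind graphical genotyping of top-yielding accessions can be sketched as a simple tally of favorable alleles per accession. The favorable alleles named below are those reported in the Results for drought adaptability loci; the locus key `COMT-3B`, the accession IDs, the `"other"` placeholder allele and the yield values are all hypothetical, and this sketch is not the software used to draw Figure 4.

```python
# Favorable allele per locus (allele names from the text; "COMT-3B" key is assumed)
favorable = {"TaCwi-4A": "Hap-4A-C", "TaDreb1": "TaDreb-B1b", "COMT-3B": "COMT-3Ba"}

# Toy panel: hypothetical accessions, genotypes and grain yields
panel = {
    "ACC1": {"TaCwi-4A": "Hap-4A-C", "TaDreb1": "TaDreb-B1b", "COMT-3B": "COMT-3Ba"},
    "ACC2": {"TaCwi-4A": "Hap-4A-C", "TaDreb1": "other", "COMT-3B": "other"},
    "ACC3": {"TaCwi-4A": "other", "TaDreb1": "TaDreb-B1b", "COMT-3B": "COMT-3Ba"},
}
yields = {"ACC1": 3.1, "ACC2": 3.6, "ACC3": 2.4}


def genotype_row(genotype, favorable):
    """One text row per accession: '+' for a favorable allele, '-' otherwise."""
    return "".join("+" if genotype.get(locus) == allele else "-"
                   for locus, allele in favorable.items())


def top_accessions(panel, yields, favorable, n):
    """Return (accession, yield, favorable-allele count) for the n highest yielders."""
    ranked = sorted(panel, key=lambda acc: yields[acc], reverse=True)[:n]
    return [(acc, yields[acc], genotype_row(panel[acc], favorable).count("+"))
            for acc in ranked]


for acc, gy, n_fav in top_accessions(panel, yields, favorable, 2):
    print(acc, gy, genotype_row(panel[acc], favorable), n_fav)
```

Each printed row is a crude text analogue of one line of a graphical genotype, which makes it easy to see whether the highest-yielding accessions also accumulate the most favorable alleles.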

### DISCUSSION

The use of high-throughput KASP markers for functional genes provided valuable insight into the genetic architecture of synthetic-derived wheats for productivity, end-use quality, and disease resistance. It also helped to identify the favorable and unfavorable alleles for important breeding traits that have been exhaustively selected, and it provides further opportunities to manipulate those alleles for wheat improvement. It had been challenging to practice molecular breeding in wheat despite the discovery of a huge array of functional genes for a range of important breeding traits (Liu et al., 2012). This was largely due to the absence of a high-throughput genotyping platform that could align with breeding programs to screen large breeding populations without compromising flexibility (Rasheed et al., 2017; Rasheed and Xia, 2019). High-throughput genotyping assays like KASP have made it possible to genotype large populations at various loci within a very short time (Rasheed et al., 2016a). Several recent studies used KASP markers to identify the allelic variation of functional genes in wheat cultivars from China (Rasheed et al., 2016a), United States (Grogan et al., 2016), and Canada (Perez-Lara et al., 2017).

Flowering time is one of the most important developmental traits for wheat adaptability and yield stability in target environments. Three genes controlling vernalization (VRN1), photoperiod response (Ppd1), and early flowering (Elf3) are known to be major determinants of flowering time optimization (Kamran et al., 2014). The alleles for spring growth habit were pre-dominantly observed at the VRN1 loci, which is attributed to selection of superior genotypes in the absence of vernalizing conditions in field trials in Pakistan. Recessive alleles at all three loci are required for a complete vernalization requirement; none of the accessions had the three recessive alleles across the VRN1 homeologous loci. VRN1, Ppd1, and Elf3 are known to have roles beyond flowering time (Kamran et al., 2014). VRN1 loci are known to increase the number of spikelets (Whitechurch and Snape, 2003; Hailu and Merker, 2008) and final leaf number (Hay and Kirby, 1991). The association of allelic variation at the VRN1 and Ppd1 homeologous loci with developmental traits like DH and DM, and with yield-related traits like GpS, GY, SL, SpPS, TGW, and TN, was therefore not unexpected. However, very little is known about the role of these genes under drought stress conditions; therefore, our results have significant value for the deployment of these genes under water-limited conditions. Vrn-A1 was significantly associated with GY and TGW, while Ppd-B1 was associated with TGW under WL conditions. Previously, Ogbonnaya et al. (2017) demonstrated that winter-type VRN1 alleles significantly increased heading time and decreased GY under heat stress conditions in bread wheat. Our results extend this information to the role of VRN1 alleles under drought stress conditions. Similarly, the roles of Ppd1 are not confined to photo-period sensitivity; these genes also control modifications leading to spikes with more elaborate arrangements and an increased number of grain-producing spikelets (Boden et al., 2015). The association of Ppd-A1 and Ppd-B1 with yield-related traits, including spikelets per spike, was an important finding and confirmed the role of Ppd1 alleles in yield determination. The most important finding was that GS105-type Ppd-A1a alleles were retained in the SYN-DER diversity panel at a relatively high frequency and were associated with SL and spikelets per spike under WL conditions. These specific alleles are only present in durum wheat cultivars and are likely to have significantly higher expression for photo-period insensitivity. Because they are present in the durum parents of synthetic hexaploid wheats, this allele represents a novel and potentially useful source of earliness in bread wheat. Because the diversity panel is fixed for the major photo-period insensitive allele, Ppd-D1a, these new variations from the durum source could help to further reduce flowering time in bread wheat. The Ppd-1 genes were associated with other agronomic traits like GY, SL, SpPS, TGW, and TN in the diversity panel. These results are in agreement with previous findings (Boden et al., 2015; Rasheed et al., 2016b). PRR73 is a paralog of Ppd-D1 in bread wheat, and it was reported that accessions having Hap-I at PRR73-A1 and Hap-II at PRR73-B1 were earlier in heading and taller under long-day conditions than accessions having the contrasting haplotypes (Diaz et al., 2012). The association of PRR73-B1 with PH and DH in the SYN-DER diversity panel confirmed these findings; however, it was also associated with other traits under WL conditions. This indicates the importance of deploying PRR73 genes for drought tolerance through drought escape or by maintaining assimilates under water-limited conditions.

Plant height is an important trait largely controlled by the Rht-B1 and Rht-D1 genes. The important alleles Rht-B1b and Rht-D1b significantly reduce PH by 14–17%, decrease lodging, and increase harvest index (Rasheed et al., 2016a). The presence of the Rht-B1b allele in 45% of the accessions and the moderately high frequency of accessions carrying the wild-type Rht-D1a allele reflect the fact that the germplasm is derived from synthetic hexaploid wheats and was mostly selected under drought conditions. The association of the Rht1 genes with many adaptive traits like DH, GpS, GY, PH, RWC, SL, SOD, and TGW indicates the pleiotropic effects of these two genes. Previous studies have shown very broad roles of Rht genes, with pleiotropic effects on anther extrusion, a major trait in hybrid wheat production (Würschum et al., 2018), resistance against Fusarium head blight (Gosman et al., 2009), insect pest resistance (Emebiri et al., 2017), and grain quality. Therefore, modulating plant height by selecting appropriate Rht alleles according to the target environment is not only important for pure-line breeding but can also assist in hybrid wheat breeding, where tallness of the male parent is required for effective production of hybrids (Würschum et al., 2018).

Grain size and weight are important grain yield components, which have not been increased as much as grain number in many parts of the world. This indicates significant potential to exploit this component of grain yield in wheat. Until now, more than 15 genes related to grain size and weight have been cloned in wheat, mainly using comparative genomics approaches and the high gene co-linearity between wheat and other cultivated grass species like rice, maize, and barley (Nadolska-Orczyk et al., 2017). It is now well known from various experiments that drought stress at the reproductive stage mainly restricts spikelet fertility, thus reducing the number of grains (Ji et al., 2010), while drought stress at anthesis reduces the rate of resource mobilization from source to sink (grain), which ultimately reduces grain size (Shen et al., 2003). Therefore, the association of genes with grain size and grain number under WL provides further opportunities to use this information in developing drought-tolerant varieties. In our panel, the favorable alleles of the TaCwi genes are of interest because these genes control the cell wall invertase (CWI) enzyme, which converts sucrose to glucose and fructose and is expressed only in pollen, and is thus linked to partial sterility in drought-sensitive cultivars. As both the A- and D-genome homologs of TaCwi were positively selected in SYN-DER and associated with important agronomic traits, SYN-DER has higher potential for drought adaptability. In contrast, the unfavorable allele of another gene, TaMoc-A1, which has a putative role in spikelet development, was pre-dominant in the diversity panel, and its favorable allele could be deployed for minor yield improvement under drought stress conditions. The other key genes, either fixed in SYN-DER, like TaGW2-6A, TaSus2-2B, and TaCwi-D1, or having balanced allele frequencies, like TaSus2-2A, TaGW2-6B, and TaGS1a, are very important for selecting accessions with the maximum number of favorable haplotypes. It is likely that genes present on D-genome chromosomes, like TaCwi-D1 (5D) and TaGS1a (7D), could have new alleles that remained undetected because we only used functional markers for very well-known alleles.

Dehydration responsive element binding proteins (DREB1) are induced by water stress, low temperature and salinity (Zhang et al., 2009). In this diversity panel, TaDREB1 was associated with SL, GpS, and HI, and it was previously associated with grain yield in Chinese wheats. Similarly, fructan 1-exohydrolase (1-FEH) is an ABA-insensitive gene which is responsible for membrane stability and for remobilizing water-soluble carbohydrates (WSC), including fructan along with glucose and sucrose, from the stem to developing grains (Zhang et al., 2015). Rasheed et al. (2016a) developed a KASP marker for the 1fehw3 gene, where the Kauz-type allele has a very minor but significant effect on yield components, mainly TGW, in bread wheat (Howell et al., 2014). The presence of the Kauz-type allele in 48% of the accessions and its association with GY, SL, and PH indicate the usefulness of this gene in wheat molecular breeding for drought tolerance.

The effect of drought on pollen fertility is irreversible and is the main cause of grain loss under WL conditions. Therefore, storage carbohydrate accumulation in drought-susceptible and drought-tolerant cultivars depends on the genes for cell wall invertase and fructan biosynthesis in the ovary and anthers (Shen et al., 2003). Since these genes tightly control sink strength and carbohydrate supply, deployment of their favorable alleles could maintain pollen fertility and grain number in wheat. The drought tolerance in the SYN-DER diversity panel can be mainly attributed to the presence of favorable alleles of genes underpinning cell wall invertase (TaCwi-A1 and TaCwi-D1) and enzymes related to the remobilization of water-soluble carbohydrates, which ultimately strengthen the sink tissues during drought stress. The two KASP assays for TaSST-A1 and TaSST-D1 developed in this study are valuable tools for determining water-soluble carbohydrates, and their use in combination with 1fehw3 and TaCwi genes can enhance the selection accuracy for drought-tolerant germplasm in marker-assisted breeding. Similarly, the 1BL.1RS translocation also has a significant yield advantage and a positive impact on canopy water status (He et al., 2004). This translocation is widely used in breeding programs because it also provides resilience to biotic and abiotic stresses (Rasheed et al., 2016b). However, its positive or negative selection depends on the breeding objective, because this translocation is usually avoided due to the sticky dough characteristics of germplasm carrying it. The 1BL.1RS translocation is present in 45% of germplasm from CIMMYT, 22% from Turkey, 24% from China, 44% from Iran, 21% from Iran, and 17% from United States (Hailu and Merker, 2008).

The variability assessed for the various quality-encoding genes indicated the suitability of SYN-DER for a variety of end-use quality characteristics. From the results, it is clear that alleles associated with low PPO activity were more frequent at Ppo-A1 and Ppo-D1, which is a desirable characteristic. Low PPO activity is preferred for fresh Asian noodles, and therefore, selection for the alleles Ppo-A1b and Ppo-D1a is recommended in Chinese wheat breeding programs (Wang et al., 2009). High yellow pigment content is favored for durum wheat pasta, but is considered undesirable for Chinese steamed bread and white noodles (Huang and Morrison, 1988; He et al., 2008; Wang et al., 2009). Therefore, selection for Psy-A1b and Psy-B1c is encouraged. Among 217 Chinese wheat cultivars, 135 had Psy1-A1a and 82 had Psy1-A1b alleles. Similarly, the frequency of Psy-B1a was 86, Psy-B1b was 95, Psy-B1c was 34, and Psy-B1d was 2 (He et al., 2009). At Psy-D1, 191 genotypes were Psy1-D1a and only two were Psy1-D1g in 193 Chinese wheat cultivars and advanced lines (He et al., 2009), indicating much less genetic diversity at the Psy1-D1 locus.

A bright yellow color is desirable for yellow alkaline noodles, consumed in Japan and southeastern Asia; therefore, impairing LOX activity is desirable in wheat cultivars for use in these regions (Hessler et al., 2002). In China, however, a bright white to creamy color is required for Chinese-style foods such as steamed bread and various Chinese noodles (He et al., 2008). Increasing LOX activity is therefore important in Chinese wheat breeding programs. In contrast, a bright yellow color is preferred for pasta, requiring low LOX activity in durum wheat (Triticum turgidum L.) grains. Therefore, developing cultivars with lower LOX activity is an important objective in durum wheat breeding programs (Hessler et al., 2002; Carrera et al., 2007). Among these SYN-DER, almost all accessions have TaLox-B1a, associated with high LOX activity; this allele is desirable for Chinese-style foods but not for yellow alkaline noodles. However, the choice of TaLox-B1a or TaLox-B1b by breeders will depend on the intended end-use products of the breeding programs. A similar trend was observed for the zeta-carotene desaturase alleles at two different loci. At the TaZds-D1 locus, the TaZds-D1b allele, associated with low yellow-pigment content, was fixed in the diversity panel. In a previous study, the TaZds-D1b allele was identified in four out of 217 Chinese varieties, while none of the advanced lines from CIMMYT carried this allele (Zhang et al., 2011). This indicates that the TaZds gene on the D genome has very narrow genetic diversity in common wheat. The main limitation in comparing the diversity of such end-use quality-related alleles is that most of the information is only available for Chinese wheat germplasm; hence, it is important to reveal allelic variation for these genes in global wheat germplasm to build knowledge-based informatics resources for choosing appropriate candidates for wheat breeding.

Our work also included a comprehensive set of genes for resistance to biotic stresses, including the three rusts, powdery mildew, Fusarium head blight, and tan spot. The KASP assay for Lr21 was adopted from previous reports, and the gene was present in only 10 SYN-DER. The Lr21 gene has an exotic origin from Ae. tauschii. This indicates the potential to introduce these genes into wheat cultivars, as the leaf rust and stripe rust pathogens are widely avirulent to these genes globally (Sharma-Poudyal et al., 2013). Similarly, the root lesion nematode resistance allele at Rlnn1 was present in 97.1% of the accessions in the diversity panel.

The large-scale gene characterization of the SYN-DER panel provided a very useful allele repertoire for selecting accessions with favorable alleles. Field screening under WL conditions enhanced the value of this information for a wide range of breeding objectives. This was possible mainly because of the availability of high-throughput KASP assays for functional genes in wheat, partially reported in Rasheed et al. (2016a). However, many new KASP assays have been developed by our groups and are yet to be reported. These would enable wheat researchers to select germplasm carrying alleles of choice during wheat breeding and thereby improve selection accuracy. The combination of several genes creates a specific haplotype, which is a more important and efficient selection criterion than selection based on single genes. Wheat accessions with more favorable combinations of genes (haplotypes) could be selected visually by the graphical genotyping approach, which would improve selection accuracy in wheat breeding (van Eck et al., 2017). The graphical genotyping approach was used to visualize favorable allele frequencies in top yielding accessions in WW and WL conditions (**Figure 4**), which indicated a slightly higher frequency of favorable alleles for flowering time and drought tolerance in accessions having a yield advantage under WL conditions. It is likely that several functional genes on D-genome chromosomes carry novel alleles which remained undetected because we used KASP assays for well-known alleles at those loci. New approaches like target-enrichment sequencing of genes could be very effective for high-throughput SNP discovery in novel germplasm like SYN-DER derived from the wild species (Pankin et al., 2018).

The diversity panel was highly divergent for functional genes, and this provided a set of target genes which could be manipulated to further fine-tune the expression of important agronomic traits. Several accessions were selected based on their combination of favorable alleles; for example, SD89 has 30 favorable alleles and SD36 carries 27 favorable alleles for important traits. We also provided new insight into the effects of functional genes on important biochemical traits under drought stress conditions, which could be important for developing drought-tolerant cultivars.

### AUTHOR CONTRIBUTIONS

AR, ZH, and XX designed the experiments. MK, ZM, FA, AG, and AS conducted the field experiments. MK and ZA did the genotyping work. AR analyzed the data. AR and MK wrote the manuscript. ZH, XX, and RA edited and reviewed the manuscript.

### FUNDING

This work was funded by the National Key Research and Development Programs of China (2016YFD0101802, 2016YFD0100502, and 2016YFE0108600), National Natural Science Foundation of China (31461143021), and CAAS Science and Technology Innovation Program.

### ACKNOWLEDGMENTS

We acknowledge financial assistance from Food Security Center (FSC), University of Hohenheim, Germany for Excellence South-South Scholarship to MK.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00717/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Khalid, Afzal, Gul, Amir, Subhani, Ahmed, Mahmood, Xia, Rasheed and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Analysis and Transfer of Favorable Exotic QTL Alleles for Grain Yield Across D Genome Using Two Advanced Backcross Wheat Populations

#### Ali Ahmad Naz<sup>1</sup> \*, Said Dadshani<sup>1</sup> , Agim Ballvora<sup>1</sup> , Klaus Pillen<sup>2</sup> and Jens Léon<sup>1</sup>

<sup>1</sup> Institute of Crop Science and Resource Conservation, Plant Breeding, University of Bonn, Bonn, Germany, <sup>2</sup> Institute of Agricultural and Nutritional Sciences, Plant Breeding, Martin Luther University of Halle-Wittenberg, Halle, Germany

### Edited by:

Kazuhiro Sato, Okayama University, Japan

#### Reviewed by:

Sivakumar Sukumaran, International Maize and Wheat Improvement Center, Mexico Marcelo Helguera, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

> \*Correspondence: Ali Ahmad Naz a.naz@uni-bonn.de

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 31 January 2019 Accepted: 13 May 2019 Published: 04 June 2019

#### Citation:

Naz AA, Dadshani S, Ballvora A, Pillen K and Léon J (2019) Genetic Analysis and Transfer of Favorable Exotic QTL Alleles for Grain Yield Across D Genome Using Two Advanced Backcross Wheat Populations. Front. Plant Sci. 10:711. doi: 10.3389/fpls.2019.00711

Hexaploid wheat evolved through a spontaneous hybridization of tetraploid wheat (Triticum turgidum, AABB) with a diploid wild grass (Aegilops tauschii, DD). Recent genome sequencing found alarmingly low genetic diversity and an abundance of repeated sequences across the D genome as compared to the A and B genomes. This characteristic feature of the D genome often results in a low recombination rate and abrupt changes in chromosomes, which are the major hurdles to utilizing the genetic potential of the D genome in wheat breeding. In the present study, we evaluated two advanced backcross populations designated as B22 (250 BC2F3:6 lines) and Z86 (150 BC2F3:6 lines) to test their yield potential and to enrich the D genome diversity simultaneously. The populations B22 and Z86 were derived by crossing winter wheat cultivars Batis and Zentos with synthetic hexaploid wheat accessions Syn022L and Syn086L, respectively. These populations were genotyped using SNP markers and phenotyped for yield traits in ten environments in Germany. Marker analysis identified a lower recombination rate across the D genome as compared to the A and B genomes in both populations. Further, we compared the genotype data with the trait grain yield to identify favorable exotic introgressions from the synthetic wheat accessions. QTL analysis identified seven and 13 favorable exotic QTL alleles associated with enhanced or at least stable grain yield in populations B22 and Z86, respectively. These favorable introgressions were located on all chromosomes from 1D to 7D. The strongest exotic QTL allele, on chromosome 1D at SNP marker RAC875\_c51493\_471, resulted in a relative increase of 8.6% in grain yield as compared to the cultivated allele. The identified exotic introgressions will help to refine useful exotic chromosome segments for incorporation into cultivated varieties, improving yield and increasing D genome diversity.

Keywords: synthetic wheat, Aegilops tauschii, D genome, grain yield, exotic allele

### INTRODUCTION

fpls-10-00711 May 31, 2019 Time: 17:33 # 2

Bread wheat (Triticum aestivum L.) is a major cereal crop used as a staple food worldwide. Production statistics for the last decade showed signs of yield stagnation, which poses a serious threat to food security for an exponentially growing human population (FAO, 2017). A primary reason behind this may lie in low genetic diversity due to the bottlenecks of domestication and intensive selection within the cultivated gene-pool (Tanksley and McCouch, 1997). Narrow genetic diversity does not simply reduce the number of useful alleles; it can also cause second-order genetic problems such as low genetic recombination, especially in self-pollinating crops like wheat. This scenario demands efforts to harness new genetic resources for improving yield potential as well as for broadening the genetic diversity of cultivated wheat varieties.

Hexaploid wheat is an allopolyploid (2n = 6x = 42; AABBDD) that evolved through a spontaneous hybridization of tetraploid wheat (AABB) with Aegilops tauschii (DD). More than half a century ago, the pioneering work of Kihara (1944), Sears (1944), and McFadden and Sears (1946) identified A. tauschii (syn. A. squarrosa) as the progenitor of the D genome of hexaploid wheat. Since then, valuable research has investigated the genetic diversity and genome divergence between domesticated and undomesticated wheat species. For instance, Lagudah and Halloran (1988) established phylogenetic relationships among A. tauschii populations and found a highly polymorphic gliadin (Gli-1) locus that provides fingerprint haplotypes for a given genotype. Similarly, the glutenin (Glu-1) locus was studied comprehensively at the gene and protein levels (Payne and Lawrence, 1983; Payne, 1987; Lagudah and Halloran, 1988; Anderson et al., 1989, 2003). These reports also revealed higher allelic variation at the Glu-D1 locus in A. tauschii than at the Glu-D1 locus in hexaploid wheat. Chantret et al. (2005) studied the hardness (Ha) locus in polyploid and diploid wheat species to investigate the pre- and post-polyploidization evolutionary differentiation of the A, B, and D genomes. They found the loss of puroindoline a and b (Pina and Pinb) genes from the hexaploid wheat as well as a 29 kb smaller Ha locus in the D genome of hexaploid wheat as compared to the D genome of its diploid progenitor A. tauschii, putatively caused by illegitimate recombination. Using whole genome sequencing, Luo et al. (2017) made a comprehensive analysis of the D genome in A. tauschii ssp. strangulata accession AL8/78. This analysis found a larger number of annotated genes in A. tauschii, the progenitor of the wheat D genome, than in the D genome of wheat cultivar Chinese Spring. These reports suggest that the wheat progenitor A. tauschii harbors rich genetic diversity due to its wide geographical distribution, but that only a limited lineage of A. tauschii was involved in the establishment of hexaploid wheat.

The recent whole genome sequencing of hexaploid wheat provided in-depth insight into the genetic potential of the A, B, and D genomes and the linkages among them. To date, around 35,345, 35,643, and 34,212 high-confidence genes have been reported across the A, B, and D genomes, respectively (IWGSC, 2018). These data suggest an almost equal distribution of genes across the individual genomes. Although the numbers of genes are not dramatically different across the A, B, and D genomes of the cultivated variety Chinese Spring, an alarmingly low genetic diversity and an abundance of repeated sequences are reported across the D genome as compared to A and B. This characteristic feature of the D genome often results in a low recombination rate and abrupt changes in chromosomes, which seem to be the major hurdles to utilizing the genetic potential of the D genome in wheat breeding. This scenario demands the addition of new genetic resources across the D genome of hexaploid wheat to improve the recombination rate and allelic variation simultaneously. Although the diversity of A. tauschii populations and its implications for improving yield and yield-related traits were highlighted in the past (Ogbonnaya et al., 2003, 2005), its use has focused largely on the incorporation of useful alleles for improving biotic and abiotic stress tolerance traits among cultivated wheat varieties.

In the present study, we employed an advanced backcross QTL analysis strategy to identify genome-wide, marker-defined introgressions of A. tauschii associated with improved yield, using a population of 400 BC2F3 lines (Kunert et al., 2007). Direct use of A. tauschii diversity and favorable alleles for improving yield in bread wheat does not seem feasible because of the linkage drag of additional exotic chromosomal segments on essential breeding traits. The advanced backcross populations were established by crossing winter wheat cultivars Batis and Zentos with synthetic hexaploid wheat accessions Syn022L and Syn086L, respectively.

### RESULTS

### SNP Markers and Their Distribution Across A, B, and D Genomes

The populations B22 and Z86 were genotyped using 15 k and 90 k SNP arrays, respectively. Overall, the highest number of SNP markers and the highest marker density were found across the B genome, whereas the lowest number of SNP markers was found across the D genome in both populations (**Figures 1A,B**). Population Z86 revealed remarkably higher numbers of SNP markers across the A (4477), B (5239), and D (1334) genomes than population B22 (**Figure 1A**). The lowest marker density (0.46 markers per cM) was found across the D genome, especially in population B22 (**Figure 1B**). The percent recombination rate was lower across the D genome than across the A and B genomes in both populations (**Figure 1C**).

### Identification of Favorable Exotic QTL Allele for Grain Yield

### Population B22

Marker by trait analysis revealed significant trait variation between the cultivated and synthetic alleles at seven SNP markers related to grain yield in population B22. These QTL effects were located on most chromosomes, but not on 5D. At the associated SNP loci, the relative increase in yield due to the introgression of the exotic alleles from synthetic wheat accession Syn022L ranged from 2 to 4.6% (**Table 1**). The strongest exotic QTL allele was associated with SNP marker wsnp\_CAP7\_c1735\_859875 on chromosome 6D at position 122.3 cM. Two additional exotic QTL alleles, each resulting in a 3% increase in yield, were localized on chromosomes 2D (SNP: Excalibur\_c7366\_1475, 136.4 cM) and 3D (SNP: Excalibur\_c63483\_991, 3.2 cM), respectively.

### Population Z86

In population Z86, 13 exotic QTL alleles showed improved relative performance for yield in comparison to the cultivated alleles from cultivar Zentos. These QTL effects were located across chromosomes 1D–7D (**Table 2**). The highest number of QTL effects was found on chromosome 7D, where four exotic QTL alleles improved yield as compared to the cultivated alleles. The strongest QTL effects were associated with SNP markers RAC875\_c51493\_471 and RAC875\_c20675\_268 on chromosomes 1D (162.3 cM) and 3D (148.4 cM), respectively. At these loci the introgression of the exotic allele from synthetic wheat accession Syn086L accounted for a more than 8% increase in yield. Similarly, the introgression of exotic alleles at SNP markers Kukri\_c2408\_784 (1D: 3.5 cM) and BS00010664\_51 (2D: 103.3 cM) showed 6.1 and 6.4% increases in yield over the corresponding cultivated alleles, respectively. On chromosome 5D, at the SNP locus BS00011794\_51, the introgression of the exotic allele resulted in a 5.7% increase in yield. An exotic QTL allele of similar magnitude was also identified on chromosome 3D at SNP marker D\_contig14424\_524 (121.8 cM), accounting for a 4.9% increase in yield relative to the cultivated allele from Zentos.
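The percentage increases reported for both populations follow the relative-performance formula given in the footnotes of Tables 1 and 2, [(EE)-(CC)]∗100/(CC). A minimal sketch of that arithmetic is shown here; the function name and the lsmeans values are hypothetical and serve only to illustrate the calculation, not to reproduce any table entry.

```python
def relative_performance(lsmean_ee: float, lsmean_cc: float) -> float:
    """Relative performance of the exotic allele in percent: (EE - CC) * 100 / CC."""
    if lsmean_cc == 0:
        raise ValueError("lsmean of the cultivated allele must be non-zero")
    return (lsmean_ee - lsmean_cc) * 100.0 / lsmean_cc


# Hypothetical lsmeans in dt/ha, chosen only to illustrate the arithmetic.
cc = 80.0  # lines homozygous for the cultivated allele
ee = 84.0  # lines homozygous for the exotic allele
print(f"RP(EE) = {relative_performance(ee, cc):.1f}%")
```

A positive value indicates that lines carrying the exotic allele out-yielded lines carrying the cultivated allele at that marker; a negative value indicates the opposite.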

TABLE 1 | List of SNP markers associated with grain yield variation in the population B22.

<sup>1</sup>Significant SNP markers at the QTL effect selected by the highest F-value. <sup>2</sup>Chromosomal localization of the QTL effect. <sup>3</sup>Position of the listed marker in cM. <sup>4</sup>Probability of significance (P-value). <sup>5</sup>False discovery rate (FDR), ∗∗<0.01 and ∗∗∗<0.001. <sup>6</sup>Least square means (lsmeans) of trait values of all investigated BC2F3 lines carrying the homozygous cultivated allele (CC) at the given marker locus. <sup>7</sup>Lsmeans of trait values of investigated BC2F3 lines carrying the homozygous exotic (EE) allele at the given marker locus. <sup>8</sup>Relative performance of the exotic allele at the given marker locus calculated by: [(EE)-(CC)]∗100/(CC), where (EE) and (CC) are lsmeans of the homozygous synthetic (exotic) and the homozygous cultivated alleles at a given marker locus.

TABLE 2 | List of SNP markers associated with grain yield variation in the population Z86.

<sup>1</sup>Significant SNP markers at the QTL effect selected by the highest F-value. <sup>2</sup>Chromosomal localization of the QTL effect. <sup>3</sup>Position of the listed marker in cM. <sup>4</sup>Probability of significance (P-value). <sup>5</sup>False discovery rate (FDR), significance level ∗∗∗≤0.001. <sup>6</sup>Least square means (lsmeans) of trait values of all investigated BC2F3 lines carrying the homozygous cultivated allele (CC) at the given marker locus. <sup>7</sup>Lsmeans of trait values of investigated BC2F3 lines carrying the homozygous exotic (EE) allele at the given marker locus. <sup>8</sup>Relative performance of the exotic allele at the given marker locus calculated by: [(EE)-(CC)]∗100/(CC), where (EE) and (CC) are lsmeans of the homozygous synthetic (exotic) and the homozygous cultivated alleles at a given marker locus.

### Chromosomal Localization of Favorable Exotic Alleles for Grain Yield Across D Genome

We plotted the marker effects across the D genome to show the length and effect of the exotic QTL alleles for grain yield in populations B22 and Z86. For this, we compared the effect of each cultivated and exotic allele with the mean grain yield of the respective population as baseline, to show the relative increase or decrease in grain yield associated with exotic and cultivated alleles across the D genome (**Figure 2**). In population B22, this analysis showed that all chromosomes carried exotic QTL alleles with both positive and negative effects on grain yield (**Figure 2A**). Although exotic QTL alleles revealed variation on all chromosomes, no significant QTL effect was found on chromosome 5D in population B22. Surprisingly, on chromosome 4D a region carrying the cultivated allele was associated with a decrease in yield. Chromosome 6D revealed the maximum number of positive exotic QTL alleles for grain yield. Among these, the exotic introgression on the long arm of 6D showed the highest increase, and this is where the strongest exotic QTL allele for yield was detected. Similarly, population Z86 carried favorable exotic QTL alleles for yield on chromosomes 1D–7D (**Figure 2B**). More exotic QTL alleles, spanning relatively smaller intervals, were identified in population Z86 than in population B22. The exotic QTL allele accounting for the highest increase in grain yield was identified on chromosome 1D at the SNP marker RAC875\_c51493\_471.

### Distribution and Effects of Four Exotic QTL Alleles in Population B22 and Z86

QTL analysis identified 20 QTL where the performance of the exotic allele was higher than or at least comparable to that of the cultivated allele in populations B22 and Z86. Among these, we selected the two strongest QTL alleles in each population for needle plot analysis to examine the population-wide distribution and effect of exotic and cultivated alleles. The distribution of exotic and cultivated alleles in population B22 is presented in **Figure 3**, where blue and red bars on the x-axis represent the BC2F3 lines carrying cultivated and exotic alleles, respectively. The y-axis shows the grain yield in dt/ha. For the strongest exotic QTL effect in population B22, the majority of BC2F3 lines carrying the exotic allele were distributed among the high yielding lines, whereas four lines fell among the low yielding BC2F3 lines (**Figures 3A,B**). In population Z86, the distribution of exotic alleles was clearer than in B22, with only one line carrying the exotic allele appearing as an outlier (**Figures 3C,D**).

For comparison, we present the needle plots of the two strongest negative exotic QTL alleles for grain yield in populations B22 and Z86. In population B22, the strongest negative exotic QTL allele was detected at SNP locus Excalibur\_c51312\_218 (157.9 cM) on chromosome 3D (**Figure 4A**). In population Z86, advanced backcross lines carrying the exotic allele at marker locus tplb0053n05\_793 (12.3 cM) on chromosome 2D showed a decrease in grain yield (**Figure 4B**).

### DISCUSSION

The development of hexaploid wheat is one of nature's wonders, by which three different grass species were hybridized. The genetic potential of wheat for yield is declining due to its low genetic diversity, mainly as a result of intensive breeding within the cultivated gene-pool. It has been found that the A. tauschii gene-pool (the progenitor of the D genome) carries more genetic diversity than the D genome in cultivated varieties (Ogbonnaya et al., 2005; Luo et al., 2017). These reports suggest that a limited gene-pool of A. tauschii was involved in the evolution and hybridization of hexaploid wheat. This scenario demands revisiting the genetic potential of A. tauschii natural populations to harness the useful variation in cultivated varieties. The issue of D genome diversity and its utility was raised in the past, but no significant efforts were made to enrich the useful genetic diversity at genome level. To date, the use of A. tauschii has been largely limited to the incorporation of specific alleles for biotic and abiotic stress tolerance traits (Schachtman et al., 1992; Dubcovsky et al., 1996; Huang et al., 2003; Dreccer et al., 2004a; Naz et al., 2008; Krattinger et al., 2009). The present study aimed to test the utility of A. tauschii for complex traits like yield, in order to add useful diversity at genome level to cultivated wheat varieties. For this we employed a population of 400 advanced backcross lines (BC2F3) carrying chromosomal segments of two exotic synthetic hexaploid wheat accessions in two cultivated wheat backgrounds. The synthetic hexaploid wheat accessions carried the original undomesticated D genome from A. tauschii and were established through the production of amphiploids according to Lange and Jochemsen (1992a,b).

FIGURE 2 | QTL map for grain yield across the D genome in population B22 (A) and Z86 (B). In each plot, the X circle represents the chromosomes and the distribution and density of SNP markers on the D genome from white (low density) to red (high density); the Y circle illustrates the positive and negative effects of exotic alleles on grain yield relative to the population mean as reference line, where the values +12 and –12% represent the positive and negative effects of exotic and cultivated alleles relative to the population mean, respectively; and the Z circle shows a Manhattan plot from the QTL analysis of grain yield, with significant QTL effects highlighted as red circles. The scale 0–30 represents the logarithm of odds (LOD) score.

Genome-wide SNP marker genotyping showed overall low genetic polymorphism across the D genome as compared to the A and B genomes in both populations. These data are in line with the characteristic lower genetic diversity across the D genome in cultivated varieties like Chinese Spring (Luo et al., 2017). Recent whole genome sequencing of the A. tauschii genome also revealed multiple impacts of transposons in determining variation in physical and genetic lengths, as well as a faster evolution across the chromosomes of the D genome as compared to cultivated wheat (Zhao et al., 2017). Luo et al. (2017) found a remarkably larger number of genes across the D genome of the A. tauschii accession than in cultivar Chinese Spring, which suggests a putative chromosome shortening across the D genome of cultivated wheat. These scenarios may be linked with the variation in genetic recombination and abrupt changes (fast evolution) across the chromosomal arms (Akhunov et al., 2003). However, no reports are yet available that describe the extent of chromosome shortening across the D genome of cultivated varieties as compared to A. tauschii genotypes. In addition, the number and density of SNPs were remarkably lower in population B22 than in Z86. One major reason for the low marker density in population B22 was the use of the 15 k SNP array instead of the 90 k SNP array used for population Z86. To test this, population B22 would need to be genotyped with the same 90 k SNP array.

QTL analysis was performed by comparing the grain yield data obtained across ten environments (five locations in 2 years) with the genotype data, to identify useful alleles for yield from the synthetic wheat accessions, especially across the D genome. We employed the AB-QTL analysis strategy devised by Tanksley and Nelson (1996) for straightforward detection and incorporation of useful exotic alleles back into cultivated varieties, to enhance yield and diversity across the D genome simultaneously. No such population has so far been employed for the improvement of complex traits like grain yield in wheat. Our analysis found 20 exotic QTL alleles from synthetic wheat accessions which enhance yield or are at least comparable to the cultivated alleles. The exotic QTL alleles showing high performance relative to cultivated alleles are of great significance for yield breeding, but it is worth mentioning that the exotic alleles showing a marginal increase or performance comparable to the cultivated allele are also needed to add new genetic diversity without a linkage drag on yield. In the present analysis, the strongest exotic QTL for grain yield were found on chromosomes 1D, 2D, 3D, 5D, and 6D. Bennett et al. (2012) studied grain yield in a doubled haploid population derived from a cross between RAC875 and Kukri and identified a Kukri allele at Q.Yld.aww-3D for higher yield. This QTL may correspond to the chromosomal position of the exotic QTL on chromosome 3D detected in the present study. Earlier, Cox et al. (1995) reported the possibility of increasing yield in hexaploid wheat using A. tauschii backcross populations. Ogbonnaya et al. (2003) evaluated substitution lines under drought stress conditions and found substantial variation among the lines, with the substitution lines yielding on average 10–41% more than the Australian cultivars. However, the contribution of the D genome to this variation remained unclear among the substitution lines. It was suggested that these substitution lines carried beneficial traits such as increased capacity for water extraction during the critical grain growth phase, vigorous root systems, and increased root density relative to the cultivars (Dreccer et al., 2004b). Recently, Bhatta et al. (2018) performed GWAS in a population of synthetic accessions under drought stress conditions and identified 34 genomic regions associated with yield and yield-related traits across the D genome. In the present study, we detected a maximum of 13 QTL alleles for grain yield across the D genome, in population Z86. We found a slightly lower number of QTL than previous studies because we employed an advanced backcross BC2F3 population, which does not allow the mapping resolution achieved by GWAS in natural populations. The number of favorable exotic QTL alleles was lower in population B22 than in Z86. A reason behind this difference may lie in the resolution of the genetic maps employed for the two populations. The lower density of SNP markers in population B22 may also explain the lower magnitude of QTL effects as compared to population Z86.

To our knowledge, the present study is the first report to uncover genome-wide favorable exotic alleles across the D genome in a cultivated background for their straightforward transfer into the elite gene-pool. In our ongoing work, we are building on the established genetic resources and crossing lines carrying the strongest exotic QTL alleles with elite cultivars. From the resulting F1 plants, a high-resolution doubled haploid population will be established to refine the marker-defined favorable exotic introgressions, as well as to test the segregation and effect of additional exotic segments on yield and sustainability traits, as A. tauschii populations are known for their diversity and adaptive fitness. Further, the established lines can be employed directly in breeding for yield and sustainability, in enriching the D genome diversity of the cultivated wheat gene-pool using marker-assisted selection, and in uncovering the role of genes in mediating yield and yield-related traits.

### MATERIALS AND METHODS

### Plant Material

Two advanced backcross populations, B22 (250 lines) and Z86 (150 lines), comprising 400 BC2F3:6 lines in total, were established for this research following the advanced backcross strategy of Tanksley and Nelson (1996). The crosses were derived from the two German winter wheat cultivars Batis and Zentos (T. aestivum L.) and the two synthetic hexaploid wheat accessions Syn022 and Syn086 (Triticum turgidum ssp. dicoccoides × T. tauschii, Lange and Jochemsen, 1992a,b). The AB populations were designated as B22 (Batis x Syn022) and Z86 (Zentos x Syn086). The development of B22 and Z86 up to the BC2F3:6 generation is explained in detail in Kunert et al. (2007).
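As a rough illustration of what a BC2F3 design implies at a single locus, the sketch below computes the expected Mendelian genotype frequencies after two backcrosses to the cultivated parent followed by two generations of selfing. It assumes no selection and a single unlinked locus, which is a textbook simplification and not a description of how the populations in Kunert et al. (2007) were actually advanced; the function name is hypothetical.

```python
from fractions import Fraction


def expected_genotype_frequencies(backcrosses: int, selfings: int):
    """Expected single-locus frequencies (CC, CE, EE) without selection.

    Start from an F1 (all CE), backcross `backcrosses` times to the cultivated
    parent (CC), then self `selfings` times.
    """
    cc, ce, ee = Fraction(0), Fraction(1), Fraction(0)
    for _ in range(backcrosses):
        # CE x CC gives 1/2 CC + 1/2 CE; CC x CC stays CC. No EE can arise here.
        cc, ce = cc + ce / 2, ce / 2
    for _ in range(selfings):
        # Selfing CE gives 1/4 CC + 1/2 CE + 1/4 EE; homozygotes breed true.
        cc, ce, ee = cc + ce / 4, ce / 2, ee + ce / 4
    return cc, ce, ee


# BC2F3: two backcrosses, then two selfing generations (BC2F1 -> F2 -> F3).
cc, ce, ee = expected_genotype_frequencies(backcrosses=2, selfings=2)
print("CC, CE, EE:", cc, ce, ee)
print("expected exotic genome share:", ee + ce / 2)
```

Under these assumptions the expected exotic share of the genome stays at one-eighth, while selfing shifts it from heterozygous to homozygous exotic segments.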

### Genotyping


The populations B22 (250 BC2F3:6 lines) and Z86 (150 BC2F3:6 lines) were genotyped using 15 k and 90 k iSelect SNP arrays, respectively. The chromosomal positions of SNPs were assigned according to Wang et al. (2014). In addition, a DNA polymorphism survey was conducted between all four parents of the two crosses with a total of 488 SSR markers selected for an even coverage of all three wheat genomes. The chromosomal positions of the SSR markers were obtained from the consensus map of Somers et al. (2004). Percent recombination rate was estimated using the R package R/qtl (Broman et al., 2003).

### Phenotypic Evaluation of Grain Yield

Phenotypic evaluation of grain yield of the BC2F3 populations B22 and Z86 was carried out under field conditions at five different locations in 2 years (10 environments) across Germany. In each test environment, the AB lines and their recurrent parents were grown in a single randomized block design containing one plot of each AB line, 20 plots of Batis, and 10 plots of Zentos. Net plot sizes (4.5–6.3 m<sup>2</sup>), seed density (310–360 kernels/m<sup>2</sup>), and field management were in accordance with local practice. At each location, grain yield was calculated from the weight of grain harvested per plot and expressed in decitonnes per hectare (dt/ha), i.e., one-tenth of a tonne per hectare.
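The conversion from harvested plot weight to dt/ha can be sketched as follows. The unit relations (1 dt = 100 kg, 1 ha = 10,000 m<sup>2</sup>) are standard; the function name and the example harvest weight are hypothetical, and the plot area is merely picked from within the 4.5–6.3 m<sup>2</sup> range stated above.

```python
def yield_dt_per_ha(grain_kg_per_plot: float, plot_area_m2: float) -> float:
    """Convert plot grain weight to decitonnes per hectare.

    1 dt = 100 kg and 1 ha = 10,000 m2, so dt/ha = (kg / m2) * 100.
    """
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    kg_per_ha = grain_kg_per_plot / plot_area_m2 * 10_000
    return kg_per_ha / 100


# Hypothetical harvest: 4.5 kg of grain from a 5.0 m2 net plot.
print(round(yield_dt_per_ha(4.5, 5.0), 1))
```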

### QTL Analysis

QTL analysis was carried out with SAS version 9.4 (SAS Institute, 2015). The detection of QTLs was carried out using the mixed hierarchical model:

$$Y_{ijk} = \mu + M_i + E_j + M_i \times E_j + \varepsilon_{k(ij)}$$

where $\mu$ is the general mean, $M_i$ is the fixed effect of the i-th marker genotype, $E_j$ is the random effect of the j-th environment, $M_i \times E_j$ is the random interaction effect of the i-th marker genotype with the j-th environment, and $\varepsilon_{k(ij)}$ is the error of $Y_{ijk}$. At each marker locus only the homozygous genotypes were included in the calculation, because the repeated selfing of heterozygous genotypes leads to a mix of both homozygous genotypes in the derived BC2F3:5 and BC2F3:6 field plots, resulting in a false estimate of the performance of true heterozygous genotypes (Pillen et al., 2003). Markers detecting the same significant effect were combined into a single QTL if linked within ≤20 cM. The relative performance of the homozygous exotic genotype [RP(EE)] was calculated as described by Pillen et al. (2003).
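The rule of combining linked significant markers into one QTL can be sketched as below. This is an illustrative interpretation, not the authors' SAS procedure: it assumes that markers on the same chromosome are chained together while the gap to the previous marker is at most 20 cM, and that each QTL is represented by the marker with the highest F-value, as the table footnotes indicate. Marker names, positions, and F-values are hypothetical.

```python
from collections import defaultdict


def merge_linked_markers(markers, max_gap_cm=20.0):
    """Group significant markers into putative QTL.

    `markers` is a list of (name, chromosome, position_cM, f_value) tuples.
    Markers on the same chromosome are chained into one QTL while the gap to
    the previous marker is <= max_gap_cm; each QTL is represented by the
    marker with the highest F-value.
    """
    by_chrom = defaultdict(list)
    for marker in markers:
        by_chrom[marker[1]].append(marker)

    qtl = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda m: m[2])
        group = [ordered[0]]
        for marker in ordered[1:]:
            if marker[2] - group[-1][2] <= max_gap_cm:
                group.append(marker)  # still within the linkage window
            else:
                qtl.append(max(group, key=lambda m: m[3]))
                group = [marker]  # start a new QTL
        qtl.append(max(group, key=lambda m: m[3]))
    return qtl


# Hypothetical marker names, positions, and F-values for illustration only.
significant = [
    ("m1", "1D", 3.5, 12.0),
    ("m2", "1D", 15.0, 18.5),
    ("m3", "1D", 162.3, 30.1),
    ("m4", "3D", 121.8, 9.4),
    ("m5", "3D", 148.4, 22.7),
]
for name, chrom, pos, f_value in merge_linked_markers(significant):
    print(name, chrom, pos, f_value)
```

In this example m1 and m2 collapse into one QTL represented by m2, whereas m4 and m5 remain separate because they lie more than 20 cM apart. The sketch does not check that grouped markers share the same direction of effect, which the original rule ("the same significant effect") would also require.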

### Circos Diagrams and Needle Plots

The software package Circos was used to create circular plots with chromosomal ideograms (Krzywinski et al., 2009). The length of the chromosomes and the position of the markers were taken from Wang et al. (2014). To illustrate marker density, bins of 5 cM genetic distance were constructed.
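The 5 cM binning used for the marker-density track can be sketched as a simple count per interval. This is a minimal stand-alone illustration rather than the Circos configuration used in the study; the function name and the marker positions are hypothetical.

```python
from collections import Counter


def marker_density_bins(positions_cm, bin_width_cm=5.0):
    """Count markers per bin of `bin_width_cm` along one chromosome.

    Returns a dict mapping the bin start (in cM) to the number of markers in
    the half-open interval [start, start + bin_width_cm). Empty bins are
    omitted.
    """
    counts = Counter(int(pos // bin_width_cm) for pos in positions_cm)
    return {index * bin_width_cm: counts[index] for index in sorted(counts)}


# Hypothetical marker positions (cM) on one chromosome.
positions = [0.0, 1.2, 4.9, 5.0, 12.3, 13.1, 14.8, 27.5]
print(marker_density_bins(positions))
```

The resulting counts per bin could then be mapped to a colour scale such as the white-to-red gradient described for Figure 2.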

To construct the needle plots, the average phenotypic value of the genotypes carrying the synthetic allele or the cultivar allele was calculated for each displayed marker and plotted on the y-axis. The genotypes were sorted by phenotypic value, starting with the highest.
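The data preparation behind such a needle plot might look like the sketch below. It assumes that each needle is one line's mean yield across environments, coloured blue for the cultivated allele and red for the exotic allele as in Figure 3, and that heterozygous or missing calls are left out; the line names, yields, and function name are hypothetical.

```python
from statistics import mean


def needle_plot_data(yields_by_line, allele_by_line):
    """Return (line, mean_yield, colour) tuples sorted from highest to lowest yield."""
    colours = {"CC": "blue", "EE": "red"}  # cultivated = blue, exotic = red
    rows = []
    for line, values in yields_by_line.items():
        allele = allele_by_line.get(line)
        if allele not in colours:
            continue  # skip heterozygous or missing genotype calls
        rows.append((line, mean(values), colours[allele]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical yields (dt/ha) in two environments and genotypes at one marker.
yields = {
    "L1": [82.0, 86.0],
    "L2": [78.0, 80.0],
    "L3": [88.0, 90.0],
    "L4": [75.0, 77.0],
}
alleles = {"L1": "EE", "L2": "CC", "L3": "EE", "L4": "CE"}
for line, value, colour in needle_plot_data(yields, alleles):
    print(f"{line}\t{value:.1f}\t{colour}")
```

The sorted tuples can be passed to any plotting library as bar heights and colours.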

### AUTHOR CONTRIBUTIONS

AN, JL, and KP conceptualized the research. AN, KP, AB, and SD did phenotyping and genotyping of population B22 and Z86. AN, KP, AB, SD, and JL made the data analyses. AN, AB, and SD wrote the manuscript.

### FUNDING

This work was funded by the German Plant Genome Research Initiative (GABI) of the Federal Ministry of Education and Research (BMBF, project 312862) and by the Federal Ministry of Agriculture and Nutrition (Grant # IdMaRo-100203349).

### ACKNOWLEDGMENTS

We would like to thank the cooperating plant breeders, Dr. E. Kazman (Saatzucht Josef Breun), Dr. J. Schacht (Limagrain-Nickerson), Dr. E. Ebmeyer (Lochow-Petkus), and Dr. A. Spanakakis (Fr. Strube Saatzucht) and their teams for carrying out the field experiments. We further appreciate the assistance of Dr. A. Kunert, W. Bungert, and H. Rehkopf at the Research Station Dikopshof, as well as P. Kerwer, O. Dedeck, S. Gehlen, and C. Golletz for their technical assistance in the laboratory. We would also like to thank Mrs. Woitol for her valuable support in the preparation of this manuscript.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Naz, Dadshani, Ballvora, Pillen and Léon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# A Genome-Wide Association Study of Highly Heritable Agronomic Traits in Durum Wheat

Shubin Wang<sup>1</sup> , Steven Xu<sup>2</sup> , Shiaoman Chao<sup>2</sup> , Qun Sun<sup>3</sup> , Shuwei Liu<sup>1</sup> \* and Guangmin Xia<sup>1</sup> \*

<sup>1</sup> Key Laboratory of Plant Development and Environmental Adaptation Biology, Ministry of Education, School of Life Sciences, Shandong University, Qingdao, China, <sup>2</sup> United States Department of Agriculture-Agricultural Research Service (USDA-ARS), Cereal Crops Research Unit, Edward T. Schafer Agricultural Research Center, Fargo, ND, United States, <sup>3</sup> Department of Plant Sciences, North Dakota State University, Fargo, ND, United States

#### Edited by:

Pierre Sourdille, INRA Centre Auvergne-Rhône-Alpes, France

#### Reviewed by:

Anna Maria Mastrangelo, Research Centre for Industrial Crops, Council for Agricultural Research and Economics, Italy Marcelo Helguera, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

\*Correspondence:

Shuwei Liu liushuwei@126.com Guangmin Xia xiagm@sdu.edu.cn

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 12 February 2019 Accepted: 28 June 2019 Published: 17 July 2019

#### Citation:

Wang S, Xu S, Chao S, Sun Q, Liu S and Xia G (2019) A Genome-Wide Association Study of Highly Heritable Agronomic Traits in Durum Wheat. Front. Plant Sci. 10:919. doi: 10.3389/fpls.2019.00919

Uncovering the genetic basis of key agronomic traits, and particularly of drought tolerance, addresses an important priority for durum wheat improvement. Here, a genome-wide association study (GWAS) in 493 durum wheat accessions representing a worldwide collection was employed to address the genetic basis of 17 agronomically important traits and a drought wilting score. Using a linear mixed model with 4 inferred subpopulations and a kinship matrix, we identified 90 marker-trait-associations (MTAs) defined by 78 markers. These markers could be merged into 44 genomic loci by linkage disequilibrium (r<sup>2</sup> > 0.2). Based on sequence alignment of the markers to the reference genome of bread wheat, we identified 14 putative candidate genes encoding enzymes, hormone-response proteins, and transcription factors. The GWAS in durum wheat and a previous quantitative trait locus (QTL) analysis in bread wheat identified a consensus QTL, locus.4B.1, conferring drought tolerance, which was further scanned for the presence of potential candidate genes. A haplotype analysis of this region revealed that two minor haplotypes were associated with both drought tolerance and reduced plant stature, thought to be the effect of linkage with the semi-dwarfing gene Rht-B1. Haplotype variants in the key chromosome 4B region were informative regarding evolutionary divergence among durum, emmer and bread wheat. Overall, the data are relevant in the context of durum wheat improvement and the isolation of genes underlying variation in some important quantitative traits.

Keywords: wheat, durum wheat, agronomic traits, drought tolerance, genome-wide association study, evolutionary divergence

### INTRODUCTION

The bulk of the wheat cropped across the world is represented by either hexaploid bread wheat (Triticum aestivum ssp. aestivum) or tetraploid durum wheat (Triticum turgidum ssp. durum). The latter shares two of the former's three sub-genomes, both of which evolved from wild emmer wheat (T. turgidum ssp. dicoccoides) by way of cultivated emmer wheat (T. turgidum ssp. dicoccum; reviewed by Faris, 2014). However, durum and bread wheat differ in diverse phenotypic features, due to the genetic divergence resulting from their independent domestication and difference in ploidy level. The production of durum wheat is dwarfed by that of bread wheat, but remains nevertheless of considerable economic importance, as its grain is better suited than that of bread wheat for the manufacture of pasta, couscous and other semolina products.

Substantial research efforts have been devoted to determining the genetic basis of yield-related traits in bread wheat, such as plant height, heading date, and the morphology of the flag leaf, panicle and grain (e.g., Quarrie et al., 2006; Kuchel et al., 2007; Kumar et al., 2007; Gegas et al., 2010; Wang et al., 2011; Wu et al., 2012; Würschum et al., 2017, 2018). However, the extent of this effort in durum wheat has been much more limited (e.g., Maccaferri et al., 2008; Peleg et al., 2009; Golabadi et al., 2011; Roncallo et al., 2017). The USDA-ARS National Small Grain Collection (NSGC, Aberdeen, ID, United States) conserves over 8,000 entries and represents the global diversity of durum wheat (Chao et al., 2017). A core subset has been assembled from this collection, comprising 493 entries with spring growth habit, of which 235 are classed as landraces, 77 as breeding lines, and 55 as released cultivars, leaving 126 of unknown breeding status (Aoun et al., 2016; Chao et al., 2017). The size of this panel is appropriate for conducting genome-wide association studies (GWAS), a method which has been applied with some success to reveal the genetic basis of some key agronomic traits in bread wheat (e.g., Liu Y. et al., 2017; Sun et al., 2017; Würschum et al., 2017, 2018). The application of GWAS in durum wheat has to date been more limited, with most studies focused on geographically specific diversity and based on limited sample sizes (e.g., Maccaferri et al., 2010; Mengistu et al., 2016; Kidane et al., 2017; Soriano et al., 2017; Mangini et al., 2018; Sukumaran et al., 2018).

The durum crop is typically rain-fed and frequently exposed to moisture deficiency in most of the durum-growing areas (Habash et al., 2009). For example, about 65% of the durum-growing areas in the Mediterranean, which account for about 40% of the global durum cultivated area, are located in arid and semi-arid land with rainfall below 350 mm (Nachit, 1994; Bassi and Sanchez-Garcia, 2017). Since the early 1990s, durum production in the United States has gradually moved from humid areas in the east (e.g., eastern North Dakota, United States) to dry areas in the west (e.g., western North Dakota and eastern Montana, United States) due to severe outbreaks of Fusarium head blight (North Dakota Wheat Commission<sup>1</sup>). Drought stress can adversely affect the grain yield of the wheat crop at various growth stages from germination to grain filling (Kizilgeçi et al., 2017). Drought at the early developmental stage of wheat particularly inhibits seedling vigor, tillering and root development, thus reducing biomass accumulation and grain yield (Kizilgeçi et al., 2017; Zhao et al., 2019). In contrast to bread wheat, in which winter-type varieties dominate globally, modern durum varieties around the world are mostly spring type or semi-winter type (Matsuo, 1994), typically with a growth period of 3–4 months. In such a short life cycle, drought stress at the seedling stage probably reduces grain yield more severely.

As drought stress is the major limiting factor for global durum production (Grant et al., 2012), any enhancement in the drought tolerance of durum wheat would represent a significant contribution to food security and farmers' income. Breeding for drought tolerance (DT) is conventionally achieved by selection for yield in a target environment (Sukumaran et al., 2018), but success is hampered by the poor heritability of yield, difficulties in assuring homogeneity in the environment and the importance of genotype by environment interactions (Habash et al., 2009). A potentially attractive alternative strategy for DT selection is based on indirect traits, such as water use efficiency, leaf water content, leaf senescence, and root architecture, for which variation can be correlated with grain yield. Several such traits have been identified (Gupta et al., 2017).

Here, a durum panel with 493 entries was phenotyped with respect to a number of important morphological characters and a drought-related trait, and the resulting data were merged with an extensive SNP (single nucleotide polymorphism)-based genotypic data set to conduct a GWAS, focusing on DT. The analysis provides valuable information on the major genetic components of important agricultural traits, together with a more detailed analysis of the haplotypes and candidate genes for DT in durum.

### MATERIALS AND METHODS

### Plant Materials

The line information and genotype data of the 493 durum wheat entries are deposited in the T3/Wheat database<sup>2</sup>. This panel has previously been exploited to identify the genetic basis of resistance against certain foliar pathogens (Aoun et al., 2016; Chao et al., 2017). In addition, genotypic data from a domesticated emmer wheat (Sun, 2015) and a bread wheat (Cavanagh et al., 2013) diversity panel were used to infer the genetic divergence of identified quantitative trait loci (QTL). All three diversity panels were genotyped using the wheat 9k iSelect assay (Cavanagh et al., 2013). Two F<sub>2</sub> populations were developed to verify a subset of the trait-associated loci or markers detected by GWAS. For each population, the parental lines (PI520392/PI210912 and PI191571/CITR14814) were selected from the durum diversity panel, and 120 F<sub>2</sub> plants together with their F<sub>3</sub> progenies were sampled for linkage analysis.

### Evaluation of Agronomic Traits

The durum wheat panel was evaluated for the following 18 morphological traits: plant height (PH), heading date (HD), plant waxiness (WX), glume pubescence (GP), glume color (GC), flag leaf length (FLL), flag leaf width (FLW), flag leaf length/width ratio (FLR), flag leaf angle (FLA), panicle length (PL), spikelet number per spike (SN), panicle compactness (PC), single kernel weight (KW), grain length (GL), grain width (GW), grain length/width ratio (GR), grain projection area (GA) and factor form density (FFD).

<sup>1</sup>https://www.ndwheat.com/buyers/NorthDakotaWheatClasses/Durum/

<sup>2</sup>https://triticeaetoolbox.org/wheat/

Of the 18 traits, 13 (the exceptions were WX, FLL, FLW, FLR, and FLA) were measured in 2012 and 2013 at the North Dakota State University (NDSU, Fargo, ND, United States) research site at Prosper (46.9630°N, 97.0198°W; soil type: Perella-Bearden silty clay loam complex), and in 2014 at the Shandong University research site (SDU, Jinan, Shandong, China; 36.6489°N, 117.0290°E; soil type: brown loam). Field trials at the NDSU research site in 2012 (planting date: April 12) and 2013 (planting date: May 08) applied an unreplicated augmented design with four checks in each block (single-row plots, 3.2 m long with rows 30 cm apart) and no irrigation during the crop growing season. Average annual rainfall and snowfall in Fargo are documented as 573.53 and 914.40 mm, respectively (U.S. Climate Data<sup>3</sup>). The SDU field experiment (planting date: March 02) applied a randomized block design with four replicates (single-row plots, 2.0 m long with rows 30 cm apart) with no irrigation during the crop growing season. The annual average rainfall in Jinan was 521.20 mm. For the two F<sub>2</sub> populations, the F<sub>3</sub> families were planted in the 2015 and 2016 growing seasons at Jinan using a randomized block design with three replicates.

The other five traits were evaluated only in the Jinan experiment. GP, GC, WX, and FLA were scored on a 1–9 scale. Trait values, except those for grain morphology, represent the mean of five plants per replicate or trial. PC was calculated as the ratio SN/PL. The grain morphology traits were measured on 30 uniform seeds. For the evaluation of GL, GW, and GA, 30 grains of each entry per replicate or trial were imaged and the resulting digital images were analyzed using the software ImageJ<sup>4</sup>. FFD was calculated from the expression KW/(GL × GW), following Gegas et al. (2010).
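The two derived traits can be illustrated with a short sketch. It assumes that panicle compactness is the spikelet number divided by the panicle length and that FFD is the kernel weight divided by the product of grain length and grain width; the function names are hypothetical, and the input values are the landrace group means from Table 2, used here only as example numbers.

```python
def panicle_compactness(spikelet_number, panicle_length):
    """PC: spikelets per unit of panicle length (assumed SN / PL)."""
    return spikelet_number / panicle_length

def factor_form_density(kernel_weight, grain_length, grain_width):
    """FFD: kernel weight divided by (grain length x grain width)."""
    return kernel_weight / (grain_length * grain_width)

# Landrace group means from Table 2, used as illustrative inputs.
pc = panicle_compactness(spikelet_number=18.28, panicle_length=8.07)
ffd = factor_form_density(kernel_weight=52.58, grain_length=7.73, grain_width=3.25)
print(f"PC = {pc:.2f}, FFD = {ffd:.2f}")
```

A ratio computed from group means need not equal the mean of the per-entry ratios, so small differences from the tabulated PC and FFD means are expected.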

### Assessment of Drought Tolerance

For each entry, 15 uniform seedlings were planted in a soil-filled plastic cone, 6.4 cm in diameter at the open end and 25.4 cm in depth (total volume: 656 mL). Water was withheld once the plants had reached the three-leaf stage. The plants were re-watered once approximately two-thirds of the entries had wilted. Two days after re-watering, a drought wilting score was assigned using a 1–6 scale, where score 1 represented a completely wilted plant, 2 a plant in which the first three leaves were wilted, 3 a plant in which only the first two leaves were fully wilted and the third leaf was partially wilted, 4 a plant in which the first two leaves were fully wilted and the third leaf was not wilted, 5 a plant in which the first leaf was fully wilted, the second partially wilted and the third not wilted, and 6 a plant in which only the first leaf was wilted.

### Statistical and Bioinformatic Analyses

Summary statistics for each trait were computed in R (Ihaka and Gentleman, 1996). All trait values were fitted with a linear mixed model [trait ∼ line + 1|year + 1|location + 1|(block %in% location:year)] in the R package lme4<sup>5</sup>. The broad-sense heritability was calculated as the ratio of total genetic variance to total phenotypic variance, and the best linear estimators were used for GWAS. The consensus linkage map of tetraploid wheat (Maccaferri et al., 2015) and the IWGSC RefSeq v1.0 genomic assembly (International Wheat Genome Sequencing Consortium, 2018<sup>6</sup>) were used to assign a genomic location to each SNP marker. Pairwise linkage disequilibrium (LD, r<sup>2</sup>) was calculated using the package "genetics" in R (see text footnote 5). The program "Structure<sup>7</sup>" was used to assign ancestry proportions to each durum wheat entry based on a set of 366 unlinked markers (pairwise LD < 0.1, MAF > 0.05). The admixture model was fitted by varying the K parameter from 2 to 10. Marker-trait associations (MTAs) between the set of informative SNPs and each of the 18 morphological traits and DT were detected by fitting the mixed linear model (MLM) implemented in TASSEL software (Bradbury et al., 2007), treating the first three principal components (PCs) and HD as covariates and using family relatedness as a random effect. A Bonferroni approach based on 168 haplotype blocks, calculated with the software Haploview<sup>8</sup>, was used to adjust the initial p-value, resulting in a threshold of –log10(p) = 3.53, equivalent to a false discovery rate (FDR) of 0.05. In parallel, a two-step approach combining single- and multiple-locus linear mixed models (LMM) was used to declare significant MTAs (Wang et al., 2016). All MTAs satisfying either criterion were included in the final result. The haplotype network was inferred using TCS software (Clement et al., 2000).

For linkage analysis in the F<sub>2</sub> populations, dCAPS markers were developed based on the flanking sequences of the corresponding SNP sites (**Supplementary Table S1**). QTL analysis was carried out in the R package R/qtl<sup>9</sup>.
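The significance threshold quoted above can be reproduced with a short calculation. The sketch below assumes a standard Bonferroni correction of a nominal level of 0.05 over the 168 haplotype blocks; the function name is hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Return the per-test p-value cutoff and its -log10 value."""
    p_cutoff = alpha / n_tests
    return p_cutoff, -math.log10(p_cutoff)

# 168 haplotype blocks treated as the number of independent tests.
p_cutoff, neg_log10 = bonferroni_threshold(alpha=0.05, n_tests=168)
print(f"p < {p_cutoff:.2e}, -log10(p) > {neg_log10:.2f}")
```

Counting haplotype blocks rather than individual SNPs as the number of tests gives a less conservative cutoff than correcting for all 4,369 informative markers.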

### RESULTS

### High Genetic Polymorphism and Long LD Decay in Global Durum Collection

To examine the genetic polymorphism of the durum panel, we assessed the allelic variation of the 8,136 SNP markers included in the wheat 9k iSelect assay (Wang et al., 2014). The SNP genotyping revealed signals at 6,538 SNP loci. The distribution was skewed toward markers with low polymorphism; in particular, about 32% of the detected markers had a minor allele frequency (MAF) of less than 0.05 (**Figure 1A**). This was likely due to ascertainment bias, arising from the preponderance of bread wheats (as opposed to durum wheats) in the panel used to detect informative SNPs (Cavanagh et al., 2013). Overall, a set of 4,369 informative markers (MAF > 0.05 and missing values < 30%) was selected for this study. According to marker position, the polymorphic markers cover 90% of the genome, with a mean inter-marker distance of 2.6 Mb (**Figure 1B**). In this map, the only significant gaps were in the vicinity of some of the centromeres. An analysis of pairwise LD showed that 42% of the marker pairs were associated with a p-value below 0.001, of which 38% recorded an r<sup>2</sup> value of less than 0.1. Particularly high levels of LD were observed between markers separated by less than 5 Mb (**Figure 1C**). Based on the approach of Hill and Weir (1988), the decay in LD to 0.2 occurred within 9.6 Mb (**Figure 1D**), which is shorter than previously reported LD decay distances in durum (e.g., Bassi et al., 2019), most likely due to the large size of this panel. The high coverage of SNP polymorphism across the durum genome indicated appropriate robustness but reduced resolution for GWAS.

<sup>3</sup>https://www.usclimatedata.com/climate/casselton/north-dakota/united-states/usnd006

<sup>4</sup>https://imagej.nih.gov/ij/

<sup>5</sup>https://cran.r-project.org/package=genetics

<sup>6</sup>https://wheat-urgi.versailles.inra.fr/Seq-Repository/Assemblies

<sup>7</sup>https://web.stanford.edu/group/pritchardlab/structure.html

<sup>8</sup>https://www.broadinstitute.org/haploview/haploview

<sup>9</sup>https://cran.r-project.org/package=qtl
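The marker filtering criteria used in this section (MAF > 0.05 and missing values < 30%) can be sketched as follows. The example assumes biallelic markers scored as "A" or "B" in homozygous entries, with `None` for a missing call; this coding, the function names, and the example data are illustrative assumptions rather than the format of the actual data set.

```python
def marker_summary(calls):
    """Return (minor allele frequency, missing rate) for one marker.

    calls: list with one allele call per entry ('A', 'B', or None if missing).
    Assumes homozygous entries and two alleles per marker.
    """
    observed = [c for c in calls if c is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed:
        return 0.0, missing_rate
    freq_a = observed.count("A") / len(observed)
    return min(freq_a, 1 - freq_a), missing_rate

def is_informative(calls, min_maf=0.05, max_missing=0.30):
    """Apply the MAF and missing-value thresholds to one marker."""
    maf, missing_rate = marker_summary(calls)
    return maf > min_maf and missing_rate < max_missing

# Hypothetical markers scored on 100 entries.
common = ["A"] * 90 + ["B"] * 8 + [None] * 2     # MAF ~0.08, 2% missing
rare = ["A"] * 97 + ["B"] * 3                    # MAF 0.03
sparse = ["A"] * 40 + ["B"] * 25 + [None] * 35   # 35% missing
kept = [is_informative(m) for m in (common, rare, sparse)]
print(kept)
```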

### High Relatedness of Durum Wheat Accessions Independent From Breeding Status and Geographic Distribution

Population structure can give rise to false positive or false negative marker-trait associations (MTAs) and should therefore be accounted for through covariates in a GWAS. The population structure of the durum wheat panel based on the full set of informative SNPs has been previously described (Aoun et al., 2016; Chao et al., 2017). Due to the possible impact of linkage on the inference of genetic relationships, the genetic stratification issue was revisited by running an analysis using a sub-set of unlinked markers (pairwise LD < 0.2). A high pairwise relatedness among entries was established, with ∼80% of the comparisons producing an identity-by-state proportion of 0.6–0.8 (**Figure 2A**). The inferred genetic admixture and cluster assignment of the 493 entries suggested a structure comprising two ancestral populations (**Figures 2B,C**), in agreement with the earlier analysis. A smaller peak in ΔK was also observed at K = 4. At both K = 2 and 4, the panel was highly heterogeneous with respect to both breeding status and geographical provenance, and harbored a large proportion of admixed entries: in the two sub-populations, respectively, 29 and 47% of the entries had a Q-value of <0.7 (**Supplementary Table S2**).
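The identity-by-state proportion between two entries can be illustrated with a minimal sketch. It assumes homozygous allele calls, so that two entries are either identical or different at each marker, and it ignores markers with a missing call in either entry; the function name and genotype vectors are hypothetical.

```python
def ibs_proportion(entry_a, entry_b):
    """Fraction of markers at which two entries carry the same allele.

    Assumes homozygous allele calls; markers with a missing call (None)
    in either entry are skipped.
    """
    pairs = [(a, b) for a, b in zip(entry_a, entry_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no markers called in both entries")
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical allele calls at six markers.
line1 = ["A", "A", "B", "B", None, "A"]
line2 = ["A", "B", "B", "B", "A", "A"]
ibs = ibs_proportion(line1, line2)
print(ibs)
```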

## Phenotypic Variation for Agronomic Traits and DT

To assess drought tolerance, we first compared the drought wilting score with other seedling drought symptoms and with DT indices of yield components (**Supplementary Figure S1**). The drought wilting score was highly correlated with the seedling drought symptoms (r<sup>2</sup> = 0.73 with leaf water content, 0.36 with chlorophyll content, 0.46 with rolling index, and 0.64 with survival rate), and less strongly correlated with the yield-related DT indices (r<sup>2</sup> = 0.26 with thousand seed weight, 0.05 with tiller number, and 0.07 with seed number per panicle). Therefore, this score can serve as an indicator of durum DT, especially at the early developmental stage.

The durum wheat panel displayed a broad range of phenotypes with respect to each of the traits (**Table 1** and **Supplementary Figure S2**). The broad-sense heritability varied from 45 to 92% for the 17 quantitative traits; the exceptions were GP and GC, which were qualitative rather than quantitative in nature. A summary of the inter-trait correlations is given in **Figure 3**. The highest correlations involved pairs of traits within a single category, namely flag leaf, spike or grain morphology, but a few, mostly in the low to intermediate range, were recorded between unrelated traits. In particular, both PH and HD were correlated with a considerable number of other traits, while DT was positively correlated with both WX and PC, and negatively with both PH and PL.

TABLE 1 | Variation displayed by the entries of the durum wheat panel with respect to 18 morphological traits and DT (a measure of drought tolerance).

SD, standard deviation; CV, coefficient of variation; H2, broad sense heritability; PH, plant height; HD, heading date; FLL, flag leaf length; FLW, flag leaf width; FLR, flag leaf length/width ratio; FLA, flag leaf angle; PL, spike length; SN, spikelet number per spike; PC, spike compactness; KW, single kernel weight; GL, grain length; GW, grain width; GR, grain length/width ratio; GA, grain projection area; FFD, factor form density; WX, waxiness; GP, glume pubescence; GC, glume color; DT, drought tolerance.

With respect to seven of the agronomic traits, the elite breeding lines and/or cultivars differed as a group from the landraces (**Table 2**), reflecting the effect of intensive selection pressure. The elite materials scored lower for PH, HD, FLL, and FLA, and higher for PL and SN; however, there was no such differentiation with respect to any of the grain morphology traits. The DT score averaged over the set of breeding lines exceeded that of landraces (3.59 versus 2.95). However, as a group, the mean performance of PH and DT in the cultivars was inferior to that of the landraces.

### The Genetic Basis of Agronomic Traits and DT Revealed by GWAS

The GWAS results are summarized in **Supplementary Table S3**. The quantile-quantile (QQ) plots (**Supplementary Figures S3, S4**) revealed that most of the traits fitted the model well, although FLW was over-corrected. The genomic distribution of marker-trait-associations is shown in **Figure 4**. For DT, as an example, the Manhattan plot, quantile-quantile plot and haplotype blocks are given in **Figure 5**. In all, the GWAS identified 90 marker-trait-associations (MTAs) covering all but one of the traits (the exception was PL). The proportion of the phenotypic variation explained (PVE) by each MTA varied from ∼2–6%. The highest number of MTAs per trait (14) was associated with SN, followed by 13 for DT and 12 for FLL; each of the remaining traits was associated with fewer than 10 MTAs. The set of MTAs was defined by 78 markers, of which 67 accounted for a single trait, 10 for two traits and one for three traits. The 78 markers could be further merged into 44 loci according to their chromosome location and LD (r<sup>2</sup> > 0.2), 10 of which were relevant to more than one trait. The number of loci detected and the total PVE varied widely among traits: SN (12 loci), HD (6), DT (5), FLL (5), and GL (5) were each associated with five or more loci, with total PVE ranging from 16 to 39%, whereas each of the remaining traits was associated with fewer than five loci, with total PVE ranging from 3 to 14%.

TABLE 2 | Analysis of variance with respect to 18 morphological traits and DT among the entries of the durum wheat panel according to their breeding status.

| Trait | Landrace | Breeding line | Cultivar | Unknown | P-value |
|---|---|---|---|---|---|
| PH | 99.42 a | 84.10 b | 101.04 a | 103.88 a | 0.00 ∗∗∗ |
| HD | 181.21 a | 180.51 b | 180.85 a | 182.78 c | 0.01 ∗ |
| WX | 6.47 a | 6.85 a | 6.71 a | 6.39 a | 0.28 |
| FLL | 25.68 a | 24.54 b | 25.95 a | 26.53 c | 0.03 ∗ |
| FLW | 1.58 a | 1.53 a | 1.61 a | 1.61 a | 0.48 |
| FLR | 17.12 a | 16.70 a | 17.64 a | 17.38 a | 0.60 |
| FLA | 4.79 a | 4.12 b | 4.58 c | 4.72 c | 0.02 ∗ |
| PL | 8.07 a | 8.13 b | 8.19 b | 8.57 c | 0.02 ∗ |
| SN | 18.28 a | 18.59 a | 18.90 b | 19.76 c | 0.00 ∗∗∗ |
| PC | 2.32 a | 2.33 a | 2.37 a | 2.35 a | 0.78 |
| GP | 3.36 a | 2.33 a | 2.00 a | 3.00 a | 0.03 ∗ |
| GC | 3.84 a | 2.82 a | 2.33 a | 3.00 a | 0.02 ∗ |
| KW | 52.58 a | 51.49 a | 53.00 a | 51.20 a | 0.15 |
| GL | 7.73 a | 7.53 a | 7.57 a | 7.56 a | 0.02 ∗ |
| GW | 3.25 a | 3.27 a | 3.29 a | 3.25 a | 0.35 |
| GR | 2.39 a | 2.31 a | 2.30 a | 2.33 a | 0.02 ∗ |
| GA | 19.73 a | 19.31 a | 19.58 a | 19.30 a | 0.14 |
| FFD | 2.09 a | 2.09 a | 2.12 a | 2.08 a | 0.22 |
| DT | 2.95 a | 3.59 b | 2.68 a | 2.90 a | 0.00 ∗∗∗ |

The four central columns give the trait means by breeding status. Trait abbreviations as given in Table 1. Means differing significantly from one another (∗, ∗∗∗p < 0.01, 0.001) have been assigned a different lower case letter.

A set of markers used to define nine regions accounted for variation in more than one of the various grain morphology traits, while the co-location of MTAs associated with two or more unrelated traits also occurred (**Supplementary Table S3**). For example, the region denoted locus.4A.2 harbored MTAs relating to DT, FLA, and SN. Another example is locus.4B.1, which harbored MTAs related to both DT and PH. With respect to plant stature, this may arise as a result of the nearby semi-dwarfing gene Rht-B1 (TraesCS4B01G043100, 30,861,382 to 30,863,247 bp) in the locus.4B.1 region (**Supplementary Table S4**). The alleles inherited from the more drought tolerant entries at the six markers which defined the locus.4B.1 region were all associated with a reduction in PH, although only IWA4854 exceeded the chosen significance threshold (P < 0.001).

### QTL or Candidate Genes Underlying GWAS Loci

Based on sequence alignment of the markers to the reference pseudomolecules (International Wheat Genome Sequencing Consortium, 2018), the loci detected by GWAS were compared with previously reported QTL and/or major genes (**Supplementary Table S4**). The consensus loci included those on chromosomes 4B and 5A for PH, 2A and 5B for HD, 1A for GP, 5B for WX, 4A for FLA, 5B for PC, 3A and 6A for grain morphology traits, and 3A and 4B for DT. However, only two of the underlying genes have been verified in wheat: Rht-B1 for locus.4B.1 (PH) and Ppd-A1 for locus.2A.5 (HD). In addition, three MTAs, namely IWA2023 on 3A for SN, GW, and GR, IWA2816 on 4A for SN, and IWA4363 on 7A for SN, GL, and GA, were validated through linkage mapping in two F<sub>2:3</sub> families whose parents were selected from the association panel (**Supplementary Figure S5**). The results of the QTL analysis were highly comparable with those of the GWAS, indicating that the GWAS results were reliable.

Candidate genes were analyzed through gene annotation combined with their homology to functionally characterized genes in model plants or other cereals, and also their expression pattern (**Supplementary Table S4**). A total of 14 putative candidate genes encoding enzymes, hormone-response proteins, and transcription factors were identified. Some of them have been verified in wheat, and others are associated with related phenotypes in model plants or other cereals.

The genetic region for locus.4B.1 was also reported to be associated with drought tolerance by Kadam et al. (2012), and this region corresponded to an ∼16 Mb genomic interval harboring an estimated 117 genes (**Supplementary Table S5**). Of the 117 genes present in this segment, only six were transcriptionally affected by the drought treatment: TraesCS4B01G071500 (predicted to encode an α subunit of pyruvate dehydrogenase E1), three tandemly arranged genes TraesCS4B01G072100, TraesCS4B01G072200, and TraesCS4B01G072300 (WD40 family protein), TraesCS4B01G076400 (adenine/guanine permease), and TraesCS4B01G077900 (gibberellin-regulated protein 1). Among the six genes, the 4D homeolog of the three WD40 protein-coding genes was shown by Kong et al. (2015) to confer tolerance to exogenously provided abscisic acid, salinity stress and osmotic stress, suggesting that the WD40 genes might be candidates for the DT effect of locus.4B.1. Sequence analysis of the three WD40-coding genes revealed a C-T SNP in the coding sequence of TraesCS4B01G072200, which leads to a premature stop of translation. In addition, a mixed linear model was applied to this variant, taking into consideration the top markers from other DT-related loci, population structure and kinship. The result showed that the premature stop variant was significantly associated with the drought sensitive phenotype in the durum panel (**Figure 6**).

### DT-Related Haplotypes of Locus.4B.1 in Durum and Bread Wheat

As the locus.4B.1 region was previously identified as a major-effect DT-related QTL in bread wheat and was also detected in this study, we suggest that it is an important DT-related QTL in durum wheat. To gain further insight into the genetic variation of this region and its relationship with DT, we conducted a haplotype analysis of this region. The results showed a total of 9 haplotypes (hap1–9) in durum wheat, with hap9 carried by 84% of the entries in the durum wheat panel (**Supplementary Tables S6, S7**). With respect to the DT index, two minor haplotypes, hap1 and hap2, were significantly more drought tolerant than the dominant haplotype hap9 and the other 6 minor haplotypes (**Figures 7A–C**). The frequency of hap1 and hap2 was noticeably higher in the improved materials (and particularly in the breeding lines) than in the landraces, implying positive selection for these two haplotypes by durum wheat breeders (**Figure 7C**).

A similar analysis was carried out in the emmer wheat panel and the bread wheat panel. In the emmer wheat panel, the most common haplotypes were hap9, hap7 and hap5, with hap8 present in just two entries. The bread wheat panel was dominated by hap1 (72%) and hap6 (15%), and added a further six minor haplotypes (hap10 through hap15) (**Supplementary Tables S6, S7**). A haplotype network established for the nine major haplotypes in the three diversity panels revealed a separation between two sets of haplotypes: the first grouped hap1, hap2, hap5, hap6, hap7, and hap10, and the second hap3, hap8, and hap9 (**Figure 7D**). Two haplotypes of the locus.4B.1 region dominated the set of emmer wheats, with hap9 represented in entries of the southern sub-population while hap7 was carried by numerous entries of the northern sub-population of emmer wheat. As the dominant haplotypes hap1 and hap6 in bread wheat were more closely related to hap7, while durum wheat was dominated by hap9, we infer that the durum wheats are more closely related to the southern sub-population of emmer wheats, while modern bread wheats are more closely related to the northern one.
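The tabulation of haplotype frequencies within a panel can be sketched as follows. The example assumes that a haplotype is defined by the string of alleles at the markers spanning the region, taken in map order, and that entries with any missing call are excluded; the function name and allele data are hypothetical and do not correspond to the actual locus.4B.1 markers.

```python
from collections import Counter

def haplotype_frequencies(allele_matrix):
    """Return haplotype frequencies, most common first.

    allele_matrix: one row per entry, each row holding the alleles at the
    region's markers in map order. Rows containing a missing call (None)
    are skipped.
    """
    haplotypes = ["".join(row) for row in allele_matrix if None not in row]
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.most_common()}

# Hypothetical three-marker region scored on six entries.
panel = [["A", "G", "T"]] * 4 + [["C", "G", "T"]] + [["A", None, "T"]]
freqs = haplotype_frequencies(panel)
print(freqs)
```

Running the same function separately on each breeding-status group or diversity panel would give the kind of frequency comparison described above.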

### DISCUSSION

### The Genetic Basis of Agronomic Traits and DT in Durum Wheat

In this study, we identified 44 chromosome loci associated with 17 agronomic traits and a drought wilting score using GWAS on a durum wheat panel. For a given trait, the total PVE varied from 16 to 39%, illustrating that most of the traits were under polygenic control. Similarly, previous GWAS and QTL analyses have shown that most of the traits investigated here, including PH, HD, WX, and flag leaf-, spike- and grain-associated traits, are controlled by multiple genes, each contributing only modestly to the overall genotypic variance (Gegas et al., 2010; Zhai et al., 2016; Liu et al., 2018). Surprisingly, both GP and GC fell into this category, even though both traits are relatively simply inherited (Khlestkina et al., 2006). A possible explanation for this unexpected result, which was also observed in a similar analysis of the genetic basis of awn type in bread wheat (Liu Y. et al., 2017), is that GWAS imposes genetic stratification on the population, which masks the effect of major genes. Correlations between agronomic traits have been widely observed in previous studies and can be partially explained by the presence of pleiotropic genes (e.g., Liu Y. et al., 2017; Roncallo et al., 2017; Schulthess et al., 2017). Twelve of the 78 SNP markers (15%) and 10 of the 44 genomic loci (23%) were relevant to more than one trait, which confirms the frequent observation that genes responsible for variation in significant agronomic traits show extensive pleiotropy and/or linkage. Some recent examples have been documented in bread wheat and durum wheat (e.g., Liu Y. et al., 2017; Roncallo et al., 2017; Schulthess et al., 2017).

A number of the genomic sites associated with a trait coincided with the location of previously discovered QTL and/or major genes (**Supplementary Table S4**). The sites of two of the three loci associated with PH, for example, were consistent with those of the height-reducing genes Rht-B1 (Peng et al., 1999) and Rht9 (Ellis et al., 2005). We found that the diversity of Rht9 in durum wheat was higher than that of Rht-B1, the dwarfing gene employed in bread wheat during the Green Revolution, suggesting that Rht9 might play a more important role than Rht-B1 in durum genetic improvement. HD or flowering time in wheat is dependent on the array of genes controlling a cultivar's requirement for vernalization (Vrn genes) and photoperiod (Ppd genes), in addition to a number of genes promoting earliness per se (Eps genes). A GWAS has identified the dependence of bread wheat's flowering time on the identity of the Ppd-D1 allele present and on the gene copy number at Ppd-B1, with a number of minor effect loci also contributing (Würschum et al., 2018). We observed a positive GWAS signal coinciding with the location of Ppd-A1 rather than that of Ppd-B1, consistent with observations made by Maccaferri et al. (2010). There was no such signal coincident with any of the Vrn loci, presumably because none of the entries have any vernalization requirement. In all, durum wheat possessed more loci controlling flowering time than bread wheat, and the combination of different loci enhanced the diversity of flowering time in durum, which is possibly helpful for adapting to the diverse environments of the Mediterranean region. The MTAs mapped to sites that were consistent with the literature and included those on chromosomes 1A (GP), 5B (WX), 4A (FLA), 5B (PC), 3A and 6A (grain morphology traits), and 3A and 4B (DT). The discovery of a large number of noncoincident MTAs may simply reflect the extent of the divergence between durum wheat and bread wheat, since the bulk of the genetic analyses reported in the literature relate to bread, rather than to durum wheat.

### Gene Candidature Under the GWAS Loci

The level of gene annotation currently available has allowed some tentative predictions to be made of the identity of the genes underlying variation in some traits (**Supplementary Table S4**). TraesCS2A01G540400, a putative GA3ox2 (gibberellin 3-oxidase 2) gene, was a primary candidate for the PH-related locus.2A.13. Its close homologs in maize (ZmGA3ox2/qPH3.1) and rice (OsGA3ox2/d18) have been verified to affect PH (Itoh et al., 2001; Teng et al., 2013). A potential candidate for the HD MTA on chromosome 2A is TraesCS2A01G269700 (flowering promoting factor-like 1). Previous reports in Arabidopsis and rice have highlighted the critical role of cytokinin oxidases (CKXs) in promoting flowering or seed number (Werner et al., 2006). Potential candidates for the SN MTAs on chromosomes 1B, 3B, and 7A are TraesCS1B01G176000, TraesCS3B01G344600, and TraesCS7A01G536900, respectively, all of which have been annotated as CKX-coding genes. Finally, as mentioned above, the location of the DT MTA on chromosome 4B coincided with that of the tandem array of WD40-encoding genes. The present evidence indicates that genes homologous to those functionally characterized in model species or other cereal crops are candidates for some of the GWAS loci, suggesting they are good targets for more detailed analysis.

### Rare Variants of Locus.4B.1 Containing Candidate Genes Enhanced DT in Durum Wheat

Drought represents a major constraint on the yield of bread wheat (Fleury et al., 2010), and is of particular importance to durum wheat given that its prime areas of production are located in regions of low rainfall (Habash et al., 2009). Previous studies revealed that the region around the semidwarf gene Rht-B1 on chromosome arm 4BS is associated with multiple traits responsible for tolerance to drought stress in bread wheat (Kadam et al., 2012). A QTL with a major DT-related effect was also found in the collinear chromosome regions in barley on chromosome 6H, rice on chromosome 1 and maize on chromosome 3 (Swamy et al., 2011). In this study, we confirmed by GWAS the presence of a DT-related QTL (locus.4B.1) in this region downstream of Rht-B1 in durum wheat.

Among the nine existing haplotypes of locus.4B.1, only two rare ones (hap1 and hap2) are related to the DT phenotype in durum wheat (**Figure 7A**). Both DT haplotypes show potential for application in breeding DT durum cultivars as they are possessed by only a limited number of entries (**Figure 7C**). In rice, breeding selection on SD1 significantly reduced qDTY.1.1 due to linkage drag (Vikram et al., 2015). However, there is no evidence that favorable alleles from Rht-B1 and locus.4B.1 were linked in repulsion in durum. In contrast, DT alleles from all six markers defining locus.4B.1 were also associated with reduced PH. This indicates that a large proportion of the entries may possess dwarf alleles and DT alleles from each locus. The linkage between DT and reduced stature may provide an advantage for promoting these two traits together in breeding.

Three tandemly repeated WD40 genes under locus.4B.1 are suggested as primary candidate genes for DT on the basis of three lines of evidence. First, each of the genes was significantly upregulated by exposure to drought; second, their 4D homeolog is known to confer tolerance to osmotic stress (Kong et al., 2015); and third, members of the WD40 family have been identified as functioning in the plant stress response (Sharma and Pandey, 2016; Liu W.C. et al., 2017). In this work, a SNP resulting in a translational truncation in the second 4B WD40 gene (TraesCS4B01G072200) was found to be significantly associated with reduced DT (**Figure 6**). Furthermore, the drought-induced expression of TaWD40s was much depressed when combined with heat stress (**Supplementary Table S5**), suggesting a possible link to the enhanced stress effect when heat and drought stresses are combined. Therefore, all of the above evidence indicates that the WD40 genes are associated with the DT QTL in wheat and are worth further analysis.

### CONCLUSION

In this manuscript, we presented an insight into the worldwide genotypic and phenotypic diversity in durum wheat, as well as the genetic control of major agronomic traits and drought resistance. The high phenotypic variation and correlation were shown to be controlled by a large number of genomic loci with polygenic and pleiotropic effects. A few candidate genes were selected by comparative genomic analysis together with their expression patterns and genomic variation. In addition, the haplotype investigation of the DT-related locus.4B.1 showed potential application in breeding DT durum cultivars, as well as a complex evolutionary history and gene candidature. All these data are relevant in the context of durum wheat improvement and the isolation of genes underlying variation in some important quantitative traits.

### DATA AVAILABILITY

All datasets for this study are included in the manuscript and the **Supplementary Files**.

### AUTHOR CONTRIBUTIONS

This study was designed by GX, SL, and SW. The evaluation of traits was conducted by SW, SL, and SX. The emmer wheat panel was collected by SX. The genotypic evaluation of the durum wheat and emmer wheat panels was carried out by SC and QS. Data were analyzed by SW. The manuscript was drafted by SW and SL, and revised by GX. All authors have reviewed and approved the final version of the manuscript.

### FUNDING

This research was supported by grants from the National Natural Science Foundation of China (Nos. 31722038, 31720103910, and 31872864) and the Shandong University Young Scholars Program (2016WLJH39).

### ACKNOWLEDGMENTS

We thank Dr. Robert Koebner and Dr. Daryl Klindworth for careful reading and feedback on this manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00919/full#supplementary-material

FIGURE S1 | Inter-trait correlations among 8 drought related traits tested on 15 durum wheat lines. Pearson's correlation coefficients are shown in the lower panel and significant correlations (p < 0.01) are labeled with gray. Experimental repeatability is shown in parentheses after each trait in the diagonal. DT, drought wilting score; CC, chlorophyll content; RI, leaf rolling index; WC, leaf water content; SR, seedling survival rate; DSI, drought susceptibility index; TN, tiller number; SN, seed number per spike; GW, thousand grain weight. CC, RI, and DSI were measured according to Peleg et al. (2009). WC was estimated according to Kong et al. (2015).

FIGURE S2 | Frequency distribution for the 18 morphological traits and DT among the members of the durum wheat diversity panel.

FIGURE S3 | Quantile-quantile plots for the GWAS results from 9 agronomic traits.

FIGURE S4 | Quantile-quantile plots for the GWAS results from the other 9 agronomic traits.

FIGURE S5 | Verification of three MTAs through linkage mapping. QTL analysis was conducted in the F<sub>2</sub> population PI520392/PI210912 targeting MTAs IWA2023 on 3A for SN, GW, and GR, IWA2816 on 4A for SN (a,b) and in the F<sub>2</sub> population PI191571/CITR14814 targeting MTA IWA4363 on 7A for SN, GL, and GA (c). Markers corresponding to the MTA detected by GWAS are colored red. PVE, phenotype variation explained by the peak marker.

TABLE S1 | Markers used in linkage analysis.

TABLE S2 | The summary of provenance, breeding status and population structure inferred by the admixture model in the STRUCTURE software package of the durum wheat diversity panel.

TABLE S3 | Markers significantly associated with traits identified by GWAS.

TABLE S4 | Reported QTL and candidate genes underlying GWAS loci.

TABLE S5 | The presumed function of candidate genes for the DT trait located in the 4B.1 region.

TABLE S6 | Haplotypes present in the 4B.1 region among the durum wheat, emmer wheat and bread wheat materials.

TABLE S7 | The distribution of haplotypes present in the 4B.1 region among the durum wheat, emmer wheat and bread wheat materials.

### REFERENCES


environments: from the QTL to candidate genes. J. Exp. Bot. 57, 2627–2637. doi: 10.1093/jxb/erl026


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Wang, Xu, Chao, Sun, Liu and Xia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Detecting Large Chromosomal Modifications Using Short Read Data From Genotyping-by-Sequencing

*Jens Keilwagen1\*, Heike Lehnert1, Thomas Berner1, Sebastian Beier2, Uwe Scholz2, Axel Himmelbach3, Nils Stein3, Ekaterina D. Badaeva4, Daniel Lang5, Benjamin Kilian6, Bernd Hackauf7 and Dragan Perovic8*

*1 Institute for Biosafety in Plant Biotechnology, Julius Kuehn Institute, Quedlinburg, Germany, 2 Research Group Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany, 3 Research Group Genomics of Genetic Resources, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany 4 Laboratory of Genetic Basis of Plant Identification, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia, 5 PGSB, Helmholtz Center Munich, Neuherberg, Germany, 6 Global Crop Diversity Trust, Bonn, Germany, 7 Institute for Breeding Research on Agricultural Crops, Julius Kuehn Institute, Quedlinburg, Germany, 8 Institute for Resistance Research and Stress Tolerance, Julius Kuehn Institute, Quedlinburg, Germany*

#### *Edited by:*

*Guijun Yan, University of Western Australia, Australia*

#### *Reviewed by:*

*Songlin Hu, Monsanto Company, United States Thomas Nussbaumer, Helmholtz-Gemeinschaft Deutscher Forschungszentren (HZ), Germany*

*\*Correspondence: Jens Keilwagen Jens.Keilwagen@julius-kuehn.de*

#### *Specialty section:*

*This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science*

*Received: 31 January 2019 Accepted: 16 August 2019 Published: 24 September 2019*

#### *Citation:*

*Keilwagen J, Lehnert H, Berner T, Beier S, Scholz U, Himmelbach A, Stein N, Badaeva ED, Lang D, Kilian B, Hackauf B and Perovic D (2019) Detecting Large Chromosomal Modifications Using Short Read Data From Genotyping-by-Sequencing. Front. Plant Sci. 10:1133. doi: 10.3389/fpls.2019.01133*

Markers linked to agronomic traits are a prerequisite for molecular breeding. Genotyping-by-sequencing (GBS) data enable the detection of small polymorphisms, including single nucleotide polymorphisms (SNPs) and short insertions or deletions (InDels), that can be used, for instance, for marker-assisted selection, population genetics, and genome-wide association studies (GWAS). Here, we aim at detecting large chromosomal modifications in barley and wheat based on GBS data. These modifications could be duplications, deletions, substitutions including introgressions, as well as alterations of DNA methylation. We demonstrate that GBS coverage analysis is capable of detecting *Hordeum vulgare/Hordeum bulbosum* introgression lines. Furthermore, we identify large chromosomal modifications in barley and wheat collections. Hence, large chromosomal modifications, including introgressions and copy number variations (CNV), can be detected easily and can be used as markers in research and breeding without additional wet-lab experiments.

Keywords: genebank, crop wild relatives, characterization and utilization of plant genetic resources, translocation, copy number variation (CNV), coverage, bioinformatics, breeding

### INTRODUCTION

Due to the progress in DNA sequencing, collections of plant species can be compared at the genome level to analyze the diversity within the collection. However, these analyses often rely on small differences of a few bases in the genome, such as single nucleotide polymorphisms (SNPs), and seldom look at large chromosomal modifications of several kb or Mb.

In contrast, introgressions from crop wild relatives are substitutions or additions of large chromosomal regions and have been used to improve crop plants (Zamir, 2001; Dempewolf et al., 2017), e.g., as a source of resistance or tolerance to biotic and abiotic stress in wheat (Rabinovich, 1998; Crespo-Herrera et al., 2017). Experimental methods, such as C-banding (Friebe et al., 1996), dot-blot genomic hybridization (Rey and Prieto, 2017), fluorescence *in-situ* hybridization (FISH) (Rayburn and Gill, 1986; Schneider et al., 2005), genomic *in-situ* hybridization (GISH) (Le et al., 1989; Schwarzacher et al., 1989) and acid or SDS-PAGE (Milovanović et al., 1998), are the state-of-the-art wet-lab techniques for detection and characterization of introgressions. However, these techniques are sophisticated and can only be handled by a few labs. If specific markers are available, PCR-based methods can also be used to detect well-known introgressions (Ko et al., 2002).

Furthermore, some bioinformatics pipelines have been proposed in recent years. SNP data from GBS were used to identify introgressions provided that donor and parent plants are known (Wendler et al., 2014; Wendler et al., 2015). Alternatively, an introgression was identified by analyzing the coverage of whole-genome-sequencing data in tomato (Causse et al., 2013). In addition, Liu et al. (2004) found that foreign DNA introgression into a plant genome can induce extensive alterations in DNA methylation, which has been observed earlier in animals (Heller et al., 1995). DNA methylation was also discussed in a wider context of genomic immunity in plants with respect to transposon silencing (Kim and Zilberman, 2014). Hence, methods using methylation-sensitive restriction enzymes might be able to identify genomic regions harboring alien introgressions. Methods like amplified fragment-length polymorphism (AFLP) using a methylation-sensitive restriction enzyme (Xu et al., 2000) or standard GBS could be used for such analyses. In contrast to these methods, a huge set of different sequencing methods has been established to identify DNA methylation (Kurdyukov and Bullock, 2016).

Furthermore, large chromosomal modifications might be CNVs, including duplications and deletions, which were especially identified in gene clusters (Boycheva et al., 2014). For example, tandem and segmental duplications have been reported to be important for the distribution of genes involved in plant disease resistance (Leister et al., 1998; Leister, 2004; Himmelbach et al., 2010) and are of increased interest for breeding. Lu et al. (2015) developed a first approach to map the presence/absence of GBS tags genetically and incorporated these into genome-wide association scans in maize.

Here, we investigate whether it is possible to identify large chromosomal modifications from short read GBS data. In doing so, we do not utilize SNPs, which are rare compared to the number of sequenced bases. Furthermore, such an approach does not use SNP calling, filtering and imputation, saving runtime and avoiding artifacts that might be introduced during these steps (Li, 2014). The idea of using coverage data of GBS for detecting large deletions in soybean after fast neutron mutagenesis has been presented by Lemay et al. (2019) at the Plant and Animal Genome Conference, San Diego 2018. Here, we present a bioinformatics approach that is able to detect large chromosomal modifications using standard GBS coverage data and demonstrate its applicability for barley and wheat. Furthermore, we demonstrate that this method is also able to detect large chromosomal modifications, e.g., introgression and CNV, if no pedigree or data from the parent plants is available.

### MATERIALS AND METHODS

### Plant Material and Genotyping-by-Sequencing

Barley and wheat collections were analyzed for this study. While GBS data of barley collections was publicly available, a diverse winter wheat collection was compiled from European elite winter wheat cultivars and 209 genebank accessions from the Federal *ex situ* Genebank for Agricultural and Horticultural Plant Species of Germany, maintained at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Gatersleben, which hosts one of the largest barley and wheat germplasm collections in the world (Knüpffer, 2009). Genebank accessions were selected using the normalized rank products for plant height, flowering time and thousand grain weight yielding eight contrasting groups (Keilwagen et al., 2014).

Single seed descended (2× SSD) wheat plants were grown in soil under greenhouse conditions to the three-leaf seedling stage. DNA extraction for GBS analysis followed previously published protocols (Milner et al., 2018). Genomic DNA was digested with *PstI* and *MspI* (New England Biolabs) and processed for GBS library construction essentially as described previously (Wendler et al., 2014). Barcoded samples were pooled in an equimolar manner and sequenced on the Illumina HiSeq2500 device, using a custom sequencing primer (Wendler et al., 2014) according to the manufacturer's instructions (Illumina).

### NGS Preprocessing and Coverage Analysis

Adapter and quality trimming of barley GBS raw reads was performed using Trim Galore (https://github.com/FelixKrueger/TrimGalore, version 0.4.0, non-default parameters: --quality 30 --length 50). Subsequently, these trimmed reads were mapped to the barley reference genome (Mascher et al., 2017) using BWA mem (version 0.7.12) and default parameters (Li, 2013).

Obtained wheat GBS raw read pairs were adapter trimmed using cutadapt (version 1.9.1, non-default parameters: -m 30 -a AGATCGGAAGAGC) (Martin, 2011) and mapped to the bread wheat reference genome sequence (IWGSC, 2018) using BWA mem (version: 0.7.13, non-default parameters: -M -v 3) (Li, 2013). GNU parallel (version: 20150222) was applied to all of these steps for multi-threading purposes (Tange, 2011).

Using a custom Java script, the genome was divided into nonoverlapping windows of equal size $w = 500{,}000$ bp. For each window, the number of high-quality mapped reads starting in this window was counted. More specifically, mapped reads were filtered for being the primary alignment, possessing a minimal mapping quality with at least PHRED score 20, and not being overclipped, with at least 30 bases. Only reads passing these filters were counted. Subsequently, these counts were normalized by dividing them by the number of reads passing the filter (and multiplying by 1E6 for convenience) and denoted as normalized count $c_i$ of window $i$.
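The counting and normalization step can be sketched in a few lines of Python. This is a minimal illustration rather than the custom Java implementation used in the study: the read record layout `(chrom, start, mapq, is_primary)` and the function name `normalized_counts` are hypothetical, and the clipping filter is omitted for brevity.

```python
from collections import Counter

WINDOW = 500_000   # window size w in bp, as stated in the text
MIN_MAPQ = 20      # minimal mapping quality (PHRED-scaled)


def normalized_counts(reads, chrom_lengths, window=WINDOW, min_mapq=MIN_MAPQ):
    """Count filtered read starts per window and scale to counts per million.

    `reads` is an iterable of (chrom, start, mapq, is_primary) tuples with
    0-based start positions; this layout is a hypothetical stand-in for
    parsed alignments. The clipping filter from the text is not modeled.
    """
    counts = Counter()
    total = 0
    for chrom, start, mapq, is_primary in reads:
        if not is_primary or mapq < min_mapq:
            continue  # keep only primary, high-quality alignments
        counts[(chrom, start // window)] += 1
        total += 1

    result = {}
    for chrom, length in chrom_lengths.items():
        n_windows = -(-length // window)  # ceiling division
        for i in range(n_windows):
            raw = counts.get((chrom, i), 0)
            # Divide by the number of reads passing the filter, times 1E6.
            result[(chrom, i)] = raw / total * 1e6 if total else 0.0
    return result
```

By construction, the normalized counts of one sample sum to 1E6 over all windows, which makes samples with different sequencing depth comparable.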

The normalized counts are highly variable along the chromosomes, with higher values close to the telomeres and smaller values close to the centromeres. Hence, these counts need to be compared to some reference $r = (r_1, \ldots, r_N)$. If a specific reference is given as, for instance, the counts of one of the ancestors $a$, it can be used directly ($r = a$). However, there are also cases where the ancestors or the normalized counts of the ancestors are unknown. In such cases, we compute the reference value $r_i$ as the median of normalized counts of all samples.<sup>1</sup> Given a reference value $r_i$, the ratio:

$$d_i = \log_2\left(\frac{c_i + \varepsilon}{r_i + \varepsilon}\right)$$

was defined for $i \in [1, N]$, where $\varepsilon = 10^6 / N$ is the number of expected reads per interval, which is used to dampen the noise in the measurement. Furthermore, a rolling average with window size 5 was used for each sample and each chromosome to denoise the signal and obtain a denoised profile $p_s$ for each sample $s$. Finally, we determine outliers in the profile $p_s$ of each sample $s$ separately using the scores method of the R package outliers with type MAD and a probability of 99.99% (Komsta, 2011; R Core Team, 2018). Longer stretches of outliers were determined as a continuous sequence of at least 3 windows, whereby only one of two consecutive windows may not be an outlier.
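The profile computation and outlier detection described above can be approximated as follows. This Python sketch is not the R code used in the study. It rests on several assumptions: the rolling average is centered and shortened at chromosome ends, the MAD is scaled by 1.4826 (the default of R's `mad`), outliers are flagged two-sided against the normal quantile, and a stretch is read as a run that begins and ends on an outlier and never contains two consecutive non-outliers. All function names are hypothetical.

```python
from math import log2
from statistics import NormalDist, median


def median_reference(samples):
    """Window-wise median of normalized counts over all samples."""
    return [median(col) for col in zip(*samples)]


def log_ratio(c, r):
    """d_i = log2((c_i + eps) / (r_i + eps)) with eps = 1e6 / N."""
    eps = 1e6 / len(c)
    return [log2((ci + eps) / (ri + eps)) for ci, ri in zip(c, r)]


def rolling_mean(values, size=5):
    """Centered rolling average; edge windows use the values available."""
    half = size // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def mad_outliers(profile, prob=0.9999):
    """Flag values whose MAD-based score exceeds the normal quantile of prob."""
    med = median(profile)
    mad = 1.4826 * median(abs(v - med) for v in profile)
    if mad == 0:
        return [False] * len(profile)
    cutoff = NormalDist().inv_cdf(prob)
    return [abs(v - med) / mad > cutoff for v in profile]


def outlier_stretches(flags, min_len=3):
    """Return inclusive (start, end) window indices of outlier stretches."""
    stretches, start, last = [], None, None
    for i, flag in enumerate(flags):
        if not flag:
            continue
        if start is None:
            start = i
        elif i - last > 2:  # two or more non-outliers in between
            if last - start + 1 >= min_len:
                stretches.append((start, last))
            start = i
        last = i
    if start is not None and last - start + 1 >= min_len:
        stretches.append((start, last))
    return stretches
```

With 500,000 bp windows, a stretch of $k$ windows spans $k \times 0.5$ Mb, so the minimum stretch of 3 windows corresponds to 1.5 Mb.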

### Ontology Term Enrichment

In order to study the functional composition of the genes affected by the CNVs, we compared the respective genomic regions to the coordinates of the high-confidence (HC) gene set of the latest barley (IBSC PGSB V1.0) and wheat (IWGSC V1.1) genome annotation releases. The resulting protein-coding HC gene loci were then mapped to the Gene and Plant Ontology annotations for barley and wheat (release v1.0; https://github.com/PGSB-HMGU/ontology\_annotataions). Enrichment of specific ontology terms among the given gene sets was tested using the "Parent-Child-Union" algorithm implemented in the Ontologizer software (Grossmann et al., 2007) using all annotated wheat genes as a reference and applying multiple testing correction of p-values using the Benjamini-Hochberg method (p < 0.01).
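For readers unfamiliar with the multiple testing correction, the Benjamini-Hochberg adjustment can be sketched as below. This is a generic Python illustration of the procedure, not the Ontologizer implementation, and it does not reproduce the Parent-Child-Union enrichment test itself; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A term would then be reported as enriched if its adjusted p-value falls below the chosen threshold, 0.01 in the analysis described above.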

The year and country of origin were manually curated from several online databases, using variant identifiers, names, breeders and geographical information to match genotypes to database entries and collecting the earliest registration dates with the German and European Plant Varieties Office or the earliest date of collection registered in the IPK and CIMMYT database. German genotypes were grouped into eight decades ranging from 1940 to 2020. We excluded the 1980s because our dataset only comprised one line from this period. All other periods are represented by at least 8 genotypes. To account for the partly large differences in the number of genotypes, for ontology term enrichment analysis we discarded loci that were represented in less than 75% of the lines per decade. To compare functional enrichment across decades, we performed Ontologizer analyses comparing the unique loci for each decade to those showing CNVs in genotypes from the other decades.

### Validation

Genomic DNA was extracted from young leaf tissues according to Paris and Carter (2000). Wheat genotypes carrying the 1AL.1RS or 1BL.1RS wheat-rye translocation were detected based on rye insertion site based polymorphism (ISBP) markers ora3, ora16 and ora17 (Bartoš et al., 2008), a primer set tagging the rye ω-secalin gene (Shimizu et al., 1997) as well as the STS marker IA-294 (Mago et al., 2004). The rye genome specific repetitive sequence pSc20H (Ko et al., 2002) was used to derive a primer pair (forward primer: 5' ATT TCA TGC CGA AGG AGA TG 3', reverse primer: 5' ACT CGT TGT TCC CAA AGG TG 3') for a universal detection of rye chromatin in wheat. Amplification of the 612 bp fragment was conducted in 35 cycles using an annealing temperature of 55°C. The wheat marker UMN19 (Liu et al., 2008) was used as a positive control.

### Rye SNP Calling and Genetic Distance

Preprocessed GBS reads of ten winter wheat genotypes, which carry at least the short arm of rye chromosome 1 (1RS), were mapped against a combined genome of rye (Bauer et al., 2017) and wheat (IWGSC, 2018) using BWA mem (version 0.7.15 r1140) and default parameters (Li, 2013). Subsequently, SNP calling was performed using samtools mpileup (version 1.2) and default parameters except for the output tags DP and DPR (Li, 2011). Finally, SNPs were filtered to have a minimum SNP quality of 100, a minimum GT quality of 5, and a minimum of 4 reads per sample, as well as to be bi-allelic, to be called in all 10 genotypes, and to be located on rye chromosome 1R.
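The filter criteria listed above translate directly into a predicate over SNP records. The following Python sketch assumes a simplified, hypothetical record layout (a dict with `chrom`, `qual`, `alts` and per-genotype `gt`, `gq`, `dp` fields) and the chromosome label `"1R"`; neither is taken from the study, and no particular VCF library is implied.

```python
def keep_snp(snp, n_genotypes=10, chrom="1R"):
    """Apply the filters listed in the text to one SNP record.

    `snp` is a dict with hypothetical keys: chrom, qual, alts (list of
    alternative alleles) and samples (list of per-genotype dicts with
    gt, gq and dp, where gt is None for a missing call).
    """
    samples = snp["samples"]
    return (
        snp["chrom"] == chrom                          # rye chromosome 1R
        and snp["qual"] >= 100                         # minimum SNP quality
        and len(snp["alts"]) == 1                      # bi-allelic
        and len(samples) == n_genotypes
        and all(s["gt"] is not None for s in samples)  # called in all genotypes
        and all(s["gq"] >= 5 and s["dp"] >= 4 for s in samples)
    )
```

Applied to a list of parsed records, `[s for s in snps if keep_snp(s)]` would yield the retained SNP set.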

### Mapping of Functional Markers

Based on sequences of publicly available primer pairs (Liu et al., 2012), the location of functional markers on the wheat chromosomes was determined using blastn (Altschul et al., 1990). The sequences were searched against the wheat reference genome using blastn (version: 2.2.29+, non-default parameter: -evalue 100.0). The results were filtered to match the expected chromosome, to have primers pointing toward each other, and to yield approximately the expected length of the fragment to be amplified.
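The hit filtering described above can be illustrated with a small Python sketch. The hit tuple layout `(chrom, start, end, strand)`, the function name and the 10% length tolerance are assumptions made for illustration; the text only states that the fragment should have about the expected length.

```python
def plausible_amplicons(fwd_hits, rev_hits, chrom, expected_len, tolerance=0.1):
    """Pair forward and reverse primer hits into plausible amplicons.

    Each hit is a (chrom, start, end, strand) tuple with 1-based inclusive
    coordinates (start <= end) and strand '+' or '-'. The tuple layout and
    the 10% length tolerance are assumptions for illustration.
    """
    amplicons = []
    for f_chrom, f_start, f_end, f_strand in fwd_hits:
        for r_chrom, r_start, r_end, r_strand in rev_hits:
            if f_chrom != chrom or r_chrom != chrom:
                continue  # both primers must hit the expected chromosome
            if f_strand == r_strand:
                continue  # primers must lie on opposite strands
            # The plus-strand primer must lie upstream of the minus-strand one.
            if f_strand == "+":
                left, right = f_start, r_end
                facing = f_end < r_start
            else:
                left, right = r_start, f_end
                facing = r_end < f_start
            length = right - left + 1
            if facing and abs(length - expected_len) <= tolerance * expected_len:
                amplicons.append((chrom, left, right, length))
    return amplicons
```

A marker would be placed on the reference only if this function returns at least one candidate; several candidates would indicate an ambiguous location.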

### RESULTS

### Analysis of Barley Introgression Lines

Since the 1990s, a limited number of *H. vulgare*/*H. bulbosum* introgression lines has been generated that harbor segments introgressed from *H. bulbosum* and show a diverse set of desirable traits (cf. https://www.cwrdiversity.org/). Wendler et al. (2015) analyzed 146 such introgression lines and compared the location of detected introgressed regions based on cytological analysis or SNPs derived from GBS data. Most of these introgression lines are the offspring of barley cultivars with available GBS data. In addition, three introgression lines were based on intermediate crosses where no GBS data was available. Here, these introgression lines were reanalyzed using GBS coverage analysis on publicly available GBS data.

First, introgression lines were analyzed where GBS data of the corresponding parental barley cultivar was available. Comparing the location of the longest stretch of outliers to the location of the introgression described by cytological investigation and SNP analysis of GBS data (Wendler et al., 2015), an overlap of 92% and 96% was observed, respectively (**Supplemental Table 1**, **Supplemental Data Sheet 1**). Scrutinizing the results, introgression lines were checked that had no overlap between the location of the longest stretch of outliers and the location of the introgression as determined by SNP analysis. For introgression lines 38 (ERR699829) and 107 (ERR699893) that are supposed to have an introgression on chromosome 2HS and 3HS, respectively, GBS coverage analysis did not detect any outlier. For introgression line 60 (ERR699850) that is supposed to have an introgression on chromosome 1HL and 7HL, the longest stretch of outliers was observed on chromosome 2HS that showed an increased GBS coverage. However, another stretch of outliers can also be detected on chromosome 1HL with decreased GBS coverage. For introgression line 66 (ERR699856) that is supposed to have an introgression on chromosome 2HL and 6HL, the longest stretches of outliers were observed on chromosome 3HL and 7HL. Additional individual outliers have been detected on chromosome 2HL and 6HL. For introgression line 118 (ERR699902) that is supposed to have an introgression on chromosome 1HL, the longest stretch of outliers was observed on chromosome 3HL that shows an increased GBS coverage. However, another stretch of outliers can also be detected on chromosome 1HL with decreased GBS coverage. Interestingly, the GBS analysis was able to detect the introgression on chromosome 6HS in introgression line 137 (ERR699920, **Figure 1A**, **Supplemental Data Sheet 1**), which was missed by SNP analysis, but detected by cytological analysis.

<sup>1</sup> The median was chosen as it has the best possible breakdown point of 50%.

Summarizing these observations, we can state that GBS coverage analysis detected outliers that correlated well with known introgressions. About 89% of the detected outliers had negative denoised profile values $p_{s,i}$. Hence, the coverage of the outliers was often decreased compared to the reference (**Supplemental Data Sheet 1**).

Second, all introgression lines were analyzed using the median of coverage values of the barley cultivars Emir, Golden Promise, Morex, and Vada as reference to test whether the median reference performs similarly to an ancestor reference. When comparing the location of the longest stretch of outliers to the location of the introgression described by cytological investigation and SNP analysis of GBS data, an overlap of 84% and 87% was detected, respectively (**Supplemental Table 1**). Although the GBS reference coverage profile was not based on the corresponding parental barley cultivar, the agreement between the longest stretch of outliers and the location of the introgressions remained high. When smaller outlier stretches are also considered, the overlap between the location of introgressions and the detected outliers can be increased further.

Finally, we investigated the reason for the decreased GBS coverage within introgressed regions. As an example, the introgressed region in introgression line 2 (ERR699794) was investigated. Introgression line 2 is based on a cross between the barley cultivar Emir (ERR699939) and the *H. bulbosum* accession 2032 (ERR699945) and harbors an introgression on chromosome 2HL. GBS coverage analysis using Emir as reference identified the region 749.5–768.5 Mb on chromosome 2H (**Figure 1B**, **Supplemental Table 1**). In this region, 683 loci were identified that provided a starting point for the identified GBS fragments and had a combined coverage of at least 10 reads (**Figure 2**). Coverage for Emir was observed at 476 loci, but only 217 out of these 476 loci had coverage in the *H. bulbosum* accession 2032. This observation indicated differences between the barley cultivar Emir and the *H. bulbosum* accession either on sequence or on methylation level. However, only 150 out of these 217 loci had coverage in the introgression line 2, indicating a putative additional methylation of the corresponding loci in the introgression line compared to the donor *H. bulbosum* accession 2032.

### Analysis of Barley Genebank Collection

Recently, almost the complete barley collection of the Federal *ex situ* Genebank for Agricultural and Horticultural Plant Species of Germany was genotyped using GBS and SNP markers were detected and associated with several plant traits (Milner et al., 2018). Here, we reanalyzed this comprehensive data set (PRJEB23967 and PRJEB24563) looking for large chromosomal modifications.

After read mapping, filtering, and counting reads per genomic window, 86 barley accessions with fewer than 100,000 GBS reads were discarded, yielding 21,319 barley accessions for further analysis. The median number of reads passing the filter per sample was about 517,000, while for the introgression lines investigated above this value was more than twice as high, at about 1,129,000 reads. For this reason, only large consecutive stretches of at least 30 outliers were investigated in order to identify potential chromosomal modifications and to avoid artificial outliers caused by low coverage. These long outlier stretches correspond to at least 15 Mb in the genome.
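The stretch filter described above can be sketched as a search for the longest run of consecutive outlier windows on a chromosome. With non-overlapping 500 kb windows, 30 consecutive outliers correspond to 30 × 0.5 Mb = 15 Mb. The function names and the toy data below are illustrative assumptions and are not part of the published pipeline.

```python
WINDOW_MB = 0.5  # window size used in the study (500 kb)


def longest_outlier_stretch(flags):
    """Return (start_index, length) of the longest run of True values."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, is_outlier in enumerate(flags):
        if is_outlier:
            if run_len == 0:
                run_start = i  # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def passes_stretch_filter(flags, min_windows=30):
    """True if the longest outlier run spans at least min_windows windows."""
    return longest_outlier_stretch(flags)[1] >= min_windows


# Toy chromosome: 10 normal windows, 32 outlier windows, 5 normal windows.
flags = [False] * 10 + [True] * 32 + [False] * 5
start, length = longest_outlier_stretch(flags)
print(start, length, length * WINDOW_MB)  # 10 32 16.0
print(passes_stretch_filter(flags))       # True
```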

TABLE 1 | Barley accessions with patterns of large chromosomal modifications from the Federal *ex situ* Genebank for Agricultural and Horticultural Plant Species of Germany.


*The first column contains the run ID from EMBL-EBI ENA, the second column contains the ID from the genebank at IPK, the third column contains the year of acquisition in the genebank, the fourth column contains the number of GBS reads that passed the filters, and the fifth column contains the longest stretch of outliers indicating the largest chromosomal modification detected. Finally, column six contains the type of modification, where 'low' indicates decreased coverage, 'high' indicates increased coverage and 'chromosome' indicates a modification over the complete chromosome.*

As pedigree information was not available for all of these barley accessions, the median was used as reference. Using the filter for long outlier stretches, seven accessions could be identified (**Table 1**, **Figure 3**). Three of these accessions, namely HOR 685, HOR 7537, and BCC 213, showed a region with decreased GBS coverage on chromosome 4HL and 5HS. Another three accessions, namely HOR 16589, HOR 16951, and BCC 722, showed a region with increased GBS coverage on chromosome 2HL, 1HS, and 7HS, respectively. In contrast, the accession HOR 19592 showed a pattern of increased coverage on both telomeres of chromosome 4H.

In addition to individual coverage profiles, summary information for the complete collection is visualized in **Figure 4**. Considering all detected outliers, all chromosomes harbor a similar number of outlier loci (**Figure 4A**). When the outlier loci are restricted to those detected in at least 3% of the barley genotypes, the picture changes slightly. Chromosomes 6H, 7H, and 5H together harbor more than 50% of such robust outlier loci, while chromosome 4H harbors the lowest number of such loci (**Figure 4B**). Looking at the spatial distribution across the complete genome, outliers are more frequent at the telomeres than at the centromeres (**Figure 4C**). Furthermore, several thin, reddish lines are recognizable along the chromosomes, indicating robust, tight loci with changed coverage.
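Restricting outlier loci to those that recur in a minimum fraction of genotypes amounts to counting, for every window, the number of genotypes in which it was flagged. The sketch below assumes that outliers are stored as sets of (chromosome, window index) tuples per genotype; this data layout, the function names, and the toy data are hypothetical.

```python
from collections import Counter


def recurrent_outlier_loci(outliers_by_genotype, min_fraction=0.03):
    """Return loci flagged as outliers in at least min_fraction of genotypes.

    outliers_by_genotype maps a genotype name to a set of loci, each locus
    being a (chromosome, window_index) tuple.
    """
    n_genotypes = len(outliers_by_genotype)
    counts = Counter()
    for loci in outliers_by_genotype.values():
        counts.update(set(loci))  # count each locus once per genotype
    threshold = min_fraction * n_genotypes
    return {locus for locus, n in counts.items() if n >= threshold}


def loci_per_chromosome(loci):
    """Count the number of loci on each chromosome."""
    return Counter(chrom for chrom, _ in loci)


# Toy data with four hypothetical genotypes; a 50% threshold is used here
# only because the example is tiny (the study used 3% for barley).
data = {
    "g1": {("7H", 3), ("4H", 8)},
    "g2": {("7H", 3)},
    "g3": {("7H", 3), ("6H", 1)},
    "g4": {("6H", 1)},
}
robust = recurrent_outlier_loci(data, min_fraction=0.5)
print(sorted(robust))  # [('6H', 1), ('7H', 3)]
print(loci_per_chromosome(robust)["7H"])  # 1
```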

Highly consistent with the results of a previous study targeting CNVs in 14 barley genotypes using Comparative Genomic Hybridization arrays (Muñoz-Amatriaín et al., 2013), functional enrichment analysis using Gene and Plant Ontology annotations of outlier loci detected in at least 3% of the barley genotypes revealed enrichment of genes involved in cell death and immune response, particularly the defense against and recognition of fungi and oomycetes using receptor-like kinases (RLKs), with consistent major hotspots, e.g., in the subtelomeric regions on 7H. Nevertheless, as depicted in **Figure 5**, the increased resolution in terms of number of genotypes and ontology annotation now enables a more fine-grained picture of the gene functions modulated by CNVs in these lines, comprising response to stimuli, cellular metabolic processes, and regulatory processes (**Supplemental Table 2**).

### Analysis of Elite Cultivars and Genebank Accessions of Wheat

A diverse winter wheat collection was single seed descended twice and genotyped using GBS. As pedigree information is partially unknown and ancestors of most wheat genotypes were not included in the collection, GBS coverage analysis was performed using the median as reference.

Statistics for the detected outliers are given in **Figure 6**. Considering all detected outliers, the B genome harbors nearly twice as many outlier loci as the A and D genomes. In contrast, the D genome harbors only slightly more outlier loci than the A genome (**Figure 6A**). Looking at individual chromosomes, marked differences between the chromosomes can be detected. On the one hand, chromosomes 1B, 2B, and 4B harbor approximately 25% of the outlier loci, while, on the other hand, ten chromosomes from the A and D genomes harbor less than 25% of the outlier loci (**Figure 6B**).

When the outlier loci are restricted to those detected in at least 10% of the winter wheat genotypes (29 genotypes), the picture changes slightly. Again, the B genome harbors more than twice as many outlier loci as the A genome. The D genome harbors the fewest outlier loci, with approximately 63% of the number found in the A genome (**Figure 6C**). A shift was also observed for individual chromosomes. Chromosome 2B harbors the most outlier loci, with chromosomes 6B, 5B, and 7B as runners-up. In contrast, chromosome 4B harbors only the seventh fewest outlier loci, compared to the second most when considering all outlier loci (**Figure 6D**). For chromosome 4D, only two outliers could be detected that occur in at least 10% of genotypes, although more than 88% of the windows on chromosome 4D have been marked as outliers in at least one wheat genotype.

Looking at the spatial distribution across the complete genome, outliers are more frequent at the telomeres than at the centromeres (**Figure 6E**). Some of the frequent outliers overlap with regions that contain genes of increased interest for breeding (Liu et al., 2012), for instance, *Pm3* on 1AS, *Yr17* on 2AS, *Ppo-A1* on 2AL, *Wx-B1* on 4AL, *Lr19* on 7AL, *Psy-A1* on 7AL, *Glu-B3* on 1BS, *Psy-B1* on 7BL, *Glu-D1* on 1DL, and *Ppo-D1* on 2DL (**Supplemental Table 3**). For some of these genes, allelic variation was introduced by introgressions from crop wild relatives, e.g., *Yr17* from *Aegilops ventricosa* Tausch and *Lr19* from *Thinopyrum ponticum* (Liu et al., 2012). However, for several introgressions used in breeding programs, no molecular markers are publicly available. In addition, Thind et al. (2018) hypothesize that an interstitial introgression from *Ae. tauschii* is present in the wheat cultivar CH CampalaLr22a, overlapping the large region with suspicious coverage on chromosome 2DL.

Looking at the size distribution of outlier stretches, about 66% of the winter wheat genotypes had consecutive outlier stretches with at least 50 outliers corresponding to at least 25 Mb (**Supplemental Table 4**). However, several other short stretches of outliers were found on all chromosomes.

Investigating individual winter wheat genotypes, we observed several striking patterns. For the Genebank accessions TRI 3810 (Salzmünder 14/44) and TRI 9323 (Mildress), almost all windows on the complete chromosome 1B were marked as outliers (**Figure 7A**). Further winter wheat genotypes comprising Anapolis, Brilliant, Matrix, Pamier, Winnetou, TRI 9367, TRI 10373, and TRI 11247 only had outliers on the short arm of chromosome 1B (**Figure 7B**). Other wheat genotypes, as for instance TRI 3364, had even smaller stretches of outliers on chromosome arm 1BS (**Figure 7C**).

Similar to chromosome 1B, other chromosomes also exhibited large patterns of decreased coverage including chromosome 1A for Memory and TRI 6874; chromosome 1D for Kometus; chromosome 3B for TRI 994; chromosome 3D for TRI 10166; chromosome 4B for Smaragd; chromosome 4D for TRI 7040; chromosome 5B for TRI 12027; chromosome 6B for TRI 5164 and TRI 6775; and chromosome 7A for Brilliant and TRI 1005 (**Supplemental Data Sheet 2**). In contrast, large patterns of increased coverage were observed, for instance, on chromosome 1D for TRI 6868, and on chromosome 2D for TRI 5042 and TRI 7716 (**Supplemental Data Sheet 3**). The large pattern detected in the wheat line TRI 7040 explains the high percentage of windows on chromosome 4D that are marked as outliers in at least one wheat line.

[Figure caption, continued] The y-axis depicts normalized coverage. Each dot visualizes the denoised coverage value of a non-overlapping 500 kb window, while the dashed line depicts the expectation. Dots are depicted in red if they are marked as outliers indicating large chromosomal modifications.

Based on the literature, some of these patterns can be explained. For instance, Rabinovich (1998) reported that TRI 3810 (Salzmünder 14/44) and TRI 9323 (Mildress) have a substitution of wheat chromosome 1B by the rye chromosome 1R. For several wheat lines, including TRI 9367 (Skorospelka 35) (Gupta and Shepherd, 1992) and TRI 10373 (Benno) (Luo et al., 2008), the rye-wheat translocation 1RS.1BL was reported. Using rye-specific primers, rye DNA was detected in a variable number of genotypes. Rye DNA was indicated for 26, 10, 10, and 10 genotypes using the markers OP20H, SCM9, ora003, and ora007, respectively (**Supplemental Table 4**, **Supplemental Figure 1**). The combination of markers indicated an introgression of the short arm of rye chromosome 1R for ten genotypes comprising Anapolis, Brilliant, Memory, Pamier, Winnetou, TRI 3810, TRI 9323, TRI 9367, TRI 10373, and TRI 11247. These data, in combination with the coverage profiles, allow the locus of the introgression to be identified. A substitution of wheat chromosome 1B by the rye chromosome 1R was indicated for TRI 3810 and TRI 9323 and a 1RS.1AL translocation was indicated for Memory, while for the other seven genotypes a 1RS.1BL translocation was indicated.

Read mapping against a combined wheat and rye reference was performed using the wheat genotypes carrying 1RS. Subsequently, SNP calling was performed on chromosome 1R, yielding 177 bi-allelic SNPs that were called in all of these genotypes (**Supplemental Table 5**). The proportion of SNPs classified as heterozygous ranged from about 15% for Memory to about 30% for all other genotypes (~26% for Anapolis to ~32% for TRI 9323 and TRI 11247). A position in the high-density genetic map of rye (Bauer et al., 2017) could be determined for 133 SNPs (75.1%), while the map position for 44 SNPs (24.9%) remains unknown. In total, 125 SNPs map to the short arm of chromosome 1R (**Supplemental Table 4**). Focusing only on SNPs that were called homozygous in all ten genotypes yielded 95 SNPs (75 SNPs below 70 cM, 18 SNPs with unknown position, and only 2 SNPs above 70 cM). Based on these 95 SNPs, only two haplotypes were identified: one for Memory and the other for the remaining nine genotypes.
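The two steps described above, computing the proportion of heterozygous calls per genotype and grouping genotypes into haplotypes based on SNPs that are homozygous in all genotypes, can be sketched as follows. The call format (`"A/G"`), the function names, and the toy genotypes are assumptions made for illustration; they do not reproduce the SNP calling pipeline or the data of the study.

```python
from collections import defaultdict


def is_het(call):
    """A call such as 'A/G' is heterozygous; 'A/A' is homozygous."""
    first, second = call.split("/")
    return first != second


def het_fraction(calls):
    """Proportion of heterozygous calls for one genotype."""
    return sum(is_het(c) for c in calls) / len(calls)


def haplotype_groups(calls_by_genotype):
    """Group genotypes by their alleles at SNPs homozygous in every genotype."""
    names = list(calls_by_genotype)
    n_snps = len(calls_by_genotype[names[0]])
    # Keep only SNP positions where no genotype has a heterozygous call.
    keep = [i for i in range(n_snps)
            if not any(is_het(calls_by_genotype[g][i]) for g in names)]
    groups = defaultdict(list)
    for g in names:
        haplotype = tuple(calls_by_genotype[g][i].split("/")[0] for i in keep)
        groups[haplotype].append(g)
    return dict(groups)


# Toy data: three hypothetical lines, three SNPs each.
toy = {
    "line_a": ["A/A", "C/T", "G/G"],
    "line_b": ["A/A", "C/C", "G/G"],
    "line_c": ["T/T", "C/C", "A/A"],
}
print(haplotype_groups(toy))
# {('A', 'G'): ['line_a', 'line_b'], ('T', 'A'): ['line_c']}
```

In the toy data the second SNP is dropped because `line_a` is heterozygous there, which leaves two SNPs and two distinct haplotypes.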

The genotypes contained in our dataset span almost eight decades of German wheat breeding, ranging from 1940 to the present day. This provides the opportunity to assess variation of the molecular targets in German wheat breeding over the decades. Indeed, ontology term enrichment analyses of the gene sets varying in coverage across the different decades reveal specific changes in the biological processes and anatomical structures of these wheat lines (**Supplemental Figures 2** and **3**; **Supplemental Table 6**). Overall, as suggested by the substantially higher number of significantly enriched terms in modern elite lines, current breeding efforts seem to target a broad spectrum of biological processes and structures. These range from improvement of pathogen/herbivore resistance, climate change effects including flooding and hypoxia, modulation of sulphur compounds for storage protein quality, increased nutrient and water uptake through improvement of the root system and overall vasculature, and robustness of the stele in all plant body parts to avoid lodging, to overall fine-tuning of developmental and reproductive processes to increase yields in specific climate conditions.

The few cases with an enrichment of GO and PO terms from older decades provide striking examples of the accuracy and the potential of our method. We observe an enrichment of genes related to trichomes in genotypes from the 1940s and 1970s. This illustrates the hidden potential in these older genotypes, because trichome length and density are important resistance factors against the cereal leaf beetle, whose larvae and adults feed on leaves and also increase the probability of fungal infections (Hoxie et al., 1975; Konyspaevna, 2012). Plants originating from the 1970s surprisingly show an enrichment of oligopeptide transport. The literature provides a compelling explanation for this observation. Oligopeptide transporters like yellow-stripe-like transporters (YSL) are important for micronutrient uptake including Fe and Zn (Kumar et al., 2019). Strikingly, previous studies reported substantially higher mineral micronutrient grain contents, especially for Zn and Fe, in wheat genotypes from that time period, also including several German lines (Zhao et al., 2009).

### DISCUSSION

In order to better understand the genomic composition of barley and wheat genotypes, we aim at the detection of large chromosomal modifications using GBS data and a bioinformatics pipeline. Instead of using SNPs, we investigate the sequence coverage. Differences in the coverage of GBS data might be attributed to (a) missing or duplicated genomic regions, (b) mutations in the recognition site of the restriction enzyme, or (c) changes in the methylation of the recognition site of the methylation-sensitive restriction enzyme.

Increased coverage normally indicates more reads than expected and, hence, an addition or duplication, while decreased coverage normally indicates fewer reads than expected and, hence, a deletion or substitution. If no reference plant is available, the median coverage within the collection can be used as reference at each locus. However, this makes the interpretation of coverage profiles somewhat more difficult.
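The idea of a median reference can be illustrated with a short sketch: read counts per window are normalized per sample, the per-window median across the collection serves as reference, and windows with a strongly deviating coverage ratio are flagged. The normalization by total read count, the fixed ratio thresholds, and all function names are simplifying assumptions for illustration; the denoising and the outlier criterion of the actual pipeline are not reproduced here.

```python
from statistics import median


def normalize(counts):
    """Scale window counts so that they sum to 1, making samples comparable."""
    total = sum(counts)
    return [c / total for c in counts]


def median_reference(profiles):
    """Per-window median across all normalized profiles of a collection."""
    return [median(window) for window in zip(*profiles)]


def flag_outliers(profile, reference, low=0.5, high=2.0):
    """Flag windows whose coverage ratio to the reference is below low or above high.

    The thresholds are illustrative assumptions, not the study's criterion.
    Windows with zero reference coverage are never flagged.
    """
    flags = []
    for value, ref in zip(profile, reference):
        if ref == 0:
            flags.append(False)
            continue
        ratio = value / ref
        flags.append(ratio < low or ratio > high)
    return flags


# Toy collection: three samples with four windows each; the third sample
# has strongly decreased coverage in the third window.
samples = [
    [100, 100, 100, 100],
    [100, 100, 100, 100],
    [100, 100, 10, 100],
]
profiles = [normalize(s) for s in samples]
reference = median_reference(profiles)
print(flag_outliers(profiles[2], reference))  # [False, False, True, False]
```

Because the reference is a median, a modification carried by more than half of the collection would itself become the expectation, which is one reason why profiles based on a median reference are harder to interpret than those based on a known parent.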

We perform three case studies analyzing barley and wheat collections and find several interesting patterns of increased and decreased sequence coverage. These patterns indicate large modifications of chromosomal regions that might be larger deletions, duplications, and substitutions, as well as modification of DNA methylation.

[Figure caption, continued] genotype with an outlier and red indicates many wheat genotypes with an outlier at this locus. Triangles indicate genes with interest for breeding. Black triangles indicate genes that are located in regions with many outliers within the collection, while gray triangles indicate genes in regions with a low number of outliers.

Firstly, we demonstrate that GBS sequence coverage profiles can be used as an alternative method to detect large chromosomal modifications in barley (*H. vulgare/H. bulbosum*) introgression lines using the GBS data of the known parent as reference. Subsequently, we show that GBS data of the parental lines are dispensable for the identification of the introgressed regions. This observation is of central importance as it allows the size and chromosomal location of introgressed regions to be determined if pedigree data or GBS data of ancestors are either not available or not accessible. In both cases, the results agreed very well with those reported by Wendler et al. (2015).

Secondly, we analyze the barley genebank collection from IPK Gatersleben looking for large chromosomal modifications. We also identify outliers in this collection. The number of large outlier stretches is low, which might be a result of the very low efficiency of natural inter-species hybridization for diploid plants. Nevertheless, we identify barley accessions with striking patterns of decreased or increased GBS sequence coverage. Three of these accessions show a clear pattern of decreased GBS sequence coverage, which might be attributed to an introgression or a deletion. Since these accessions were introduced into the genebank in 1942, 1976, and 1983, this might indicate natural inter-species hybridization or long deletions in barley. Three other accessions show a clear pattern of increased GBS sequence coverage, which might be attributed to duplications. However, we also identify several smaller outlier stretches that are predominantly located in telomeric regions. Based on outliers that occur in at least 3% of the accessions, we identified several significantly enriched GO terms of genes located in these regions. The distribution of the observed patterns and the enriched GO terms match the findings for CNV in barley (Muñoz-Amatriaín et al., 2013).

Thirdly, we investigate GBS coverage profiles of wheat cultivars and genebank accessions. We find outliers for all wheat genotypes and large stretches of outliers for about 66% of the wheat genotypes. Some of these outliers can be associated with rye introgressions reported in literature or verified by wet-lab experiments. Based on SNP data, we could identify only two haplotypes of 1RS introgressions in 10 wheat genotypes carrying 1RS. Mainly four rye sources have been used to incorporate rye chromatin in wheat, deployed as (1B)1R substitution or 1BL.1RS and 1AL.1RS translocation lines (Crespo-Herrera et al., 2017). The most widely exploited source carries the 1RS.1BL translocation from Petkus rye, while the 1RS.1AL translocation has an independent origin (Schlegel and Korzun, 1997). The SNPs identified in the present study represent the currently most comprehensive description of both translocations at the molecular level and extend the molecular toolbox available for genetic analyses of this important alien introgression in wheat. Furthermore, the 1RS introgression serves as a positive control for our GBS-based approach to detect alien introgressions in wheat. However, as potential donors are unknown, it is hard to verify all detected patterns. Interestingly, we find only two outliers on chromosome 4D that occur in at least 10% of the wheat genotypes, which is consistent with reports of low diversity in wheat subgenome D and especially on chromosome 4D (Akhunov et al., 2010).

Summarizing these observations, we can state that GBS sequence coverage profiles are a new method to determine large chromosomal modifications in the major cereals barley and wheat. These modifications can be introgressions, but other processes, for instance CNV, could also explain these patterns. Besides SNPs, introgressions and CNVs are very important for breeding and can now be analyzed without additional wet-lab experiments if SNPs were detected using GBS. In principle, other sequencing protocols like exome capture (Mascher et al., 2013) or adaptation of single-primer enrichment technology (Scaglione et al., 2019) might be used as an alternative to GBS for coverage analysis. In this study, we analyze barley and wheat, but the method might be applicable to other species as well if a reference genome is available.

In summary, GBS coverage analysis can be used to infer introgressed regions and to identify genotypes with suspicious coverage patterns in wheat and barley. Coverage analysis has several advantages compared to other detection methods. Firstly, coverage analysis is a simple way of detecting chromosomal modifications. Secondly, it allows the detection of a variety of chromosomal modifications, including introgressions and CNVs, on a genome-wide scale. Thirdly, in the case of introgressions, no information about the donor is needed to provide probes for hybridization. Hence, the method allows the detection of a wide range of introgressions from different crop wild relatives using a single wet-lab experiment. Fourthly, the resolution used in this study was 500 kb, which is much better than that of many other detection methods. Fifthly, since sequencing data are generated, these data can also be used to identify SNPs and derive markers, e.g., for marker-assisted selection. Hence, GBS can provide information about SNPs, introgressions, and CNVs with a single, simple, and cheap wet-lab experiment. Although analysis of missing values is also possible for SNP arrays (data not shown), much more information can be obtained from coverage analysis, rendering GBS a more valuable resource for genotyping compared to SNP arrays. For this reason, coverage analysis of GBS data might be an important additional argument in the discussion about which marker system to use (Darrier et al., 2019). Sixthly, based on SNP data, the method allows different donors from the same species to be distinguished (cf. wheat genotypes carrying 1RS). Finally, decreased GBS coverage could lead to a shortage of detected polymorphisms in introgressed regions, which hampers the SNP-based detection of such regions compared to coverage analysis (e.g., ERR699920, **Figure 1A**).

However, the method might be limited in detecting very small modifications due to the limited amount of GBS data and the applied window approach. Applying the idea to the genomic position of individual reads might increase the resolution (Lemay et al., 2019), but could in turn hinder the detection of larger chromosomal modifications. In addition, the phylogenetic distance between the wild donor and the crop plant might influence the detection rate of introgressed regions, as DNA from a closely related donor might be harder to detect. Furthermore, the method will not be able to detect modifications in regions that are not represented in the reference genome sequence, nor inversions or reciprocal translocations, for instance the reciprocal translocation 5B:7B in the wheat cultivar Cappelle-Desprez (Badaeva et al., 2007).

Besides detecting regions of suspicious coverage, the method might be used for several downstream analyses. The proposed method is just a fast screening approach for regions with unexpected coverage, while association methods like genome-wide association studies (GWAS) try to identify genetic markers that explain the phenotypic data that was often collected in time-consuming experiments. Hence, combining GBS coverage data with phenotypic observations could potentially associate regions of unexpected coverage with plant traits. In addition, the method might be used to identify genomic regions under selection. Furthermore, the method could be applied to verify or falsify duplicates in genebank collections. Additionally, a shortage of detected polymorphisms in introgressed regions might lead to an underestimation of the genetic distance between different genotypes. Hence, there might be a need for the development of alternative genetic distance measures.

### AUTHOR CONTRIBUTIONS

JK developed the idea. AH performed GBS of the winter wheat collection. SB performed read mapping of the winter wheat collection. TB performed read mapping of the barley collections and the combined read mapping against wheat and rye for some wheat genotypes. JK implemented the software and performed the computational analysis. DL performed ontology term enrichment. JK, HL, NS, EB, DL, BK, BH, and DP discussed the results. BH tested wheat genotypes with rye-specific markers. JK wrote the manuscript. All authors discussed and approved the final manuscript.

### FUNDING

HL and GBS data of the winter wheat collection were supported by the Federal Ministry of Food and Agriculture within the GenDiv project (grant no. 2814603813).

### ACKNOWLEDGMENTS

We are grateful to Marc-André Lemay and François Belzile for openly sharing the idea of using GBS data for coverage analysis to determine large deletions. We thank Anne Fiebig, Heike Harms, Manuela Knauft, Susanne König, and Mary Ziems for technical assistance. We thank Erhard Ebmeyer from KWS Lochow GmbH, Johannes Schacht and Anne Starke from Limagrain GmbH, Ludwig Ramgraber and Jens Weyen from Saatzucht Josef Breun GmbH & Co. KG, Carsten Reinbrecht and Stefan Streng from Saatzucht Streng-Engelen GmbH & Co. KG, and Tanja Gerjets from German Federation for Plant Innovation (GFPi, proWeizen) for supporting the wheat project GenDiv. We also thank Ebrahim Kazman, Hubert Kempf, and Martin Mascher for fruitful discussions. Finally, we thank Maren Fischer, Christian Kohl, and Peter Werner for comments on the manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.01133/full#supplementary-material

SUPPLEMENTAL TABLE 1 | Detected outliers of *H. bulbosum* introgression lines.

SUPPLEMENTAL TABLE 2 | Significantly enriched GO terms of loci detected in at least 3% of barley genotypes.

SUPPLEMENTAL TABLE 3 | Position of functional markers in wheat determined by blast.

SUPPLEMENTAL TABLE 4 | Detected outliers of wheat genotypes.

SUPPLEMENTAL TABLE 5 | SNPs and representative reads on 1R for 10 wheat genotypes carrying 1RS.

SUPPLEMENTAL TABLE 6 | Significantly enriched GO terms detected in wheat genotypes per decade.

SUPPLEMENTAL DATA SHEET 1 | Coverage profiles of *H. bulbosum* introgression lines.

SUPPLEMENTAL DATA SHEET 2 | Coverage profiles of large chromosomal modifications with decreased coverage from selected wheat genotypes.

SUPPLEMENTAL DATA SHEET 3 | Coverage profiles of large chromosomal modifications with increased coverage from selected wheat genotypes.

SUPPLEMENTAL FIGURE 1 | Gel image for 30 winter wheat genotypes using 4 rye markers.

SUPPLEMENTAL FIGURE 2 | Significantly enriched (FDR<0.01 in at least one decade set) GO biological process terms for all studied decades. Subplots distinguish whether the FDR values were < 0.1 (FALSE|TRUE).

SUPPLEMENTAL FIGURE 3 | Significantly enriched (FDR<0.01 in at least one decade set) GO biological process terms for all studied decades. Subplots distinguish whether the FDR values were < 0.1 (FALSE|TRUE).

REFERENCES


Mago, R., Spielmeyer, W., Lawrence, G. J., Ellis, J. G., and Pryor, A. J. (2004). Resistance genes for rye stem rust (SrR) and barley powdery mildew (Mla) are located in syntenic regions on short arm of chromosome. *Genome* 47, 112–121. doi: 10.1139/g03-096

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. *EMBnet.journal* 17, 10–12. doi: 10.14806/ej.17.1.200


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Keilwagen, Lehnert, Berner, Beier, Scholz, Himmelbach, Stein, Badaeva, Lang, Kilian, Hackauf and Perovic. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Reference Genome Anchoring of High-Density Markers for Association Mapping and Genomic Prediction in European Winter Wheat

*Olufunmilayo Ladejobi1,2, Ian J. Mackay1,3, Jesse Poland4, Sebastien Praud5, Julian M. Hibberd2 and Alison R. Bentley1\**

1 The John Bingham Laboratory, NIAB, Cambridge, United Kingdom, 2 Department of Plant Sciences, The University of Cambridge, Cambridge, United Kingdom, 3 IMplant Consultancy Ltd., Chelmsford, United Kingdom, 4 Wheat Genetics Resource Center, Department of Plant Pathology, Kansas State University, Manhattan, KS, United States, 5 Biogemma, Site de La Garenne, Chappes, France

In this study, we anchored genotyping-by-sequencing data to the International Wheat Genome Sequencing Consortium Reference Sequence v1.0 assembly to generate over 40,000 high quality single nucleotide polymorphism markers on a panel of 376 elite European winter wheat varieties released between 1946 and 2007. We compared association mapping and genomic prediction accuracy for a range of productivity traits with previous results based on lower density dominant DArT markers. The results demonstrate that the availability of RefSeq v1.0 supports higher precision trait mapping and provides the density of markers required to obtain accurate predictions of traits controlled by multiple small effect loci, including grain yield.

Keywords: mapping, quantitative traits, trait dissection, next-generation sequencing, genomic selection

### INTRODUCTION

Historically, wheat breeding has focused on phenotypic selection for final yield potential combined with morphological and disease resistance traits (Cavanagh et al., 2013). The advent of genetic and genomic tools has largely supported marker-assisted selection for major genes in segregating generations. There is additional potential for the introgression of favorable genetic regions controlling variation in agronomically significant quantitative trait loci (QTL) through the routine application of genomic selection (GS) schemes that are based on the combined merit of genome-wide markers (Meuwissen et al., 2001; Stamp and Visser, 2012).

Advances in genomic technologies combined with computationally efficient statistical models present new opportunities for molecular crop breeding. Selection based on phenotype is complex, time-consuming, and still costly; thereby necessitating the adoption of molecular breeding systems. For crop geneticists and plant breeders, the adoption and applicability of genotyping-by-sequencing (GBS) has been recently demonstrated for a wide range of crops. This includes the detection of QTL controlling agronomic traits in rice and soybean (Begum et al., 2015; Sonah et al., 2015) and the detection of introgressions in cotton, *Brassica,* and sorghum (Kim et al., 2016).

GBS is an attractive alternative to array-based methods for generating high volume genome-wide single nucleotide polymorphisms (SNPs) for genome-wide association studies (GWAS) and GS. It is a fast, robust, and high-throughput method applicable across species in which genotyping and polymorphism discovery occur simultaneously, thereby avoiding the upfront effort of discovering, screening, and characterizing polymorphisms, a process that introduces ascertainment bias (Poland and Rife, 2012). Initially developed by Elshire et al. (2011), the technique was modified by Poland et al. (2012b) to produce a two-enzyme version suitable for polyploid species with large genomes. This uses a combination of methylation-sensitive restriction enzymes, *PstI* and *MspI*, cutting at rare and common restriction sites, respectively, followed by next-generation sequencing. An accompanying bioinformatics pipeline, Tassel-GBS (Glaubitz et al., 2014), is in place for calling SNP variants from the resulting GBS sequences. GBS has recently been employed in wheat for linkage mapping and genomic prediction studies (Poland et al., 2012a; Poland et al., 2012b; He et al., 2014) and the availability of a high-quality reference RefSeq v1.0 genome assembly (International Wheat Genome Sequencing Consortium, 2018) should enhance the efficiency and quality of GBS data for downstream analysis (Kim et al., 2016).

#### Edited by:

Hikmet Budak, Montana BioAg Inc, United States

#### Reviewed by:

Delfina Barabaschi, Genomics Research Centre, Italy

Harsh Raman, New South Wales Department of Primary Industries, Australia

#### \*Correspondence:

Alison R. Bentley alison.bentley@niab.com

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 02 November 2018 Accepted: 12 September 2019 Published: 08 November 2019

#### Citation:

Ladejobi O, Mackay IJ, Poland J, Praud S, Hibberd JM and Bentley AR (2019) Reference Genome Anchoring of High-Density Markers for Association Mapping and Genomic Prediction in European Winter Wheat. Front. Plant Sci. 10:1278. doi: 10.3389/fpls.2019.01278

GWAS combines high-density genome-wide marker information (such as that derived from GBS) with high levels of genetic diversity in panels of individuals in order to map QTL. In a breeding context, it is used to detect genomic regions controlling complex quantitative traits and to identify alleles (and associated markers) for exploitation in variety improvement. GWAS has been used to detect marker-trait associations for several traits in wheat including grain protein content, thousand kernel weight and specific weight (Reif et al., 2011), agronomic traits (Bentley et al., 2014; Mora et al., 2015), and resistance to Fusarium Head Blight (Arruda et al., 2016).

Despite the power of GWAS to detect significant associations, many agronomically important traits under selection are polygenic, meaning these traits are influenced by many common SNPs, each with small individual effect, and remain recalcitrant to conventional marker-assisted selection. GS was proposed to address this complexity because it omits the significance testing used in GWAS, modelling the effect of all genotyped markers simultaneously (Meuwissen et al., 2001). This avoids the "Winners' Curse" bias (Beavis, 1994) caused by selection of a subset of markers, and also improves the accuracy of selection. Including genome-wide marker data in the prediction model captures more of the genetic variation underlying complex traits, including those of low heritability. This could accelerate genetic gain through a shortening of the breeding cycle, particularly for traits that are expensive to phenotype, are measured late in the growing season or require large volumes of seed to assess.

Several studies have investigated the accuracy of prediction using real and simulated data. The central considerations in these studies have been the predictive ability of available statistical models and the composition and size of the training population (Heslot et al., 2012; Daetwyler et al., 2013; de los Campos et al., 2013). Using eight datasets from four plant species including wheat and barley, Heslot et al. (2012) tested 11 GS models and found predictive abilities to be equivalent for many of the methods but with differences in computational times. Ridge regression best linear unbiased prediction (RR-BLUP) is computationally efficient (Endelman and Jannink, 2012; Lipka et al., 2015) and is used in the present study to assess predictive variation between genetic marker sets.

In this study, a previously described panel of 376 elite winter wheat varieties released or commercialized in the UK, France, and Germany between 1946 and 2007 (Bentley et al., 2014) was genotyped with GBS to provide dense genome-wide marker coverage. By re-genotyping the panel, we aimed to compare GWAS and GS performance across low- and high-marker density genotyping platforms and demonstrate the use and applicability of GBS given the recent release of a high-quality International Wheat Genome Sequencing Consortium (IWGSC) RefSeq v1.0 genome assembly (International Wheat Genome Sequencing Consortium, 2018). We tested GBS as an effective means to identify large numbers of SNPs to detect broadly relevant QTL controlling key traits with high precision and to demonstrate the usefulness of GBS for GS and its potential for breeding applications.

## MATERIALS AND METHODS

### Plant Material and Phenotyping

The previously described TriticeaeGenome panel consisting of 376 elite winter wheat varieties was used in this study (Bentley et al., 2014). The panel was evaluated for a range of agronomic traits in replicated European trials in France (FRA), Germany (DEU), and the UK (GBR) in 2010 and 2011 as described in Bentley et al. (2014). Flowering time (FT), grain yield (GY), and plant height (PH) were evaluated across all trials while nine additional traits including presence/absence of awns (Awns), winter kill (Wkill), maturity (MAT), grain protein content (Gpt), ears/m2 (Ears), lodging resistance (LR), grain specific weight (GSW), tiller number (TN), and thousand grain weight (TGW) were scored in single European locations as described in Bentley et al. (2014). All trait data are summarized in **Supplementary Table S1** and available from Figshare DOI: 10.6084/m9.figshare.7350284. For each trait, best linear unbiased estimates (BLUEs) were generated in GenStat (Payne, 2009) for variety performance at each site and over all sites for use in association analysis and genomic prediction. Marker-trait association for FT and GY was calculated on BLUEs for each site per year and overall values from all sites. Association for PH and LR was calculated from overall site BLUEs.

### Genotyping, Variant SNP Calling, and Imputation

Genomic DNA was isolated from 2-week-old seedlings of each line using a modified Tanksley extraction protocol (Fulton et al., 1995). GBS was conducted as described by Poland et al. (2012b). To ensure adequate sequencing coverage and enhance accuracy, each line was replicated four times with each replicate identified by a unique barcode. GBS libraries were sequenced in 96-plex across four flow cell lanes on the Illumina HiSeq platform. Fastq sequence files were processed in the TASSEL GBS pipeline version 5.2.31 (Glaubitz et al., 2014). Reads were trimmed to 64 base pairs and filtered based on sequence quality score to obtain only good quality reads with a barcode sequence, five nucleotides of *PstI* restriction site fragment and no unreadable bases (N) in between. Reads were aligned to the IWGSC RefSeq v1.0 reference genome (International Wheat Genome Sequencing Consortium, 2018) using Bowtie2 (Langmead and Salzberg, 2012). SNP sites were filtered to remove loci with extremely low coverage or high levels of missing data and heterozygosity. Filtering also removed SNPs with minor allele frequencies (MAFs) below 5%. Individuals with more than 50% data missing were excluded from downstream analysis. The filtered GBS data in Hapmap format are available from Figshare DOI: 10.6084/m9.figshare.7350284. Missing SNPs were imputed using the LD-kNNi method implemented in TASSEL (Money et al., 2015) with the following parameters: number of sites in LD = 200; maximum number of nearest neighbors used in imputation = 50; 10 imputation iterations.
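The missing-data and MAF filters described above can be illustrated with a minimal Python sketch. It is not the TASSEL implementation: the genotype coding (0/1/2 copies of the alternate allele, `None` for missing), the function names, the order in which the filters are applied, and the 20% per-SNP missing-data default are assumptions made for illustration, and the coverage and heterozygosity filters are omitted.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # 0/1/2 copies of the alternate allele; None = missing


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of missing calls in a list of genotypes."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF from the non-missing diploid calls (0.0 if every call is missing)."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_genotypes(
    snps: Dict[str, List[Genotype]],
    min_maf: float = 0.05,           # SNPs with MAF below 5% are removed
    max_snp_missing: float = 0.20,   # assumed per-SNP missing-data cutoff
    max_line_missing: float = 0.50,  # lines with >50% missing data are excluded
) -> Tuple[List[int], Dict[str, List[Genotype]]]:
    """Return the indices of retained lines and the retained SNPs.

    `snps` maps a SNP identifier to its calls, one per line, in a fixed order.
    Lines are filtered first, then SNPs (this order is an assumption).
    """
    n_lines = len(next(iter(snps.values())))
    keep_lines = [
        i for i in range(n_lines)
        if missing_rate([calls[i] for calls in snps.values()]) <= max_line_missing
    ]
    kept: Dict[str, List[Genotype]] = {}
    for snp_id, calls in snps.items():
        subset = [calls[i] for i in keep_lines]
        if (subset
                and missing_rate(subset) <= max_snp_missing
                and minor_allele_frequency(subset) >= min_maf):
            kept[snp_id] = subset
    return keep_lines, kept
```

In this sketch a line that fails the missing-data cutoff is dropped before SNP-level statistics are computed, so that poorly sequenced lines do not inflate the per-SNP missing rate.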

### Linkage Disequilibrium

Linkage disequilibrium (LD) was evaluated for average decay for each genome and between pairs of SNPs per chromosome. For the overall genome LD decay pattern, only SNP sites with MAFs of at least 0.1 were included. Pairwise LD was calculated as the squared correlation of allele frequency *r²* between SNP loci (Weir, 1996). The P-values of LD between any two loci were determined by a two-sided Fisher's exact test. To summarize the pattern of decay of LD with distance, a curve of decay of *r²* with distance in base pairs was estimated by nonlinear least squares (Remington et al., 2001; Marroni et al., 2011). LD was estimated in TASSEL (Glaubitz et al., 2014). LD decay plots of *r²* values with distance in base pairs were plotted and the LD decay curve fitted in R (R Core Team, 2016).
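As an illustration of the pairwise statistic, the sketch below computes *r²* as the squared Pearson correlation between the genotype codes of two SNPs. This is an assumption about the coding rather than a reproduction of the TASSEL calculation: genotypes are taken as 0/1/2 allele counts, lines with a missing call at either SNP are skipped, and the function name is hypothetical. For largely homozygous inbred lines the genotype-based value is expected to be close to the haplotype-based one.

```python
from typing import List, Optional, Sequence


def r_squared(snp_a: Sequence[Optional[int]], snp_b: Sequence[Optional[int]]) -> float:
    """Squared Pearson correlation between two SNPs coded as allele counts.

    Lines with a missing call (None) at either SNP are ignored.
    """
    pairs: List[tuple] = [
        (a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two lines with calls at both SNPs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)
```

Two SNPs that always carry the same (or always the opposite) allele give *r²* = 1, while SNPs whose alleles occur in all combinations at equal frequency give *r²* = 0.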

### Population Structure

A subset of 7,865 uncorrelated SNPs derived from thinning the full set of SNPs based on physical distance (minimum distance 100,000 bp) was used for evaluation of population structure. Principal coordinate analysis (PCoA) was conducted in the R package "ade4" (Dray and Dufour, 2007). The first two principal coordinates accounting for the largest proportion of variation were used to visualize patterns of population structure within the panel. Population structure was also evaluated using the Bayesian clustering approach implemented in the software STRUCTURE 2.3.4 (Pritchard et al., 2000). A burn-in of 100,000 iterations followed by a Markov Chain Monte Carlo (MCMC) of 100,000 iterations was executed to estimate the number of subpopulations. An admixture model was applied for two to ten putative populations (K) and six independent runs were conducted for each K. The optimal K value was inferred based on the rate of change in log probability of data between successive K values using the *ad hoc* statistic, DeltaK (Evanno et al., 2005). The program CLUMPP was used to assign results from separate STRUCTURE runs to common populations (Jakobsson and Rosenberg, 2007). The panel had been previously genotyped with 2,012 polymorphic dominant Diversity Array Technology (DArT; www.diversityarrays.com) array markers, of which 1,804 markers were retained for analyses. In this study, a subset of 1,117 unlinked DArT markers was reselected and used to re-estimate PCoA based on DArT.
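Thinning by physical distance can be sketched as a greedy pass along each chromosome. The exact thinning rule used to obtain the 7,865 SNPs is not specified, so the rule below (keep a SNP only if it lies at least the minimum distance from the last retained SNP on the same chromosome) and the function name are assumptions.

```python
from typing import Dict, List, Tuple

Snp = Tuple[str, int]  # (chromosome, physical position in bp)


def thin_by_distance(snps: List[Snp], min_distance: int = 100_000) -> List[Snp]:
    """Greedily keep SNPs at least `min_distance` bp apart on each chromosome."""
    kept: List[Snp] = []
    last_kept: Dict[str, int] = {}  # chromosome -> position of last retained SNP
    for chrom, pos in sorted(snps):
        if chrom not in last_kept or pos - last_kept[chrom] >= min_distance:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept
```

Because the pass always starts from the first SNP on each chromosome, the retained set depends on the sort order; other rules (for example, random selection within windows) would give a different but similarly spaced subset.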

### Association Mapping

Association was estimated by mixed linear modeling (MLM) implemented using the efficient mixed model association method (EMMA; Kang et al., 2008) in the Genomic Association and Prediction Integrated Tool (GAPIT; Lipka et al., 2012). To improve statistical power and exclude bias due to relatedness, the PCA + K mixed model (Yu et al., 2006; Zhang et al., 2010) was used. Within this model, population structure and relatedness were accounted for by jointly incorporating PCA as fixed effects and a kinship matrix as a random effect, respectively. The kinship matrix was estimated by the centered identity-by-state method derived by Endelman and Jannink (2012) in TASSEL. Bayesian information criterion (BIC) was used to determine the optimal number of principal components in the mixed model for estimating marker-trait association. A Bonferroni correction threshold for multiple testing was calculated at an experimental P-value = 0.01. The amount of phenotypic variation controlled by identified QTL was estimated as the difference in residual variance between models with and without the marker effect. Significance of associations was tested using a false discovery rate (FDR) P-value at a cutoff of 0.05 according to Benjamini and Hochberg (1995). Marker-trait association for FT and GY was calculated on BLUEs for each site per year and overall values from all sites. Association for PH and LR was calculated from overall site BLUEs. Association mapping results from the present study using GBS markers were compared to previous results on the panel using DArT markers (Bentley et al., 2014).
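The two significance criteria can be made concrete with a short Python sketch. The Bonferroni threshold is the experiment-wide P-value divided by the number of tests, and the Benjamini and Hochberg (1995) step-up procedure declares significant every test ranked at or below the largest rank *k* whose P-value satisfies p(k) ≤ kq/m. The function names are hypothetical, and the sketch assumes that the number of tests equals the number of SNPs retained after filtering (42,795).

```python
from math import log10
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold controlling the experiment-wide error at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns, in the input order, whether each test is declared significant
    at false discovery rate q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            largest_rank = rank  # remember the largest rank passing its threshold
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[i] = True
    return significant


# Bonferroni threshold on the -log10 scale for 42,795 SNPs at alpha = 0.01
threshold = -log10(bonferroni_threshold(0.01, 42_795))
```

With these inputs the Bonferroni threshold is approximately 6.63 on the –log₁₀ scale. The step-up rule explains why FDR control is less stringent: a P-value that fails its own rank threshold is still declared significant if a larger P-value passes a later one.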

### Genomic Prediction

RR-BLUP as implemented in the R package "rrBLUP" (Endelman, 2011) was used to predict genomic estimated breeding values (GEBVs). The predictive ability of GBS and DArT markers was compared across the panel for all 12 traits using tenfold cross-validation. The panel was also split by country of origin (FRA, DEU, and GBR) and prediction accuracy assessed within each group by tenfold cross-validation. Training populations were assembled separately from FRA, DEU, and GBR with 192, 82, and 70 varieties, respectively, and each was used to predict the phenotypes of varieties from the two remaining countries combined and separately. In all cases prediction accuracy was evaluated as the average Pearson's correlation between the predicted GEBVs and the true phenotype value across 10 runs.
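The cross-validation scheme can be sketched in plain Python as ridge regression on marker scores followed by a Pearson correlation between predicted and observed values. This is a simplified stand-in for rrBLUP, not its implementation: the shrinkage parameter `lam` is fixed by hand here, whereas rrBLUP estimates the variance components from the data; the intercept is approximated by the training mean; and interpreting "10 runs" as repeated random fold assignments is an assumption. All function names are hypothetical.

```python
import random
from math import sqrt
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_predict(z_train: Matrix, y_train: Sequence[float],
                  z_test: Matrix, lam: float) -> List[float]:
    """Ridge prediction in dual form: u = Z'(ZZ' + lam*I)^-1 (y - mean)."""
    n, p = len(y_train), len(z_train[0])
    mu = sum(y_train) / n
    gram = [[sum(a * b for a, b in zip(z_train[i], z_train[j]))
             + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(gram, [y - mu for y in y_train])
    effects = [sum(z_train[i][k] * alpha[i] for i in range(n)) for k in range(p)]
    return [mu + sum(z * u for z, u in zip(row, effects)) for row in z_test]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cross_validated_accuracy(z: Matrix, y: Sequence[float], k: int = 10,
                             lam: float = 1.0, repeats: int = 10,
                             seed: int = 0) -> float:
    """Mean Pearson correlation between held-out predictions and phenotypes."""
    accuracies = []
    for run in range(repeats):
        idx = list(range(len(y)))
        random.Random(seed + run).shuffle(idx)
        predicted = [0.0] * len(y)
        for fold in (idx[i::k] for i in range(k)):
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            preds = ridge_predict([z[i] for i in train], [y[i] for i in train],
                                  [z[i] for i in fold], lam)
            for i, value in zip(fold, preds):
                predicted[i] = value
        accuracies.append(pearson(predicted, y))
    return sum(accuracies) / len(accuracies)
```

The dual form solves a system whose size is the number of training lines rather than the number of markers, which is the convenient direction when markers greatly outnumber lines, as with the GBS data.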

## RESULTS

### Genotyping

Approximately 1.4 million good quality reads (defined as bar-coded reads of 64 nucleotides in length with high quality scores) were generated from alignment to IWGSC RefSeq v1.0 (International Wheat Genome Sequencing Consortium, 2018). The overall alignment rate was 91.28%, of which 20.54% aligned to unique positions, and a total of 200,712 SNP sites were identified from the alignment. Sequencing coverage per allele per line varied among lines depending on the quality of genomic DNA. However, because the lines were sequenced in replicates, the effect of low coverage was minimal. Filtering on low coverage eliminated 28% of the data. The data were further filtered to remove lines with >30% SNPs missing, and SNP sites with >20% data missing. Highly heterozygous SNP sites were also filtered to avoid confounding effects from homoeologous SNPs. After filtering, a total of 42,795 SNPs and 350 individuals were retained for subsequent analyses. The proportion of SNP markers across the three wheat genomes was highest on the B genome (52%) followed by the A genome (32%) and the D genome (10%), which was lowest, as expected. Chromosome 1B had the highest number of SNPs (4,878) while 4D had the least (240). Unmapped SNPs comprised 2% of the SNP dataset (**Supplementary Table S2**). MAFs were slightly skewed in favor of lower values. MAFs for 22.5% of SNPs were within the range 5%–10% (**Supplementary Figure S1**).

### Linkage Disequilibrium

LD was estimated between SNP loci on each chromosome as the squared correlation of allele frequency *r²*. A nonlinear least squares curve was fitted to estimate the distance in mega base pairs (Mbp) within which LD decayed to 0.2 on each chromosome (summarized in **Supplementary Table S2**). Overall, LD decayed with increasing physical map distance on all chromosomes and on all genomes. However, on all chromosomes, some marker pairs separated by long distances were observed to be in high LD (r² = 1). Over the whole genome, LD decayed at an average distance of 4.98 Mbp. The slowest rate of LD decay was observed for the D genome followed by the B and A genomes with average LD decay distance estimates of 6.4, 4.5, and 4 Mbp, respectively. The average trend of LD decay rate estimated across each genome revealed that the percentage of SNP loci pairs with *r²* values above 0.2 on the A, B, and D genomes were 28.61%, 25.55%, and 19.37%, respectively. On the D genome, LD decay distance ranged from 2.5 Mbp (4D, 7D) to 10 Mbp (1D, 2D, 3D). On the B genome, the highest LD was observed on chromosome 2B at 10 Mbp and the lowest on 1B (1.0 Mbp). On the A genome, LD decay distance was 5 Mbp on chromosomes 4A, 5A, 6A, 7A. LD decay plots for 1A, 1B, and 1D are shown in **Figure 1**. LD decay plots for all other chromosomes are shown in **Supplementary Figure S2**.
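The decay distance can be illustrated with a sketch that fits a decay curve to (distance, *r²*) pairs and then solves for the distance at which the fitted curve reaches 0.2. The curve used below is the Hill and Weir expectation of *r²* as applied by Remington et al. (2001) and Marroni et al. (2011); that this exact form was fitted here is an assumption. The fit is done by a least-squares grid search over the population recombination parameter per base pair (`rho`) rather than by the `nls` routine in R, and `n` (the number of sampled sequences) and the search range for `rho` are illustrative values.

```python
from typing import Sequence


def expected_r2(distance_bp: float, rho: float, n: int) -> float:
    """Hill-Weir expectation of r^2 at a given distance (c = rho * distance)."""
    c = rho * distance_bp
    base = (10 + c) / ((2 + c) * (11 + c))
    correction = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (n * (2 + c) * (11 + c))
    return base * correction


def fit_rho(distances: Sequence[float], r2_values: Sequence[float], n: int) -> float:
    """Least-squares estimate of rho over a log-spaced grid (1e-9 to 1e-3 per bp)."""
    def sse(rho: float) -> float:
        return sum((r2 - expected_r2(d, rho, n)) ** 2
                   for d, r2 in zip(distances, r2_values))
    grid = [10 ** (-9 + k / 100) for k in range(601)]
    return min(grid, key=sse)


def decay_distance(rho: float, n: int, threshold: float = 0.2) -> float:
    """Distance in bp at which the fitted curve falls to `threshold` (bisection)."""
    if threshold <= 1 / n or expected_r2(0.0, rho, n) <= threshold:
        raise ValueError("threshold is outside the range of the fitted curve")
    lo, hi = 0.0, 1.0
    while expected_r2(hi, rho, n) > threshold:
        hi *= 2  # expand the bracket until the curve is below the threshold
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_r2(mid, rho, n) > threshold:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Applied per chromosome, `fit_rho` followed by `decay_distance` would give one decay estimate per chromosome, which can then be reported in Mbp by dividing by 10⁶.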

### Population Structure

Principal coordinate analysis was used to estimate and visualize population structure within the panel based on a subset of 7,865 evenly distributed GBS markers compared to the 1,117 dominant DArT markers that were previously reported. The proportion of genetic variation explained by the first two PCs was higher for GBS than for DArT markers, cumulatively explaining 14.4% and 8.9% of variation, respectively (**Figure 2**). The first five GBS PCs cumulatively explained 23.2% of variation while the equivalent DArT PCs explained 16.9%. For both GBS and DArT markers, PCoA did not clearly discriminate between lines from different countries of origin (**Figure 2**) although some basic grouping by origin was observed with the DEU and GBR lines clearly separated and those of French origin overlaying the other two. Structure analysis based on GBS revealed that the panel could be split into K = 4 groups as inferred from the analysis of the *ad hoc* ΔK statistic (Evanno et al., 2005; **Supplementary Figure S3**). Only 92 of the varieties could not be placed into a single distinctive group. Similar to the results of PCoA, the groups were not discriminated by country of origin.
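The ΔK statistic of Evanno et al. (2005) can be sketched as the absolute second-order difference of the log probability of the data across successive K values, divided by the standard deviation of the log probability across replicate runs at K. The sketch below uses the mean log probability across runs in the numerator, which is one common reading of the statistic; the function name and the input layout are hypothetical, and any values passed to it here are made up for illustration rather than taken from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(ln_prob_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Evanno's DeltaK for every K that has both neighbours K-1 and K+1.

    `ln_prob_by_k` maps K to the log probability of the data from each
    replicate STRUCTURE run at that K (at least two runs per K).
    """
    means = {k: mean(values) for k, values in ln_prob_by_k.items()}
    result: Dict[int, float] = {}
    for k in sorted(ln_prob_by_k):
        if k - 1 in means and k + 1 in means:
            sd = stdev(ln_prob_by_k[k])
            if sd > 0:  # DeltaK is undefined when replicate runs are identical
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                result[k] = second_diff / sd
    return result
```

Because the statistic needs both neighbouring values, runs for K = 2 to 10 yield ΔK for K = 3 to 9 only, and the K with the largest ΔK is taken as the most likely number of groups.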

### Association Mapping Using GBS

GBS association mapping was conducted for GY, three yield-related (TGW, GSW, and ears/m²), seven morphological (FT, PH, Awns, LR, Wkill, TN, and MAT) and one quality (Gpt) trait using 42,795 SNPs and 350 individuals (**Supplementary Table S1**). The mixed model method detected a total of 63 loci (comprising 638 SNPs) with significant marker-trait associations for eight traits. Of the total number of significant SNPs, 77 were significant at the experiment-wide Bonferroni threshold (–log₁₀ p-value = 6.63) and the remaining 561 SNPs were declared significant at the less stringent FDR p-value ≤ 0.05. The total number of significant SNPs detected for each trait is shown in **Table 1**. No significant associations were detected

FIGURE 2 | Principal coordinate analysis based on the first two principal coordinates using (A) Genotyping-by-sequencing (GBS) markers and (B) Diversity Array Technology (DArT) markers. Each point represents a line in the variety collection colored by its country of origin (DEU: blue; FRA: red; GBR: green).

TABLE 1 | Summary of quantitative trait loci (QTL) detected with significant marker-trait associations for across site and site-specific analysis for flowering time (FT) and grain yield (GY).


\*QTL detected with the most significantly associated SNPs with PH and FT included in mixed model as covariate.

The number of loci identified, the total number of single nucleotide polymorphisms (SNPs) identified per loci, and the overall variation controlled based on mixed linear modeling are also shown.

for four traits: TN, TGW, GSW, and ears/m². Manhattan plots for FT and PH are presented in **Figure 3**. Manhattan plots for all other traits are in **Supplementary Figure S4**. For FT analysis across sites, significant marker-trait associations were detected on six chromosomes corresponding to eight loci and comprising 47 SNPs (**Table 1**). Based on multiple regression analysis they together explained 45.1% of the variation in FT. Additional site-specific QTL were detected on chromosomes 2B in FRA (2010), 5A in DEU (2010) and 6D in FRA and DEU (2010). Two loci on chromosome 2D (**Figure 3**) (at physical positions 31468893 and 42097013) had the most significant association with FT at all sites controlling an average of 9.6 and 9.3% of FT variation, respectively (**Supplementary Table S3**). These two loci were presumed to be tightly linked with the *Ppd-D1* gene controlling photoperiod sensitivity. This was verified with the use of the *Ppd-D1* gene marker reported in Bentley et al. (2013) as a covariate which resulted in the loss of significant effect at the two loci. An environmentally stable QTL was also identified on 7A which encompassed up to 31 SNPs that controlled approximately 4% to 8% of phenotypic variation in FT (**Supplementary Table S3**). Two significant loci detected on chromosome 1B were more environment specific, detected only in GBR (2011) and FRA (2010), and in the across site analysis (**Table 1**). The allelic effects for the three most significant SNPs on chromosomes 1B, 2D, and 7A are shown in the box plot summary in **Figure 4**. The 2D and 1B SNPs had alleles conferring the earliest flowering effect (147–148 days after planting) while the 7A SNP had a more intermediate effect (153–156 days after planting).

GY QTL were detected on chromosomes 6A and 7B for the across site analysis when the most significant PH and FT SNPs (on 2D and 6A, respectively) were included as covariates in the mixed model. The same loci on 6A and 7B were also detected in the FRA (2010) and GBR (2011) experiments. Both SNPs produced equivalent effects on grain yield (**Figure 4**). Additional QTL were detected on chromosomes 2A and 7A from the DEU and GBR (2011) experiments, respectively (summarized in **Supplementary Table S3**). In total, 13 significant SNPs were found in association with GY, explaining 33% of variation (**Table 1**). Significant associations were detected for PH on eight chromosomes across all experiments comprising 12 loci and 123 SNPs. Together the SNPs explained approximately 53% of the total PH variation. The most significant QTL for PH was detected on chromosome 6A (**Figure 3**) (from physical position 373461190 to 452372111) with 76 SNPs (**Supplementary Table S3**) which controlled approximately 23% of the variation in PH. Two additional loci were also detected on chromosome 6A in association with PH (**Supplementary Table S3**). Covariate analysis with two of the most significant SNPs on 6A as fixed effects in the mixed model did not reveal any additional associations with PH. Significant QTL were also detected for PH on chromosomes 4A and 4B with FDR significance value ≤ 0.005 and 0.001, respectively. The previously detected FT QTL on 2D (31468893 and 42097013) were also significant for PH (FDR p-value ≤ 0.004 and 0.001, respectively).

Our analysis detected 147 SNPs in significant association with the presence/absence of awns across 13 chromosomes (**Supplementary Table S3**) which together controlled approximately 77% of variation (**Table 1**). The most significant QTL for presence/absence of awns was detected on chromosome 5A, comprising 71 SNPs which altogether controlled approximately 69% of the phenotype variation. The SNP with the highest significance at position 255590080 on 5A controlled approximately 29% of variation (**Supplementary Table S3**). Two QTL were detected in significant association with Wkill on chromosomes 4B and 5A covering 16 SNPs which in total controlled 33% of variation (**Table 1**). MAT was associated with the same loci as FT on chromosome 2D (positions 31468893 and 42097013), which together explained approximately 25% of the variation. Significant associations were detected on eight chromosomes for Gpt comprising 10 loci of 159 SNPs which together explained 52% of variation (**Table 1**). LR was significantly associated with 14 loci across 10 chromosomes, covering 128 SNPs which altogether explained over 50% of the variation present. The most significant QTL for LR, Gpt, PH, and GY colocalized within the same region of chromosome 6A (physical position 373461190 to 450106742) (**Supplementary Table S3**).

### Comparison of GBS and DArT Marker Mapping

The full panel of 376 lines had previously been genotyped with genome-wide dominant DArT markers and candidate adaptation gene markers with significant marker-trait associations detected for FT, GY, PH, Wkill, Gpt, and TGW (Bentley et al., 2014). These were compared to the GBS mapping results for QTL that had been detected on common and unique chromosomes (**Table 2**). **Figure 3** summarizes QTL detected using DArT for FT and PH. Pearson's correlations of DArT and GBS markers significant for FT, GY, PH, and Gpt revealed highly significant correlations (P-value ≤ 0.001) between markers identified on common chromosomes (1B, 2D, 5A, and 6A) for the same traits (**Figure 5**; **Supplementary Table S6**). This is likely to be an indication that the significant loci were linked between the different marker platforms. GBS mapping detected more significant marker-trait associations (on more chromosomes) than DArT markers for FT, PH, and Gpt, but no significant association was detected for TGW and fewer QTL were identified for GY using GBS. Wkill QTL were identified on different chromosomes in the GBS compared to DArT mapping. In the present study no marker-trait associations were detected on chromosome 4D. GWAS analysis accounting for the *Ppd-D1* gene marker as a covariate resulted in loss of the two loci detected on chromosome 2D with GBS markers.

TABLE 2 | Comparison of genotyping-by-sequencing (GBS) and Diversity Array Technology (DArT) mapping analysis based on the chromosomes on which significant associations were detected.


### Genomic Prediction

The highest prediction accuracies were observed for GY and PH, and the lowest for TN using cross-validation across the full panel (**Figure 6**). Prediction accuracies were highest for most of the traits when cross-validation was run across the full germplasm panel (rather than by country subsets). This trend was observed for most of the traits except Wkill which was predicted with highest accuracy in the DEU subset (**Supplementary Table S4**). The lowest accuracy values were recorded for the smallest population size in the GBR subset (**Supplementary Table S4**). In contrast, the highest accuracies were observed in FRA where the training population size was largest (**Supplementary Table S4**). Across country predictions, achieved by training the model on the subset of varieties from one country and predicting the values for the varieties from the remaining two countries both singly and jointly, also revealed the influence of training population size and degree of phenotypic variation. Accuracy was highest when FRA was used as the training population to predict GY in DEU, GBR and the combined DEU and GBR dataset and lowest when the GBR set was used as the training population to predict performance of the FRA and DEU sets (**Supplementary Table S5**). FT and Awns could only be predicted within the FRA varieties, while ears/m² was only predicted when the FRA varieties were used to train the prediction model.

FIGURE 5 | Diagrammatic representation of correlations between significant markers from Diversity Array Technology (DArT) and genotyping-by-sequencing (GBS) marker platforms. Significant DArT and GBS markers are shown on the vertical and horizontal axis respectively. The DArT and GBS markers used in the correlation shown here are significant for FT, PH, grain protein content (Gpt), and GY on chromosomes 1B, 2D, 4B, 5A, 6A, and 7B. The full names of markers used in correlation are shown in Supplementary Table S6. The size and shade of the squares corresponds to the magnitude of the correlation coefficient as shown in the scale. The p-values of correlations are as follows: p ≤ .05\*, p ≤ .01\*\*, p ≤ .001\*\*\*.

Using GBS markers, prediction accuracies by tenfold cross-validation on the whole panel were higher than predictions in FRA, DEU, and GBR subsets for FT, PH, MAT, Gpt, LR, and GSW. A similar trend was observed using DArT markers and cross-validation on the whole germplasm panel for predicting FT, PH, Wkill, MAT, and GSW (**Figure 6**). Overall predictions made using GBS by tenfold cross-validation for the full dataset resulted in higher genomic prediction accuracy for GY (0.71) compared to DArT markers (0.67). However, variation was observed for GY predictions by country and GBS gave higher predictions (compared to DArT) in DEU, equivalent predictions in FRA and lower predictions in GBR. Predictions for FT on the whole panel were equivalent for both GBS and DArT markers. Predictions for Wkill, MAT, Ears, LR, and TGW revealed higher accuracy with DArT than with GBS markers both in cross-validation and training model experiments (**Figure 6**).

## DISCUSSION

GBS is a genotyping tool combining simultaneous *de novo* sequencing and polymorphism discovery. It is useful for diverse variety panels, such as used in this study, to generate markers with broad potential relevance to breeding programs. In this study, a total of 42,795 SNPs were used to generate a high-density physical map and used to study the pattern of LD decay within the wheat genome. Our results showed that on average, LD decayed at the slowest rate on the D-genome and fastest on the A-genome while the B-genome had the largest proportion of polymorphic loci. A previous study of LD among several winter and spring wheat breeding populations revealed a similar pattern of decay among genomes for all the populations (Chao et al., 2010). This trend has been attributed to the latest polyploidization event between tetraploid (AABB) and diploid (DD) progenitors which gave rise to domesticated hexaploid bread wheat (Akhunov et al., 2010). Per chromosome, LD decayed fastest on 1B with the slowest rates recorded for chromosomes 1D, 2B, 2D, and 3D. This could be the result of indirect selection for blocks within these chromosomes containing genes conferring agronomic advantage within our collection of elite European varieties although there is not yet gene-level information to support this. The D-genome also had the lowest number of GBS SNPs with only 240 mapped to 4D and no QTL identified on this chromosome. Similarly, no QTL were identified on chromosomes 1D, 3D, and 5D, which is thought to be generally indicative of the low levels of diversity in the D-genome (Akhunov et al., 2010).

Population structure analysis revealed that there was no clear structural partitioning within our association mapping panel. As the panel was assembled from elite lines originating from three different European countries, it was expected that the panel would be structured by country. Although the varieties in the panel did tend to approximately group by country of origin, there was no clear separation of clusters into country of origin. This is an indication of the extent to which European wheat breeding materials are related and exchanged among breeders. Similar trends were also observed in other studies on European winter wheat mapping panels (Langer et al., 2014; Albrecht et al., 2015). PCs derived from GBS markers explained a larger proportion of the variation than DArT markers, likely an effect of the larger number of markers available.

High density genotyping appreciably increased the precision of association mapping in the panel. This was established by the identification of similar loci to the previous study on the panel (Bentley et al., 2014) in addition to detection of loci not previously found. High density genetic linkage maps are one of the key factors for high precision QTL detection in association mapping studies (Elshire et al., 2011; Poland et al., 2012b). Both DArT and GBS use the methylation sensitive restriction enzyme *PstI* for cutting the genome (with GBS also using methylation sensitive *MspI* as the second cutting enzyme while DArT uses nonmethylation sensitive *MseI*). In combination with the diverse panel of germplasm used in this study, we expected that as a result SNPs in strong LD with known genes and causal loci should be detected to a high precision, and to a higher degree with GBS compared to DArT. Although some DArT markers are anchored to RefSeq v1.0 and are available *via* the Wheat@URGI portal (Alaux et al., 2018) it is not currently possible to anchor the DArT markers used in Bentley et al. (2014) to the physical map to facilitate a complete comparison of QTL detection. While previous studies have shown that it is possible to find microsatellite repeats within DArT microarray clone sequences and then design PCR-based markers and assign to map locations, this is a low-throughput process (Fiust et al., 2015). However, we are able to report on the scale and correlation of detected marker-trait associations and predictive ability between the anchored GBS data and previous DArT data.

Marker trait associations for FT were identified at seven loci across five chromosomes. Three of the associations detected for FT mapped to chromosomes with known genes regulating FT. The two loci detected on chromosome 2D were established to be linked to *Ppd-D1* (Beales et al., 2007) when the gene marker was accounted for as a covariate in the mixed model. Chromosomes 1B and 7A have both been associated with *Earliness per se* in separate studies by Griffiths et al. (2009) and Hanocq et al. (2004). Chromosome 7A carries the vernalization gene, *Vrn-3A,* which accelerates flowering in wheat. The two QTL on 2D and the locus on 7A (spanning 4 to 31 SNPs) were also stable across all trial environments. They reveal good potential for FT genetic marker screening in breeding materials and variability in allelic effects for these SNPs can be potentially useful in marker assisted breeding where these loci are not already fixed.

PH QTL were detected across 12 loci on eight chromosomes. The most significant QTL on 6A (373461190:452372111) was also identified to be highly significant in a QTL mapping study in a RIL population by Marza et al. (2006). In the previous GWAS study with DArT markers (Bentley et al., 2014), the most significant PH QTL was the *Rht-D1* gene marker on chromosome 4D. No QTL were mapped with GBS to chromosome 4D in this study. This is in contrast to our expectation that GBS should detect known genes to high precision and is likely to be due to a lack of SNPs in sufficient LD with this gene; only 240 GBS SNPs were identified on 4D (**Supplementary Table S2**). However, two loci were identified to be associated with PH on chromosomes 4A (290527503:291878645) and 4B (21378087:21379808) which may be linked to homologues of *Rht-D1* on 4D (18780696:18781314) (Worland et al., 1998; Wilhelm et al., 2013) although there is not a correspondence in physical position. Although a homoeologous locus (*Rht-A1*) exists on 4A, and has been shown to express the DELLA protein, linked markers, or phenotypic effects on plant height have yet to be determined (Pearce et al., 2011).

The presence or absence of awns is a simple trait controlled by a known locus with a large effect on 5A (Kato et al., 1998; Mackay et al., 2014). In this study GBS detected this major effect locus in the same location as the previously validated marker tagging the 5AL genetic locus (previously reported based on validation in this panel). Additional minor QTL linked with the presence/absence of awns were detected on twelve other chromosomes. Although it is a binary trait (presence/absence), these additional QTL could be useful for understanding the genetic network controlling the presence of awns. Further understanding of the genetic architecture of this trait is relevant to breeding as awns have been shown to contribute to photosynthesis and increase in grain size and yield in drought stressed environments (Rebetzke et al., 2016).

Both environmentally stable and site-specific QTL were identified for GY. The QTL on 6A was detected in all three European trials. Other QTL and association mapping studies have also reported loci on 6A associated with yield under varying environmental conditions (Cui et al., 2014; Edae et al., 2014; Sukumaran et al., 2014).

Chromosomes 2D and 6A featured associations with several key traits. Six traits: FT, GY, PH, Awns, Gpt, and LR were identified in association with loci on 6A while FT, PH, MAT, and LR were associated with loci on 2D. Similar to this trend, Marza et al. (2006) also observed the colocalization of QTL for several traits on chromosome 6A. Most of the traits with colocalized QTL in the present study were also observed to have high positive or negative correlations with each other (**Figure 7**). A similar pattern was observed in an association study of QTL controlling agronomic traits in an elite rice breeding panel (Begum et al., 2015). On chromosome 2D, the same SNPs were found in significant association with FT, PH, and MAT while a nearby SNP was found in association with LR.
SNPs on 6A, significantly associated with GY, PH, Gpt, and LR were located within the same region of the chromosome and were in LD. This observation supports the likelihood of pleiotropy on 2D and an underlying gene linkage on 6A. Photoperiod insensitivity and reduced height genes have a positive impact on GY and LR (Hedden, 2003; Wilhelm et al., 2013). Reduced height and photoperiod genes have been shown to enhance LR while simultaneously conferring adaptive advantages to favor GY in different agro-climatic conditions (Worland et al., 1998; Donmez et al., 2001). The overlap between GY and Gpt loci on 6A could potentially be exploited to simultaneously improve yield and quality traits of wheat. Phenotypic correlations between GY and Gpt were, however, observed to be highly negative (r = –0.75) (**Figure 7**). A similar negative association was found in a QTL mapping study of GY and grain quality by Tsilo et al. (2010) and has also been reviewed in detail by Balyan et al. (2013). Breeding efforts to increase Gpt resulted in lower genetic gains in yield compared to high yielding cultivar checks. Several QTL detected in this study for Gpt are possibly independent of GY and could be studied further for exploitation in breeding.

Comparison of predictive abilities between the two marker platforms for all traits revealed only slight differences in accuracies with predictive ability mostly depending on the genetic architecture of the trait. Using GBS, GY was predicted with higher accuracy than FT despite fewer loci detected in significant marker trait association with GY. This could indicate the effectiveness of GBS in GS for capturing many small effect loci underlying GY which did not reach the significance threshold for association mapping. On the other hand, FT was predicted with greater accuracy using DArT markers in most of the scenarios tested with the highest accuracy estimated within the FRA variety set. Prediction of FT in the DEU and GBR germplasm was ineffective. As discussed by Bentley et al. (2014), this was due to the dominating effect of the *Ppd-D1a* photoperiod-insensitive mutation within the FRA germplasm, which conferred earlier flowering on the FRA varieties compared to those from DEU and GBR, which are almost exclusively photoperiod-sensitive *Ppd-D1b* types. Due to the absence of variation for awns within the DEU and GBR germplasm, awn presence or absence could not be predicted within the country subsets by tenfold cross-validation.

In a similar study by Poland et al. (2012a), GBS consistently produced higher prediction accuracies than DArT markers for 1,000 kernel weight and heading date, even with a comparable number of GBS and DArT markers. Jiang et al. (2015) conducted GS for prediction of resistance to Fusarium Head Blight using three different marker platforms (simple sequence repeats, a 9K SNP array, and a 90K SNP array) and observed similar prediction accuracies with the three platforms for three prediction models. They concluded that relatedness was a key driver of prediction accuracy, and we propose that the ability of the higher density of GBS markers to account for kinship is the main driver for increased prediction accuracies in this study. Validation of GS in diverse germplasm is important for the integration of this method in routine breeding programs. As shown in this study, GS across country germplasm is feasible for most of the traits measured; however, the composition of the training populations needs to be optimized for adequate genetic variation.

## CONCLUSION

The use of GBS has potential for practical application in wheat breeding and is a cost-effective platform for generating thousands of polymorphic SNPs with genome-wide coverage. Using the IWGSC RefSeq v1.0 (International Wheat Genome Sequencing Consortium, 2018) for alignment of sequence reads and variant SNP calling enabled the generation of over 40,000 high-quality SNP data points. When applied to association mapping and genomic prediction in European winter wheat, GBS data anchored to IWGSC RefSeq v1.0 generally improved accuracy. In particular, this study demonstrates the utility of GBS for effectively predicting traits with many loci of small effects, proving its suitability for GS. For mapping, the high marker density provided by GBS enhanced the precision of QTL mapping by increasing the probability of finding and tagging causal polymorphisms, although this was still limited on the D-genome. Prediction accuracies were higher when calculated across the panel; however, accuracy was highly dependent on the trait genetic architecture. This feature was common across both GBS and DArT marker platforms.

### AUTHOR CONTRIBUTIONS

AB designed and oversaw the experiments. OL and IM conducted the analysis. JP generated GBS data and oversaw sequence anchoring and variant calling. SP coordinated the collection of new field phenotyping data. JH contributed to analysis and interpretation of data. OL wrote the paper. All authors contributed to reviewing the manuscript.

### FUNDING

We acknowledge the support for Olufunmilayo Ladejobi's PhD through the Biotechnology and Biological Sciences Research Council (BBSRC) and Department for International Development (DfID) Sustainable Crop Production Research for International Development (SCPRID) project "Wild Rice MAGIC" led by JH (BB/J011754/1). AB is supported by the BBSRC Cross-Institute Strategic Programme "Designing Future Wheat" BB/P016855/1. AB and JH are supported by the GCRF GROW project TIGR2ESS (BB/P027970/1).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.01278/full#supplementary-material

**Conflict of Interest:** The panel and phenotypes described were generated as part of the European Commission grant under the 7th Framework Programme for Research and Technological Development (FP7-212019). The funders had no role in the design or analysis of the experiments presented. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Author Ian J Mackay was employed by company IMPlant Consultancy Ltd. Author Sebastien Praud was employed by company Biogemma. All other authors declare no competing interests.

*Copyright © 2019 Ladejobi, Mackay, Poland, Praud, Hibberd and Bentley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Long Non-coding RNA in Plants in the Era of Reference Sequences

Hikmet Budak<sup>1</sup> \*, Sezgi Biyiklioglu Kaya<sup>2</sup> and Halise Busra Cagirici<sup>2</sup>

<sup>1</sup> Montana BioAgriculture, Inc., Bozeman, MT, United States, <sup>2</sup> Engineering and Natural Sciences, Molecular Biology, Genetics and Bioengineering Program, Sabancı University, Istanbul, Turkey

The discovery of non-coding RNAs (ncRNAs), and the subsequent elucidation of their functional roles, was largely delayed due to the misidentification of non-protein-coding parts of DNA as "junk DNA," which forced ncRNAs into the shadows of their protein-coding counterparts. However, over the past decade, insight into the important regulatory roles of ncRNAs has led to rapid progress in their identification and characterization. Of the different types of ncRNAs, long non-coding RNAs (lncRNAs) have attracted considerable attention due to their mRNA-like structures and gene regulatory functions in plant stress responses. While RNA sequencing has been commonly used for mining lncRNAs, a lack of widespread conservation at the sequence level, in addition to relatively low and highly tissue-specific expression patterns, challenges high-throughput in silico identification approaches. The complex folding characteristics of lncRNA molecules also complicate target predictions, as the knowledge about the interaction interfaces between lncRNAs and potential targets is insufficient. Progress in characterizing lncRNAs and their targets from different species may hold the key to efficient identification of this class of ncRNAs from transcriptomic and potentially genomic resources. In wheat and barley, two of the most important crops, the knowledge about lncRNAs is very limited. However, recently published high-quality genomes of these crops are considered promising resources for the identification of not only lncRNAs, but any class of molecules. Considering the increasing demand for food, these resources should be used efficiently to discover the molecular mechanisms underlying development and a/biotic stress responses. As our understanding of lncRNAs expands, interactions among ncRNA classes, as well as interactions with the coding sequences, will likely define novel functional networks that may be modulated for crop improvement.

Keywords: wheat, barley, whole genome sequencing, computational identification, long non-coding RNA

#### Edited by:

Jacqueline Batley, The University of Western Australia, Australia

#### Reviewed by:

Matthew R. Willmann, Cornell University, United States

Xiaojun Nie, Northwest A&F University, China

#### \*Correspondence:

Hikmet Budak hikmet.budak@icloud.com

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 06 December 2018 Accepted: 21 February 2020 Published: 12 March 2020

### Citation:

Budak H, Kaya SB and Cagirici HB (2020) Long Non-coding RNA in Plants in the Era of Reference Sequences. Front. Plant Sci. 11:276. doi: 10.3389/fpls.2020.00276

### INTRODUCTION

Since the realization that the non-protein-coding parts of DNA contain regulatory information, efforts to identify non-coding RNA molecules have greatly accelerated. Advances in RNA sequencing technology have contributed to this acceleration and to the discovery of non-coding RNAs, including lncRNAs, and have helped elucidate their structures and functions. As our understanding of the regulatory roles of lncRNAs has improved, the importance of these non-coding molecules has become more apparent. However, there is still much to discover about the functions of lncRNAs in cellular pathways.

A further step toward understanding both coding and non-coding elements was recently taken for wheat and barley: high-quality reference sequences have been published (Mascher et al., 2017; IWGSC, 2018). Wheat and barley are two of the most consumed and cultivated crops; thus, increasing yield has been the ultimate goal for breeders and scientists seeking to overcome the effects of population growth and climate change. Having a reference genome in hand has improved the accuracy of analyses aimed at finding the origins of favorable traits and the regulatory mechanisms that control the expression of the genes responsible for those traits. Therefore, wheat and barley reference sequences have opened a new era in multi-omics research, allowing greater accuracy and robustness in elucidating the undiscovered mechanisms within these important crops.

### BIOGENESIS OF lncRNAs

Long non-coding RNAs (lncRNAs) are defined as transcripts longer than 200 nucleotides that do not encode a full-length protein (Kapranov et al., 2007). The lack of discernible coding potential is what mainly differentiates lncRNAs from mRNAs.

Similar to mRNAs, most lncRNAs are transcribed by RNA polymerase II and are subject to 5′-end capping, alternative splicing, and the addition of 3′ poly-A tails (Chekanova, 2015). Plant lncRNAs can also be transcribed by two additional polymerases, RNA Pol IV and RNA Pol V (Wierzbicki et al., 2008). Compared with Pol II transcripts, these lncRNAs are less well characterized and show some structural differences, such as the lack of poly-A tails (Zhou and Law, 2015). Identification of lncRNAs transcribed by RNA Pol IV or Pol V is particularly challenging due to their extremely low expression and instability (Rai et al., 2018). However, these transcripts are the major players driving RNA-directed DNA methylation (RdDM). Plants have evolved a highly sophisticated RNA interference-dependent RdDM mechanism to ensure genomic stability (Matzke and Mosher, 2014). Briefly, in this pathway, an lncRNA transcribed by RNA polymerase IV is processed into 24-nt small interfering RNAs (siRNAs) (You et al., 2013). An lncRNA transcribed by RNA polymerase V is recognized by the siRNA-AGO complex and guides this complex to the chromatin target site together with chromatin modifying enzymes. Following interaction with the AGO complex, additional proteins and methyltransferases are recruited to cytosine residues at the target region to initiate gene silencing (Wierzbicki et al., 2008).

RNA polymerase IV transcripts reportedly act mostly as siRNA precursors, whereas RNA polymerase V and some RNA polymerase II transcripts are sRNA targets. RNA polymerase IV and V transcripts have mostly been studied in Arabidopsis thaliana, where a recent study identified tens of thousands of RNA polymerase IV-dependent lncRNAs using an RNA polymerase IV mutant (Li et al., 2015).

### INFLUENCE OF RNA SEQUENCING TECHNOLOGIES ON THE DISCOVERY OF lncRNAs

A general method for identifying and functionally characterizing transcripts is shown in **Figure 1**. Improvements in RNA sequencing technology paved the way for expanding our understanding of RNA. Previous attempts to uncover transcriptomes relied mostly on microarray technology, which is inefficient and limited in coverage of the whole transcriptome, whereas next-generation DNA and RNA sequencing applications are readily available on many platforms, offering better and more consistent quality (Denoeud et al., 2008; Ozsolak and Milos, 2011). Together with the development of computational tools, the most striking and unexpected evidence has been collected from the non-coding parts of the genome, revealing the transcription of numerous non-coding RNA molecules with various structures and roles. Of all the RNA species discovered to date, lncRNAs are the least understood class and may still hide many unknown features. To characterize lncRNAs and other non-coding RNA species, new RNA sequencing applications have been developed. For example, while conventional RNA sequencing allowed sequencing of up to 600 nucleotides at a time, the deep sequencing approach has enabled sequencing of longer reads at high accuracy (Malone and Oliver, 2011; Chu et al., 2015). RNA capture sequencing detects targeted RNA molecules with low abundance in the transcriptome (Mercer et al., 2011; Clark et al., 2015), and was designed to overcome the obstacles that conventional RNA sequencing faces in detecting low-abundance lncRNAs.

To study the functions of lncRNAs, several immunoprecipitation-based methods have been developed that reveal the interacting RNA partners of specific proteins, together with high-throughput sequencing. ChIRP-seq is one of these methods, and involves precipitation of in vivo cross-linked RNA–DNA and RNA–protein hybrids by a biotin-streptavidin interaction and then sequencing of the RNA and DNA molecules that appear in the precipitated hybrids (Chu et al., 2011, 2015).

FIGURE 1 | A general workflow for the identification and characterization of transcripts.

CLIP-seq is another immunoprecipitation-based technique that has been used to explore miRNA–lncRNA interactions (Murigneux et al., 2013; Li J. et al., 2014). As demonstrated by these examples, RNA sequencing technology can be improved and modified according to the needs of the study. Improvements in the efficiency and accuracy of RNA and DNA sequencing techniques, not only for the identification of lncRNAs, but also for other RNA species, will lead to a more complete understanding of the secrets of the cellular mechanisms and their regulators.

In silico predictions based on sequencing have revealed many lncRNAs with expression patterns that remain to be confirmed. qRT-PCR allows the detection and quantification of expression in real time and is therefore widely used to verify the expression of in silico-predicted lncRNAs (Shuai et al., 2014). lncRNAs have been functionally annotated based on co-expression patterns, interaction networks, or both. Functions of lncRNAs can be predicted based on co-expressed protein-coding genes and/or genomic co-localization of genes (Guttman et al., 2009; Liao et al., 2011). For example, the lncRNAs COOLAIR and COLDAIR are expressed at the FLC locus and control FLC expression (Heo and Sung, 2011). Moreover, lncRNAs can serve as sRNA targets, preventing interaction between the sRNA and its protein-coding target, thereby enhancing the function of a particular protein-coding gene (Britton et al., 2014; Shuai et al., 2014).

These interaction networks between lncRNAs, miRNAs, and mRNAs suggest that some lncRNAs function as endogenous target mimics (Franco-Zorrilla et al., 2007; Chen et al., 2013). lncRNAs can also serve as sRNA precursors, with the downstream patterns of the corresponding sRNA revealing the involvement of lncRNAs in various molecular pathways (Matzke and Mosher, 2014; Ariel et al., 2015). Potential functions of lncRNAs can be confirmed by constructing transgenic lines in which the corresponding genes are either downregulated or overexpressed. T-DNA insertions can be used for either gain-of-function or loss-of-function mutagenesis (Radhamony et al., 2005), whereas RNA interference (RNAi) results in loss of function. For example, Zhu et al. (2014) identified lncRNAs in Arabidopsis thaliana that were differentially expressed during infection with Fusarium oxysporum and confirmed antifungal activity of 10 lincRNAs using T-DNA insertion and RNAi lines. Identification and confirmation of the interactions and functions of these non-coding RNAs is critical for the characterization of important molecular pathways.

### lncRNA ANNOTATION FROM RNA SEQUENCING DATA

When using RNA sequencing data to annotate lncRNAs, computational procedures commonly begin with the alignment of sequencing reads on the reference genome, if available, and the assembly of transcript models from the mapped reads using computational tools that can be chosen from a wide range of software and algorithms based on their features and computational requirements (Ilott and Ponting, 2013). When a reference genome is lacking for the species of interest, the assembly can be accomplished de novo although this strategy is more error-prone by being more sensitive to sequencing errors and chimeric molecules, and requiring more coverage in sequencing (Martin and Wang, 2011). After this point, the assembled transcripts should be evaluated to distinguish lncRNAs from a variety of non-coding RNAs and protein-coding mRNAs. Although complex and unclear features of lncRNAs have led researchers to adopt different methods and tools for the identification process, they seem to agree on a few basic criteria to select lncRNAs from other RNAs, such as minimum length. Many studies assume a 200-nucleotide length threshold to separate lncRNAs from snRNAs. Even though the presence of lncRNAs below this threshold has not been fully disproven, it is useful to eliminate snRNAs from the data (Ma et al., 2013). However, this criterion is mostly arbitrary and, alone, cannot define lncRNAs. In addition, this criterion does not distinguish between lncRNAs and mRNAs, since both types of RNA are commonly longer than 200 nucleotides (Milligan and Lipovich, 2015).
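The length criterion described above is simple to apply once transcript models have been assembled. The sketch below is illustrative only and assumes the assembled transcripts are available as FASTA text; the function names `read_fasta` and `passes_length_filter` and the example records are hypothetical. Whether the 200-nucleotide boundary is inclusive varies between pipelines, and the sketch assumes an inclusive threshold.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def passes_length_filter(seq, min_len=200):
    """Keep transcripts of at least `min_len` nucleotides (assumed inclusive)."""
    return len(seq) >= min_len


# Two hypothetical transcripts: one of 8 nt and one of 240 nt.
example = (
    ">tx1 hypothetical short transcript\n"
    "ACGTACGT\n"
    ">tx2 hypothetical long transcript\n"
    + "ACGT" * 60 + "\n"
)
kept = [name for name, seq in read_fasta(example.splitlines())
        if passes_length_filter(seq)]
```

As the text notes, this filter only removes short RNAs; it says nothing about coding potential, so mRNAs pass it as readily as lncRNAs.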

Therefore, for sequences that pass the first criterion, researchers usually assess open reading frame (ORF) content and length. Since transcripts containing long ORFs are assumed to be translated into full-length proteins, lncRNAs are expected to lack an ORF, or at least a long ORF (Boerner and McGinnis, 2012). Previous studies have speculated that most lncRNAs contain a short ORF (Banfai et al., 2012; Lv et al., 2013; Ruiz-Orera et al., 2014) and can occupy ribosomes, with contradictory conclusions about whether they encode protein products (Guttman et al., 2013; Ruiz-Orera et al., 2014; Popa et al., 2016). Despite lacking a clear explanation of the translational features of lncRNAs, these conflicting findings agree on another arbitrarily determined criterion, that is, an ORF size threshold of 100 encoded amino acids (Ilott and Ponting, 2013; Musacchia et al., 2015). After eliminating transcripts containing ORFs above the threshold, transcripts that satisfy the ORF size criterion are often examined to determine whether the remaining ORFs potentially encode any functional proteins. Several methods are used to calculate coding potentials and various algorithms can be used to assess candidate transcripts in terms of ORF presence, quality, intactness, and similarities to sequences encoding known proteins (Boerner and McGinnis, 2012; Mattick and Rinn, 2015). As this step is highly dependent on the quality of RNA sequencing reads and alignments on reference genomes, low-quality sequencing or alignment data, or lack of a reference genome, increases the chances of misleading coding potential calculations.
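The ORF size criterion can be illustrated with a short scan for the longest ORF in a transcript. This is a minimal sketch under stated assumptions rather than any published tool: it assumes the transcript is given in its sense orientation, searches only the three forward reading frames, counts an ORF from the first in-frame ATG to the next in-frame stop codon of the standard genetic code, and ignores ORFs that lack a stop codon. The function names and test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Length, in codons excluding the stop codon, of the longest
    ATG-to-stop ORF in the three forward reading frames of `seq`."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the ORF being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_orf_filter(seq, max_aa=100):
    """True if no complete ORF encodes more than `max_aa` amino acids."""
    return longest_orf_codons(seq) <= max_aa


# Hypothetical sequences: an 11-codon ORF and a 151-codon ORF in frame 2.
short_tx = "ATG" + "GCT" * 10 + "TAA"
long_tx = "CC" + "ATG" + "GCT" * 150 + "TGA"
```

Here `short_tx` would be retained as an lncRNA candidate and `long_tx` discarded. A real pipeline would normally follow this step with a coding potential calculation, as described above, because a short ORF can still encode a functional peptide.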

The use of machine learning techniques alone has increased the accuracy of coding potential calculations to over 90% (Kong et al., 2007; Hoff and Stanke, 2013; Sun et al., 2013). Nonetheless, due to slight differences in the approaches of conventional coding potential calculation tools, combining several of these tools may increase the stringency of the identification pipeline (Pauli et al., 2012).
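Combining several coding potential tools amounts to a voting rule over their calls. The sketch below assumes each tool's output has already been parsed into a mapping from transcript identifier to a "coding" or "noncoding" label; the tool names `tool_a`, `tool_b`, and `tool_c`, the function `consensus_noncoding`, and the calls themselves are placeholders, not the output format of any real program.

```python
def consensus_noncoding(calls_by_tool, min_votes=None):
    """Return transcript IDs labelled non-coding by at least `min_votes` tools.

    `calls_by_tool` maps a tool name to {transcript_id: "coding" | "noncoding"}.
    With min_votes=None, every tool must agree (strict intersection), so a
    transcript missing from any tool's output is never retained.
    """
    if not calls_by_tool:
        return set()
    if min_votes is None:
        min_votes = len(calls_by_tool)
    votes = {}
    for calls in calls_by_tool.values():
        for tx, label in calls.items():
            if label == "noncoding":
                votes[tx] = votes.get(tx, 0) + 1
    return {tx for tx, n in votes.items() if n >= min_votes}


# Hypothetical calls from three placeholder tools.
calls = {
    "tool_a": {"tx1": "noncoding", "tx2": "noncoding", "tx3": "coding"},
    "tool_b": {"tx1": "noncoding", "tx2": "coding", "tx3": "coding"},
    "tool_c": {"tx1": "noncoding", "tx2": "noncoding", "tx3": "noncoding"},
}
strict = consensus_noncoding(calls)
majority = consensus_noncoding(calls, min_votes=2)
```

Requiring unanimous agreement gives the most stringent candidate set, at the cost of discarding genuine lncRNAs on which the tools disagree; a majority rule trades some stringency for sensitivity.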

The final criterion applied in many lncRNA identification pipelines involves exclusion of candidate transcripts that exhibit homology to known coding sequences, proteins, or protein domains. Similar to coding potential, homology can be assessed by several methods that use different databases for transcript comparisons (Jia et al., 2010; Pauli et al., 2012). However, a caveat of this criterion is the loss of exonic lncRNAs, leaving only lncRNAs expressed from intronic or intergenic spaces that do not overlap with the exons of any protein-coding genes (Housman and Ulitsky, 2016). Therefore, a balance between sensitivity and robustness must be properly maintained while designing the pipeline with elimination thresholds tailored to the aim of the study.

### STRUCTURAL AND FUNCTIONAL CHARACTERIZATION OF lncRNAs

lncRNAs can be classified with respect to their genomic location and the direction of transcription (**Figure 2**), including intergenic, intronic, or exonic regions in the sense and antisense directions (Mattick and Rinn, 2015). The most controversial class has been exonic lncRNAs that are transcribed in the sense orientation. lncRNA transcripts intersecting with the exons of protein-coding genes had been eliminated until the release of the GENCODE v7 catalog of human long non-coding RNAs (Derrien et al., 2012). However, some non-coding transcripts may arise from alternative splicing or truncation of the first or last exons of protein-coding genes. For example, the SRA1 gene encodes an lncRNA transcript [steroid receptor RNA activator (SRA)] as well as a protein-coding transcript (SRAP) through alternative splicing (Sheng et al., 2018). SRA has been functionally characterized in detail in both human and mouse (Nam et al., 2016); in fact, the functions of SRAP have been less studied than those of SRA. Although no exonic lncRNAs with known functions are currently available in plants, exonic lncRNAs have been reported in several plant species without functional characterization (Liu et al., 2013; Quattro et al., 2017). Broadly, plant lncRNAs with known functions are classified as long intergenic non-coding RNAs (lincRNAs), intronic non-coding RNAs (incRNAs), and natural antisense transcripts (NATs) (**Table 1**).

lncRNAs transcribed outside of protein-coding genes are loosely classified as lincRNAs. Most research on plant lncRNAs has focused on lincRNAs, leading to the identification of several lncRNAs with well-studied functions, such as LDMAR (Ding et al., 2012), APOLO (Ariel et al., 2014), IPS1 (Franco-Zorrilla et al., 2007), and Enod40 (Campalans, 2004). lncRNAs transcribed from intronic regions in the sense direction are called incRNAs. COLDAIR, transcribed from the first intron of Flowering Locus C (FLC), is the best-known plant incRNA (Heo and Sung, 2011). lncRNAs transcribed from the antisense direction to a protein-coding gene are classified as NATs. Well-studied examples of plant NATs include COOLAIR (Csorba et al., 2014) and HID1 (Wang et al., 2014). Recently, an antisense transcript of HvCesA6, which acts as a precursor to small RNA (sRNA) targeting the CesA6 gene, was shown to be involved in regulating cell wall synthesis in barley (Hordeum vulgare) (Held et al., 2008). Several plant NATs with newly characterized functions include cis-NAT PHO1;2 (Jabnoune et al., 2013), TL (Liu et al., 2018), and LAIR (Wang et al., 2018). The functions of the best-studied plant lncRNAs are listed in **Table 1**.
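The positional classes described above can be assigned by comparing transcript coordinates with a protein-coding gene annotation. The following is a simplified sketch under stated assumptions: coordinates are 1-based and inclusive, each gene record carries its exon intervals, and the first overlapping gene in the list decides the label. The record layout, the function names, and the example gene on a chromosome labelled "1A" are hypothetical and do not correspond to any particular annotation file.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals on the same chromosome share a base."""
    return a_start <= b_end and b_start <= a_end


def classify_lncrna(lnc, genes):
    """Label a transcript relative to protein-coding genes.

    `lnc` is a dict with keys chrom, start, end, strand; each gene dict has
    the same keys plus `exons`, a list of (start, end) tuples.
    """
    for gene in genes:
        if gene["chrom"] != lnc["chrom"]:
            continue
        if not overlaps(lnc["start"], lnc["end"], gene["start"], gene["end"]):
            continue
        if gene["strand"] != lnc["strand"]:
            return "NAT"            # antisense to a protein-coding gene
        if any(overlaps(lnc["start"], lnc["end"], s, e) for s, e in gene["exons"]):
            return "exonic sense"   # shares bases with an exon, same strand
        return "incRNA"             # sense transcript confined to an intron
    return "lincRNA"                # no overlap with any annotated gene


# One hypothetical three-exon gene on the plus strand.
genes = [{
    "chrom": "1A", "start": 1000, "end": 5000, "strand": "+",
    "exons": [(1000, 1500), (3000, 3500), (4500, 5000)],
}]


def make_tx(start, end, strand):
    """Build a hypothetical transcript record on chromosome 1A."""
    return {"chrom": "1A", "start": start, "end": end, "strand": strand}


labels = [
    classify_lncrna(make_tx(7000, 8000, "+"), genes),
    classify_lncrna(make_tx(1800, 2500, "+"), genes),
    classify_lncrna(make_tx(1200, 2000, "-"), genes),
    classify_lncrna(make_tx(1400, 1700, "+"), genes),
]
```

With overlapping or nested genes the outcome depends on the order of the gene list, so a production pipeline would need an explicit priority among classes.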

lncRNAs can also be classified based on their function, such as a decoy, scaffold, guide, signal, or signal enhancer. Decoys, such as IPS1, delay protein function by mimicking specific regions of the protein's target (Franco-Zorrilla et al., 2007). Scaffolds help to bring multiple proteins and RNAs together to form functional machineries, and recruit RNAs or proteins to a target region, as in RdDM (Matzke and Mosher, 2014). Signals, such as COLDAIR, are expressed under specific conditions to mediate biological processes (Heo and Sung, 2011). However, a single function model does not always apply to lncRNA function. An lncRNA might exhibit several functions which are usually linked. For example, in RdDM, an lncRNA transcribed by RNA polymerase V can act as a guide for the siRNA-AGO complex to the chromatin target site and as a scaffold for chromatin modifying enzymes and proteins.

An alternative model for classifying lncRNA functions is based on their structural features and the types of interactions they have with their targets, such as DNA interactions or protein interactions (Kung et al., 2013). As in the example of the highly complex RdDM pathway, lncRNAs can be expected to have certain secondary structures that bring different chromosome regions or proteins into close proximity. Ultimately, mechanisms of action include formation of chromosome looping between enhancer and promoter regions, modulation of gene activation and regulation, recruitment of chromatin modifying factors, enhancement of DNA methylation, and chromosome inactivation (Liu et al., 2012).

In some other cases, transcription of an lncRNA, rather than the lncRNA itself, is important to initiate a biological process. For example, in mice, it is the transcription of the lncRNA Airn, rather than the action of its product, that induces Igf2r gene silencing (Latos et al., 2012). Airn is an lncRNA antisense to the Igf2r gene, whose promoter is overlapped by the Airn transcription unit in the opposite orientation (Santoro et al., 2013). The RNA polymerase transcribing Airn prevents assembly of the transcription initiation complex at the Igf2r promoter and thus prevents Igf2r expression. In another study, Arabidopsis mutant lines carrying an enhanced promoter inside the T-DNA region showed strong expression of a long transcript extending over the promoters of neighboring genes in the same orientation. Similarly, initiation of transcription from an intergenic T-DNA insertion halted expression of a downstream gene in Arabidopsis (Hedtke and Grimm, 2009) by making its promoter inaccessible to the transcription initiation complex. In diverse species, polymerase activity extending over the promoter of another gene has halted the expression of downstream genes in either the opposite or the same orientation, indicating that this mechanism is likely to be conserved between species. These studies also emphasize the challenges of functional characterization of lncRNAs.

### DEVELOPMENTAL STAGE-RELATED lncRNAs

Many lncRNAs function in developmental pathways in plants. One of the best-characterized examples of this regulation was discovered in Arabidopsis at the transition from the vegetative to the generative stage. FLC is a regulator of flowering time in Arabidopsis that represses the induction of flowering (Csorba et al., 2014). COOLAIR, an lncRNA antisense to the FLC gene, was found to be upregulated at the beginning of vernalization (Shafiq et al., 2015). COOLAIR is involved in FLC repression through both the autonomous and vernalization pathways, which leads to flowering in spring. A homology-based search was performed to find the FLC locus and antisense FLC transcripts in monocots, and the results showed that, although there is no sequence conservation between antisense FLC transcripts and Arabidopsis COOLAIR lncRNA, the locations of these transcripts were conserved in six grass species including T. aestivum (Jiao et al., 2019). COLDAIR was identified as another lncRNA potentially regulating FLC expression. It is also transcribed in response to cold; however, in contrast to COOLAIR, COLDAIR is oriented in the sense direction of FLC (Heo and Sung, 2011). COLDAIR has been suggested to maintain vernalization by repressing FLC (Yamaguchi and Abe, 2012). Both lncRNAs serve as signals that determine the developmental stage of the plant, but much remains to be discovered about their exact functions, their interactions, and their presence in grass species such as wheat and barley.

Another lncRNA regulating developmental pathways is long-day-specific male-fertility-associated lincRNA (LDMAR). LDMAR expression below a certain level affects pollen development in rice under long-day conditions. Mutations causing reduced expression of LDMAR result in photoperiod-sensitive male sterility in plants grown under long-day conditions (Ding et al., 2012; Zhang and Chen, 2013). Again, the mechanism by which LDMAR regulates pollen development and whether it is expressed in other cereals remain unclear.

Recently, Guo et al. (2018) identified a novel lncRNA, Wheat Seed Germination Associated RNA (WSGAR), that modulates wheat seed germination. The proposed mechanism of action starts with a wheat-specific miRNA (miR9678) targeting WSGAR, which in turn is processed into phasiRNAs and interferes with seed germination. In another study, although the candidates are not yet well characterized, 177 lncRNAs were found to be responsive to a drug that blocks Ca<sup>2+</sup> channels in wheat roots. The authors also observed that root length decreased significantly and root growth was inhibited as the amount of drug increased. These 177 lncRNAs were therefore suggested to be related to root growth in wheat (Ma et al., 2018).

### STRESS-RESPONSIVE AND OTHER lncRNAs IN WHEAT, BARLEY, AND RELATIVES

lncRNAs have been identified in many species from mammals to plants, including model organisms and economically important crop species, as more transcriptomic and genomic data have become available. One such group of crops is the Triticeae tribe, which includes cereal species such as wheat and barley, important sources of nutrition in the human diet (Moore et al., 1995). Unraveling the cellular mechanisms responsible for gene expression under stress conditions is the objective of ongoing research, in efforts to breed cultivars better able to withstand abiotic and biotic stresses (Pieri et al., 2018). For this purpose, the lncRNA repertoires of two of the three diploid wild ancestors of bread wheat (Triticum aestivum, AABBDD), Triticum urartu (AA) and Aegilops tauschii (DD), whose draft and reference genomes were recently published (Jia et al., 2013; Ling et al., 2013; Luo et al., 2017), were examined. The identified lncRNAs, 13,993 from T. urartu and 20,338 from Ae. tauschii, were also compared to those of bread wheat and tetraploid wild emmer wheat (Triticum turgidum ssp. dicoccoides, AABB), a wild subspecies of T. turgidum (AABB), the tetraploid ancestor of bread wheat (Pieri et al., 2018). Comparative analyses using RNA sequencing data suggested that the conservation between lncRNA repertoires decreased as the evolutionary distance increased (Pieri et al., 2018). Wild emmer wheat has long been a promising resource for exploration and exploitation of stress responses, due to the remarkable genetic diversity its wild populations retain. Akpinar et al. (2018) predicted lncRNA genes in the T. turgidum ssp. dicoccoides genome and investigated potential lncRNA-miRNA-mRNA networks. This study revealed 89,623 lncRNAs, of which 23,713 were identified as potential miRNA targets (Akpinar et al., 2018). Another study identified lncRNAs in two cultivars of wild emmer wheat, Kiziltan and TR39477, and one durum wheat (T. turgidum ssp. durum, AABB), a domesticated subspecies of T. turgidum, revealing 63,773, 61,823, and 43,932 lncRNAs in Kiziltan, TR39477 and durum wheat, respectively. This study reported that 3% of the identified Kiziltan lncRNAs, 6% of the identified TR39477 lncRNAs, and 4% of the durum wheat lncRNAs were differentially expressed in response to drought and termed them 'drought-responsive' lncRNAs, with most expressed only under drought (Cagirici et al., 2017). Moreover, lncRNAs were identified from the transcriptome of the durum wheat cultivar Svevo concurrently with the assembly of its genome. In total, 115,437 lncRNAs were identified, and chromosome 3B contained the highest number of lncRNA genes (Maccaferri et al., 2019).
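To put the figures above on a common footing, the following minimal sketch converts the reported totals and percentages into a share and approximate counts. It uses only numbers stated in the text; because the drought-responsive percentages are rounded in the source, the derived counts are rough estimates, not values reported by the cited studies. All variable names are illustrative.

```python
# Totals as reported in the text (Akpinar et al., 2018).
total_lncrnas = 89_623
mirna_targets = 23_713

# Share of predicted wild emmer lncRNAs flagged as potential miRNA targets.
target_share = mirna_targets / total_lncrnas

# Reported lncRNA totals and rounded drought-responsive fractions
# (Cagirici et al., 2017).
reported = {
    "Kiziltan": (63_773, 0.03),
    "TR39477": (61_823, 0.06),
    "durum wheat": (43_932, 0.04),
}

# The fractions are rounded in the source, so these counts are approximate.
approx_responsive = {
    name: round(total * fraction) for name, (total, fraction) in reported.items()
}

print(f"{target_share:.1%} of lncRNAs are potential miRNA targets")
for name, count in approx_responsive.items():
    print(f"{name}: roughly {count} drought-responsive lncRNAs")
```

On these numbers, about a quarter of the predicted wild emmer lncRNAs are potential miRNA targets, while only a few thousand lncRNAs per genotype fall into the drought-responsive class.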

As with its ancestors, the bread wheat genome and transcriptome have been investigated for lncRNA expression patterns under various biotic and abiotic stress conditions. An analysis of lncRNAs in bread wheat genotypes revealed 77 that were responsive to heat stress, 71 to fungal infection, and 23 to both conditions (Xin et al., 2011). A more comprehensive study identified lncRNAs from 52 sets of RNA sequencing data obtained under heat and drought stress, concluding that 29% of the lncRNAs were responsive to these abiotic stress conditions. Furthermore, the same study explored lncRNA expression under salt stress and identified two lncRNA groups showing distinct expression patterns; one was upregulated in the first hours after exposure and downregulated later, and the second group showed the opposite pattern (Shumayla et al., 2017).

Barley is another economically important species consumed worldwide, and it has been studied to better understand its mechanisms of response to stress (Gozukirmizi and Karlik, 2017). One study examined the barley transcriptome for lncRNAs and their expression patterns under excess boron (Karakulah and Unver, 2017). A second study observed differential expression patterns of two specific lncRNAs in cultivars exposed to salinity; one of those lncRNAs, AK372814, was upregulated under salinity stress (Karlik and Gozukirmizi, 2018), providing a clue to gene regulatory elements involved in responses to salinity. These results give a broad perspective on the expression patterns and abundance of lncRNAs in genomes, suggesting that lncRNAs function in cellular mechanisms that are regulated under various stress conditions. However, the specific lncRNAs involved in stress response pathways largely remain to be identified. Even though next-generation sequencing has provided insight into the genomes and transcriptomes of many species, narrowing down these findings to identify the cellular pathways and regulatory molecules responsible for stress resistance will take considerable further work.

### STRESS-RESPONSIVE lncRNAs IN OTHER CROPS

Maize is another important crop and perhaps one of the plant species that has been most extensively studied for lncRNAs. Maize lncRNAs are mostly single-exonic and found in intergenic regions, whereas only a small portion overlap with protein-coding genes in the genome (Li L. et al., 2014). Attempts to find lncRNAs responsive to drought revealed 664 lncRNAs that were differentially expressed under drought stress and were also identified as potential precursors of small noncoding RNA (snRNA) species such as miRNAs, siRNAs, and shRNAs (Zhang et al., 2014). In addition to drought, differentially expressed lncRNAs were predicted in maize under nitrogen deficiency, with most being downregulated. These nitrogen deficiency-responsive lncRNAs were examined for co-expression with protein-coding transcripts; 32 were co-expressed with 239 protein-coding transcripts in functional annotation categories including NADPH/NADH dehydrogenation. This indicates that these lncRNAs are potential regulators of nitrogen assimilation and photosynthesis, since elevated NADH/NADPH consumption is associated with nitrogen assimilation and photosynthetic reactions are the most important sources of NADPH (Lv et al., 2016). In rice (Oryza sativa), lncRNAs were investigated under drought and cadmium stress. Under drought stress, 98 lncRNAs were differentially regulated (Chung et al., 2016). Under cadmium stress, 122 of the differentially expressed transcripts were defined as lncRNAs (He et al., 2015). However, the functions of these lncRNAs are unclear. As in cereals, attempts to discover stress-responsive lncRNAs in other crops are still in progress.
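The co-expression analysis mentioned above can be illustrated with a minimal sketch. It assumes that co-expression is scored with the Pearson correlation between expression profiles and that a fixed cutoff (`min_abs_r`, set here to 0.9) defines a co-expressed pair. Both are illustrative assumptions and not necessarily the procedure used by Lv et al. (2016). The transcript identifiers and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(lncrna_expr, mrna_expr, min_abs_r=0.9):
    """List (lncRNA, mRNA, r) pairs whose |r| reaches the assumed cutoff."""
    pairs = []
    for lnc_id, lnc_vec in lncrna_expr.items():
        for mrna_id, mrna_vec in mrna_expr.items():
            r = pearson(lnc_vec, mrna_vec)
            if abs(r) >= min_abs_r:
                pairs.append((lnc_id, mrna_id, round(r, 3)))
    return pairs


# Hypothetical expression values across four samples.
lncrna_expr = {"lnc_1": [8.0, 6.0, 3.0, 1.0]}
mrna_expr = {
    "gene_a": [16.0, 12.0, 6.0, 2.0],  # proportional to lnc_1
    "gene_b": [2.0, 9.0, 1.0, 8.0],    # unrelated profile
}

pairs = coexpressed_pairs(lncrna_expr, mrna_expr)
print(pairs)
```

Co-expression of this kind only suggests a shared regulatory context. It does not by itself show that a lncRNA regulates the correlated gene, which is consistent with the cautious wording of the studies cited above.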

### DRAWBACKS IN lncRNA IDENTIFICATION AND TARGET PREDICTION

Current methods used to identify lncRNAs are not sufficiently accurate or comprehensive. In the absence of a standardized set of selection criteria, researchers must design their own pipelines and decide on the thresholds and tools to use, which may cause incorrect and conflicting results to accumulate in the literature and in databases. Despite continuous efforts to identify lncRNAs from many species, methods developed to date are far from complete, especially due to the complex and unclear nature of these molecules.

In contrast to mRNAs, lncRNAs rarely show evolutionary sequence conservation among species (Ponjavic et al., 2007; Diederichs, 2014). Therefore, instead of directly selecting transcripts that show sequence similarity to lncRNAs of closely related species, lncRNA identification pipelines depend heavily on eliminating RNAs that exhibit mRNA-like and snRNA-like features and classifying the remaining transcripts as lncRNAs. However, precise identification of the whole lncRNA repertoire of an organism seems impossible, because some protein-coding transcripts are short and some non-coding transcripts contain long ORFs. Researchers should therefore also be cautious when considering novel protein-coding transcripts; some transcripts that do not show homology to known sequences stored in public databases might represent undiscovered short protein-coding sequences that could be misannotated as lncRNAs.
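The elimination strategy described above can be sketched as a simple filter. The thresholds used here (a minimum transcript length of 200 nt and a maximum ORF of 100 codons) are commonly used conventions adopted as assumptions for illustration. They are not prescribed by the studies cited in this section. The function names are hypothetical, only the forward strand is scanned, and ORFs without an in-frame stop codon are ignored, which is a simplification.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Longest ATG-to-stop ORF on the forward strand, in codons (stop excluded)."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        current = None  # codons counted since the ATG of the open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


def is_candidate_lncrna(seq, min_length=200, max_orf_codons=100):
    """Keep transcripts that are long enough and lack a long ORF (assumed cutoffs)."""
    return len(seq) >= min_length and longest_orf_codons(seq) <= max_orf_codons


# Hypothetical transcripts.
coding_like = "ATG" + "GCT" * 120 + "TAA" + "A" * 50     # 121-codon ORF
noncoding_like = "ATG" + "GCT" * 10 + "TAA" + "C" * 200  # 11-codon ORF

print(is_candidate_lncrna(coding_like))
print(is_candidate_lncrna(noncoding_like))
```

A hard cutoff of this kind produces exactly the two error classes named in the text: a short protein-coding transcript passes the filter, and a non-coding transcript with a long ORF is rejected.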

Although identifying conserved lncRNA sequences has proven challenging, studies of plant and animal transcriptomes have suggested that, in vertebrates, lncRNA promoter sites are better conserved in sequence than the lncRNA transcripts themselves, and that conservation can also be found in particular gene structures and in locations relative to protein-coding genes (Kutter et al., 2012; Johnsson et al., 2014; Nitsche et al., 2015; Deng et al., 2018; Singh et al., 2018), as well as at the structural and functional levels (Kashi et al., 2016). Such positional information and gene structure characteristics, such as splice sites, could reveal lncRNA genes in other organisms and guide researchers toward more accurate lncRNA identification; however, this approach requires high-quality reference genomes and transcriptomes. Moreover, the features of lncRNAs when folded into secondary and tertiary structures, and the relationship between conformation and function, suggest another promising opportunity for better lncRNA prediction. However, in silico RNA folding algorithms usually become less accurate as transcript length increases (Mathews and Turner, 2006). Even though the relationship between structure and function has been examined for only a few lncRNAs, studies evaluating the complete folding process of lncRNAs have identified domains that might be important for functional interactions and have computationally compared the folding characteristics of lncRNAs with those of other RNA species (Yang and Zhang, 2014; Liu et al., 2017). Considering that lncRNA secondary and tertiary structures might be important for their interactions and cellular activities (Johnsson et al., 2014), and that even lncRNAs that are not conserved in sequence can still adopt the same secondary structures (Diederichs, 2014), more information about lncRNA folding might contribute to lncRNA identification by facilitating searches for evolutionary conservation in secondary and tertiary structures instead of in the primary sequences.
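Positional conservation of the kind discussed above can be illustrated with a simplified check: a lncRNA locus is treated as positionally conserved between two species when its nearest flanking protein-coding genes are orthologous. This is a sketch under stated assumptions, not the method of the cited studies. The function names, gene identifiers, coordinates and ortholog table are all hypothetical.

```python
def flanking_genes(lncrna_pos, genes):
    """Return the nearest upstream and downstream gene IDs around a position.

    `genes` is a list of (gene_id, start) tuples on the same chromosome.
    """
    upstream = [g for g in genes if g[1] < lncrna_pos]
    downstream = [g for g in genes if g[1] > lncrna_pos]
    left = max(upstream, key=lambda g: g[1])[0] if upstream else None
    right = min(downstream, key=lambda g: g[1])[0] if downstream else None
    return left, right


def positionally_conserved(flank_a, flank_b, orthologs):
    """True if both neighbours in species A map to the neighbours in species B.

    Either orientation is accepted, to allow for inverted segments.
    """
    mapped = tuple(orthologs.get(g) for g in flank_a)
    if None in mapped or None in flank_b:
        return False
    return mapped == tuple(flank_b) or mapped == tuple(reversed(flank_b))


# Hypothetical annotations for two species.
genes_a = [("A1", 100), ("A2", 5000), ("A3", 9000)]
genes_b = [("B7", 200), ("B8", 7000)]
orthologs = {"A1": "B7", "A2": "B8"}  # species A gene -> species B ortholog

flank_a = flanking_genes(3000, genes_a)
flank_b = flanking_genes(4000, genes_b)
print(positionally_conserved(flank_a, flank_b, orthologs))
```

A check like this depends entirely on accurate gene order and coordinates, which is why the approach requires high-quality reference genomes.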

The low expression of lncRNAs and expression profiles that are tissue- or developmental stage-specific have further hindered their discovery (Tsoi et al., 2015). Expression profiles of lncRNAs might also provide clues for the prediction of new lncRNAs. However, transcripts with low abundance are usually harder to capture with conventional RNA sequencing applications (Clark et al., 2015; Kashi et al., 2016). Tissue- or developmental stage-specific lncRNAs are also difficult to detect. The time or conditions of sample collection can directly affect which lncRNAs appear in the sequencing results and exclude others expressed at different stages, in different tissues, or under different conditions.
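The effect of sampling on lncRNA discovery can be made concrete with a small sketch. It assumes a transcript counts as detected when it reaches a minimum abundance (`min_tpm`, set here to 1.0 TPM) in at least one sampled condition. The cutoff, the transcript names and the expression values are hypothetical.

```python
def detected(expression, sampled_conditions, min_tpm=1.0):
    """IDs of transcripts reaching `min_tpm` in at least one sampled condition."""
    return {
        transcript_id
        for transcript_id, profile in expression.items()
        if any(profile.get(c, 0.0) >= min_tpm for c in sampled_conditions)
    }


# Hypothetical TPM values per condition.
expression = {
    "lnc_root_drought": {"root_drought": 4.2, "root_control": 0.1, "leaf_control": 0.0},
    "lnc_leaf": {"root_drought": 0.0, "root_control": 0.0, "leaf_control": 2.5},
    "lnc_low": {"root_drought": 0.4, "root_control": 0.3, "leaf_control": 0.2},
}

control_only = detected(expression, ["root_control", "leaf_control"])
all_samples = detected(expression, ["root_drought", "root_control", "leaf_control"])

# Transcripts that a control-only experiment would have missed.
missed = all_samples - control_only
print(sorted(missed))
```

In this toy example the stress-specific transcript is found only when the stressed tissue is sampled, and the uniformly low-abundance transcript is never detected at this cutoff, mirroring the two obstacles described above.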

### FUTURE PERSPECTIVES AND CONCLUSION

De novo assembled partial transcripts have long caused trouble in the identification of any molecule, leading to false annotations or underestimation of transcriptomes. In the case of lncRNAs in particular, these erroneous annotations are very hard to detect because lncRNAs lack sequence conservation. For that reason, obtaining well-assembled transcriptome data and being able to locate the annotated lncRNAs will greatly advance lncRNA identification procedures.

Now that high-quality reference genomes of wheat and barley are available, it is time to use them as efficiently as possible. To do so, breeders, biologists and bioinformaticians should all take responsibility and work toward better tools and methods. The drawbacks encountered in currently used lncRNA identification strategies should be overcome to better understand the mechanisms underlying important traits, which can then be used to develop more resistant and higher-yielding cultivars. Although the task is challenging, machine learning approaches give promising results in identifying lncRNAs, a group of non-conserved molecules. Further development of these approaches may lead us to discover other features of lncRNAs that are conserved, such as location, folding characteristics or function. For instance, better algorithms for assessing the folding of lncRNA transcripts would provide clues about their interaction interfaces and thus about their interacting partners. Similarly, learning more about the interacting partners of a lncRNA would point to its function in molecular pathways. Altogether, even though there is still a long way to go before lncRNA identification is perfected, the wheat and barley reference sequences provide a more precise perspective. A better understanding of the world of lncRNAs, aided by reference sequences, would lead to the development of better cultivars to feed the planet.

### AUTHOR CONTRIBUTIONS

HB conceived and designed the study. HB, SK, and HC wrote the article.




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Budak, Kaya and Cagirici. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.